Clustered Data (clustered + data)

Selected Abstracts


Association Models for Clustered Data with Binary and Continuous Responses

BIOMETRICS, Issue 1 2010
Lanjia Lin
Summary We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For the ease of interpretations of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED of SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice. [source]


The Wilcoxon Signed Rank Test for Paired Comparisons of Clustered Data

BIOMETRICS, Issue 1 2006
Bernard Rosner
Summary The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen (1971, Nonparametric Methods in Multivariate Analysis, New York: John Wiley). Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols. [source]
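
The idea of randomizing at the cluster level can be illustrated with a small sketch. This is not the adjusted-variance statistic derived in the article; it is a simplified exact randomization test, assuming that under the null hypothesis the signs of all differences in a cluster can be flipped together. The function names (midranks, cluster_signed_rank_test) are hypothetical, and the exact enumeration is only practical for a small number of clusters.

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def cluster_signed_rank_test(clusters):
    """Exact two-sided cluster-level sign-flip test (illustrative sketch).

    clusters: list of lists; each inner list holds the paired differences
    (e.g., after minus before) for the subunits of one cluster.
    Returns (observed signed rank sum, two-sided p-value).
    """
    if len(clusters) > 20:
        raise ValueError("exact enumeration is limited to 20 clusters here")
    # Pool all nonzero differences and rank their absolute values.
    pooled = [(c, d) for c, diffs in enumerate(clusters) for d in diffs if d != 0]
    ranks = midranks([abs(d) for _, d in pooled])
    # Sum the signed ranks within each cluster.
    cluster_sums = [0.0] * len(clusters)
    for (c, d), r in zip(pooled, ranks):
        cluster_sums[c] += r if d > 0 else -r
    observed = sum(cluster_sums)
    # Flip the sign of each whole cluster in every possible way.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(cluster_sums)):
        stat = sum(s * x for s, x in zip(signs, cluster_sums))
        total += 1
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


if __name__ == "__main__":
    # Hypothetical data: four persons, one or two eyes each.
    example = [[1.0, 2.0], [3.0], [-4.0, 5.0], [6.0, 7.0]]
    print(cluster_signed_rank_test(example))
```

Because whole clusters are flipped rather than individual subunits, the within-cluster correlation is preserved in the reference distribution, which is the point of treating the cluster as the unit of randomization.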


Advanced Statistics: Statistical Methods for Analyzing Cluster and Cluster-randomized Data

ACADEMIC EMERGENCY MEDICINE, Issue 4 2002
Robert L. Wears MD
Abstract. Sometimes interventions in randomized clinical trials are not allocated to individual patients, but rather to patients in groups. This is called cluster allocation, or cluster randomization, and is particularly common in health services research. Similarly, in some types of observational studies, patients (or observations) are found in naturally occurring groups, such as neighborhoods. In either situation, observations within a cluster tend to be more alike than observations selected entirely at random. This violates the assumption of independence that is at the heart of common methods of statistical estimation and hypothesis testing. Failure to account for the dependence between individual observations and the cluster to which they belong can have profound implications on the design and analysis of such studies. Their p-values will be too small, confidence intervals too narrow, and sample size estimates too small, sometimes to a dramatic degree. This problem is similar to that caused by the more familiar "unit of analysis error" seen when observations are repeated on the same subjects, but are treated as independent. The purpose of this paper is to provide an introduction to the problem of clustered data in clinical research. It provides guidance and examples of methods for analyzing clustered data and calculating sample sizes when planning studies. The article concludes with some general comments on statistical software for cluster data and principles for planning, analyzing, and presenting such studies. [source]
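
The effect of clustering on sample size can be sketched with the standard design effect, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. The abstract does not state which formula the article uses, so this is an assumption; it also assumes equal cluster sizes, and the function names are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation factor 1 + (m - 1) * ICC for equal-sized clusters."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def inflated_sample_size(n_independent, cluster_size, icc):
    """Total subjects needed once the independent-data size is inflated."""
    raw = n_independent * design_effect(cluster_size, icc)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 9))


if __name__ == "__main__":
    # Hypothetical planning values: 100 subjects if independent,
    # clusters of 5 patients, ICC of 0.25.
    print(design_effect(5, 0.25))
    print(inflated_sample_size(100, 5, 0.25))
```

With an ICC of zero the design effect is 1 and no inflation is needed; as the ICC or the cluster size grows, the required sample size grows with it, which is why ignoring clustering leads to underpowered studies.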


Maximum likelihood estimation of bivariate logistic models for incomplete responses with indicators of ignorable and non-ignorable missingness

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2002
Nicholas J. Horton
Summary. Missing observations are a common problem that complicate the analysis of clustered data. In the Connecticut child surveys of childhood psychopathology, it was possible to identify reasons why outcomes were not observed. Of note, some of these causes of missingness may be assumed to be ignorable, whereas others may be non-ignorable. We consider logistic regression models for incomplete bivariate binary outcomes and propose mixture models that permit estimation assuming that there are two distinct types of missingness mechanisms: one that is ignorable; the other non-ignorable. A feature of the mixture modelling approach is that additional analyses to assess the sensitivity to assumptions about the missingness are relatively straightforward to incorporate. The methods were developed for analysing data from the Connecticut child surveys, where there are missing informant reports of child psychopathology and different reasons for missingness can be distinguished. [source]


S06.3: Analysis of proportions from clustered data with missing data in a matched-pair design

BIOMETRICAL JOURNAL, Issue S1 2004
Carsten Schwenke
No abstract is available for this article. [source]


Diagnosis of Random-Effect Model Misspecification in Generalized Linear Mixed Models for Binary Response

BIOMETRICS, Issue 2 2009
Xianzheng Huang
Summary Generalized linear mixed models (GLMMs) are widely used in the analysis of clustered data. However, the validity of likelihood-based inference in such analyses can be greatly affected by the assumed model for the random effects. We propose a diagnostic method for random-effect model misspecification in GLMMs for clustered binary response. We provide a theoretical justification of the proposed method and investigate its finite sample performance via simulation. The proposed method is applied to data from a longitudinal respiratory infection study. [source]


Nonparametric Association Analysis of Exchangeable Clustered Competing Risks Data

BIOMETRICS, Issue 2 2009
Yu Cheng
Summary The work is motivated by the Cache County Study of Aging, a population-based study in Utah, in which sibship associations in dementia onset are of interest. Complications arise because only a fraction of the population ever develops dementia, with the majority dying without dementia. The application of standard dependence analyses for independently right-censored data may not be appropriate with such multivariate competing risks data, where death may violate the independent censoring assumption. Nonparametric estimators of the bivariate cumulative hazard function and the bivariate cumulative incidence function are adapted from the simple nonexchangeable bivariate setup to exchangeable clustered data, as needed with the large sibships in the Cache County Study. Time-dependent association measures are evaluated using these estimators. Large sample inferences are studied rigorously using empirical process techniques. The practical utility of the methodology is demonstrated with realistic samples both via simulations and via an application to the Cache County Study, where dementia onset clustering among siblings varies strongly by age. [source]


Conditional Generalized Estimating Equations for the Analysis of Clustered and Longitudinal Data

BIOMETRICS, Issue 3 2008
Sylvie Goetgeluk
Summary A common and important problem in clustered sampling designs is that the effect of within-cluster exposures (i.e., exposures that vary within clusters) on outcome may be confounded by both measured and unmeasured cluster-level factors (i.e., measurements that do not vary within clusters). When some of these are ill/not accounted for, estimation of this effect through population-averaged models or random-effects models may introduce bias. We accommodate this by developing a general theory for the analysis of clustered data, which enables consistent and asymptotically normal estimation of the effects of within-cluster exposures in the presence of cluster-level confounders. Semiparametric efficient estimators are obtained by solving so-called conditional generalized estimating equations. We compare this approach with a popular proposal by Neuhaus and Kalbfleisch (1998, Biometrics 54, 638–645) who separate the exposure effect into a within- and a between-cluster component within a random intercept model. We find that the latter approach yields consistent and efficient estimators when the model is linear, but is less flexible in terms of model specification. Under nonlinear models, this approach may yield inconsistent and inefficient estimators, though with little bias in most practical settings. [source]
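
The within- and between-cluster separation mentioned above is commonly carried out by replacing each exposure value with two covariates: the cluster mean and the deviation from that mean. The sketch below shows only this data preparation step, under that assumption; it does not fit the random intercept model or the conditional estimating equations, and the function name is hypothetical.

```python
from collections import defaultdict


def within_between(cluster_ids, exposure):
    """Split an exposure into between- and within-cluster components.

    Returns two lists aligned with the input:
      between[i] = mean exposure in the cluster of observation i
      within[i]  = exposure[i] minus that cluster mean
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c, x in zip(cluster_ids, exposure):
        totals[c] += x
        counts[c] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    between = [means[c] for c in cluster_ids]
    within = [x - means[c] for c, x in zip(cluster_ids, exposure)]
    return between, within


if __name__ == "__main__":
    # Hypothetical data: two clusters with two observations each.
    ids = ["a", "a", "b", "b"]
    x = [1.0, 3.0, 10.0, 14.0]
    print(within_between(ids, x))
```

Entering both components as separate covariates lets the within-cluster coefficient be estimated from contrasts inside each cluster, so it is not affected by factors that are constant within a cluster.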

