Data Mechanism (data + mechanism)

Selected Abstracts


Haplotype analysis in the presence of informatively missing genotype data

GENETIC EPIDEMIOLOGY, Issue 4 2006
Nianjun Liu
Abstract It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods, including those for haplotype frequency estimation and haplotype association analysis, can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. However, it is likely that this simple assumption does not hold in practice, yet few studies to date have examined the magnitude of the effects when this simplifying assumption is violated. In this study, we demonstrate that violation of this assumption may lead to serious bias in haplotype frequency estimates, and that haplotype association analysis based on this assumption can induce both false-positive and false-negative evidence of association. To address this limitation of the current methods, we propose a general missing data model to characterize missing data patterns across a set of two or more markers simultaneously. We prove that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under our general missing data model. Simulation studies on the analysis of haplotypes consisting of two single nucleotide polymorphisms illustrate that our proposed model can reduce the bias in both haplotype frequency estimates and association analysis that arises from an incorrect assumption about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc. [source]
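
A minimal sketch of the bias this abstract warns about, assuming a single SNP under Hardy-Weinberg proportions and a missingness probability that grows with the count of one allele; all names, rates, and thresholds are hypothetical and this is not the authors' estimator.

```python
# Illustrative sketch (not the authors' method): complete-case allele frequency
# estimation is biased when one allele is preferentially missing.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
p_a = 0.3                               # true frequency of allele 'a'

# Diploid genotypes as counts of allele 'a' (0, 1, or 2) under HWE.
genotypes = rng.binomial(2, p_a, size=n)

# Informative missingness: genotypes carrying more copies of 'a' are more
# likely to be missing (5%, 25%, 45% for 0, 1, 2 copies).
p_miss = 0.05 + 0.20 * genotypes
observed = rng.random(n) > p_miss

p_hat_cc = genotypes[observed].mean() / 2
print(f"true allele frequency   : {p_a:.3f}")
print(f"complete-case estimate  : {p_hat_cc:.3f}")   # biased downward
print(f"estimate using all data : {genotypes.mean() / 2:.3f}")
```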


Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2009
Stuart R. Lipsitz
Summary. In a large, prospective longitudinal study designed to monitor cardiac abnormalities in children born to women who are infected with the human immunodeficiency virus, instead of a single outcome variable, there are multiple binary outcomes (e.g. abnormal heart rate, abnormal blood pressure and abnormal heart wall thickness) considered as joint measures of heart function over time. In the presence of missing responses at some time points, longitudinal marginal models for these multiple outcomes can be estimated by using generalized estimating equations (GEEs), and consistent estimates can be obtained under the assumption of a missingness completely at random mechanism. When the missing data mechanism is missingness at random, i.e. the probability of missing a particular outcome at a time point depends on observed values of that outcome and the remaining outcomes at other time points, we propose joint estimation of the marginal models by using a single modified GEE based on an EM-type algorithm. The method proposed is motivated by the longitudinal study of cardiac abnormalities in children who were born to women infected with the human immunodeficiency virus, and analyses of these data are presented to illustrate the application of the method. Further, in an asymptotic study of bias, we show that, under a missingness at random mechanism in which missingness depends on all observed outcome variables, our joint estimation via the modified GEE produces almost unbiased estimates, provided that the correlation model has been correctly specified, whereas estimates from standard GEEs can lead to substantial bias. [source]
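
A minimal sketch, assuming simulated longitudinal binary data and the statsmodels GEE implementation, of the standard (unweighted) marginal fit the abstract takes as its starting point; such a fit is consistent under missingness completely at random but it is not the paper's modified joint GEE, and all variable names and values below are illustrative.

```python
# Standard GEE fit for one longitudinal binary outcome with visits dropped
# completely at random -- a baseline, not the paper's EM-based joint GEE.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_subj, n_time = 300, 4
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
u = np.repeat(rng.normal(0, 1, n_subj), n_time)      # subject-level heterogeneity
lin = -1.0 + 0.3 * time + u
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

df = pd.DataFrame({"y": y, "time": time, "subj": subj})
df = df.sample(frac=0.85, random_state=1)             # MCAR intermittent missingness

model = sm.GEE.from_formula("y ~ time", groups="subj", data=df,
                            family=sm.families.Binomial(),
                            cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```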


Using data augmentation to correct for non-ignorable non-response when surrogate data are available: an application to the distribution of hourly pay

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 3 2006
Gabriele B. Durrant
Summary. The paper develops a data augmentation method to estimate the distribution function of a variable, which is partially observed, under a non-ignorable missing data mechanism, and where surrogate data are available. An application to the estimation of hourly pay distributions using UK Labour Force Survey data provides the main motivation. In addition to considering a standard parametric data augmentation method, we consider the use of hot deck imputation methods as part of the data augmentation procedure to improve the robustness of the method. The method proposed is compared with standard methods that are based on an ignorable missing data mechanism, both in a simulation study and in the Labour Force Survey application. The focus is on reducing bias in point estimation, but variance estimation using multiple imputation is also considered briefly. [source]
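
A hypothetical sketch of one ingredient of the procedure described above: hot deck imputation of a partially observed pay variable within classes defined by a surrogate. It is not the paper's full data augmentation algorithm, and the class definitions, missingness rates, and variable names are assumptions.

```python
# Hot-deck imputation within surrogate classes: each missing value is replaced
# by a randomly drawn observed donor from the same class of the surrogate.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 5_000
surrogate = rng.integers(0, 3, size=n)               # e.g. a banded proxy for pay
pay = rng.lognormal(mean=2.0 + 0.3 * surrogate, sigma=0.2, size=n)

# Non-ignorable nonresponse: higher pay is more likely to be missing.
miss = rng.random(n) < 0.1 + 0.15 * (pay > np.median(pay))
pay_obs = np.where(miss, np.nan, pay)
df = pd.DataFrame({"surrogate": surrogate, "pay": pay_obs})

def hot_deck(group: pd.Series) -> pd.Series:
    donors = group.dropna().to_numpy()
    filled = group.copy()
    filled[filled.isna()] = rng.choice(donors, size=filled.isna().sum())
    return filled

df["pay_imputed"] = df.groupby("surrogate")["pay"].transform(hot_deck)
print(df["pay_imputed"].isna().sum(), "missing values remain")
```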


A Bayesian model for longitudinal count data with non-ignorable dropout

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 5 2008
Niko A. Kaciroti
Summary., Asthma is an important chronic disease of childhood. An intervention programme for managing asthma was designed on principles of self-regulation and was evaluated by a randomized longitudinal study. The study focused on several outcomes, and, typically, missing data remained a pervasive problem. We develop a pattern,mixture model to evaluate the outcome of intervention on the number of hospitalizations with non-ignorable dropouts. Pattern,mixture models are not generally identifiable as no data may be available to estimate a number of model parameters. Sensitivity analyses are performed by imposing structures on the unidentified parameters. We propose a parameterization which permits sensitivity analyses on clustered longitudinal count data that have missing values due to non-ignorable missing data mechanisms. This parameterization is expressed as ratios between event rates across missing data patterns and the observed data pattern and thus measures departures from an ignorable missing data mechanism. Sensitivity analyses are performed within a Bayesian framework by averaging over different prior distributions on the event ratios. This model has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis. [source]
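
A rough numerical sketch of the sensitivity idea in this abstract, assuming the dropout-pattern event rate is tied to the observed-data rate through an unidentified ratio and that the overall rate is averaged over a prior on that ratio; the rates, dropout fraction, and prior below are hypothetical, not values from the asthma trial.

```python
# Pattern-mixture sensitivity sketch: ratio = 1 recovers an ignorable
# (MAR-like) analysis, ratio > 1 says dropouts have a higher event rate.
import numpy as np

rng = np.random.default_rng(3)
rate_observed = 0.40        # events per child-year among completers (hypothetical)
prop_dropout = 0.25         # fraction of children who dropped out (hypothetical)

# Prior on the ratio (dropout event rate / observed-pattern event rate).
ratios = rng.lognormal(mean=np.log(1.2), sigma=0.2, size=10_000)

overall = (1 - prop_dropout) * rate_observed + prop_dropout * ratios * rate_observed
print(f"mean overall rate averaged over the prior: {overall.mean():.3f}")
print(f"2.5%/97.5% interval: {np.percentile(overall, [2.5, 97.5]).round(3)}")
```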


The Weighted Generalized Estimating Equations Approach for the Evaluation of Medical Diagnostic Test at Subunit Level

BIOMETRICAL JOURNAL, Issue 5 2006
Carol Y. Lin
Abstract Sensitivity and specificity are common measures used to evaluate the performance of a diagnostic test. A diagnostic test is often administered at a subunit level, e.g. at the level of a vessel, ear or eye of a patient, so that treatment can be targeted at the specific subunit. Therefore, it is essential to evaluate the diagnostic test at the subunit level. Often patients with more negative subunit test results are less likely to receive the gold standard tests than patients with more positive subunit test results. To account for this type of missing data and for correlation between subunit test results, we proposed a weighted generalized estimating equations (WGEE) approach to evaluate subunit sensitivities and specificities. A simulation study was conducted to evaluate the performance of the WGEE estimators and the weighted least squares (WLS) estimators (Barnhart and Kosinski, 2003) under a missing at random assumption. The results suggested that the WGEE estimator is consistent under various scenarios of percentage of missing data and sample size, while the WLS approach could yield biased estimators due to a misspecified missing data mechanism. We illustrate the methodology with a cardiology example. (© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
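
A minimal sketch of the inverse-probability-weighting idea that underlies weighted GEE corrections for verification bias: disease status is verified with a probability that depends on the observed test result (missing at random), and weighting verified subjects by 1/Pr(verified) removes the bias in the naive sensitivity estimate. This is a single-unit illustration with known verification probabilities, not the paper's subunit-level WGEE; all numbers are assumptions.

```python
# Verification bias and its IPW correction for sensitivity.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
disease = rng.binomial(1, 0.2, n)
# Imperfect test: sensitivity 0.85, false-positive rate 0.10.
test = np.where(disease == 1, rng.binomial(1, 0.85, n), rng.binomial(1, 0.10, n))

# Gold-standard verification is far more likely after a positive test.
p_verify = np.where(test == 1, 0.9, 0.2)
verified = rng.random(n) < p_verify

mask = verified & (disease == 1)
sens_naive = test[mask].mean()                        # biased upward

w = 1.0 / p_verify                                    # inverse verification probability
sens_ipw = np.sum(w[mask] * test[mask]) / np.sum(w[mask])
print(f"naive sensitivity : {sens_naive:.3f}")
print(f"IPW sensitivity   : {sens_ipw:.3f}   (true 0.85)")
```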


Analysis of Matched Case-Control Data in Presence of Nonignorable Missing Exposure

BIOMETRICS, Issue 1 2008
Samiran Sinha
Summary. The present article deals with informative missing (IM) exposure data in matched case-control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing data mechanism is inevitable. Therefore, a full likelihood-based approach for handling IM data has been proposed by positing a model for the selection probability, and a parametric model for the partially missing exposure variable among the control population, along with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases: (a) binary exposure variable, (b) normally distributed exposure variable, and (c) lognormally distributed exposure variable are discussed in detail. The method is illustrated by analyzing a real matched case-control data set with a missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and the robustness of the proposed method to violation of different types of model assumptions is considered. [source]
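
A hypothetical sketch of the problem this abstract addresses: when exposure values are missing informatively, and the missingness differs between cases and controls, the usual discordant-pair odds ratio computed from complete pairs is biased. It illustrates the bias only; it does not implement the proposed EM estimator, and all rates are assumptions.

```python
# Complete-pair analysis of 1:1 matched case-control data under informative
# missingness of the exposure.
import numpy as np

rng = np.random.default_rng(7)
n_pairs, p_ctrl, true_or = 200_000, 0.3, 2.0
p_case = true_or * p_ctrl / (1 - p_ctrl + true_or * p_ctrl)

exp_case = rng.binomial(1, p_case, n_pairs).astype(float)
exp_ctrl = rng.binomial(1, p_ctrl, n_pairs).astype(float)

def discordant_or(ec, ect):
    # McNemar-type estimator: ratio of the two kinds of discordant pairs.
    return np.sum((ec == 1) & (ect == 0)) / np.sum((ec == 0) & (ect == 1))

# Informative missingness that differs between cases and controls:
# exposed controls lose their exposure value far more often.
miss_case = rng.random(n_pairs) < 0.10
miss_ctrl = rng.random(n_pairs) < np.where(exp_ctrl == 1, 0.40, 0.10)
complete = ~miss_case & ~miss_ctrl

print(f"full-data odds ratio      : {discordant_or(exp_case, exp_ctrl):.2f}")
print(f"complete-pairs odds ratio : {discordant_or(exp_case[complete], exp_ctrl[complete]):.2f}")
```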


Bayesian Analysis for Generalized Linear Models with Nonignorably Missing Covariates

BIOMETRICS, Issue 3 2005
Lan Huang
Summary. We propose Bayesian methods for estimating parameters in generalized linear models (GLMs) with nonignorably missing covariate data. We show that when improper uniform priors are used for the regression coefficients of the multinomial selection model for the missing data mechanism, the resulting joint posterior will always be improper if (i) all missing covariates are discrete and an intercept is included in the selection model for the missing data mechanism, or (ii) at least one of the covariates is continuous and unbounded. This impropriety will result regardless of whether proper or improper priors are specified for the regression parameters of the GLM or the parameters of the covariate distribution. To overcome this problem, we propose a novel class of proper priors for the regression coefficients in the selection model for the missing data mechanism. These priors are robust and computationally attractive in the sense that inferences about the GLM regression parameters are not sensitive to the choice of the hyperparameters of the prior for the selection-model coefficients, and they facilitate a Gibbs sampling scheme that leads to accelerated convergence. In addition, we extend the model assessment criterion of Chen, Dey, and Ibrahim (2004a, Biometrika 91, 45-63), called the weighted L measure, to GLMs and missing data problems, and we extend the deviance information criterion (DIC) of Spiegelhalter et al. (2002, Journal of the Royal Statistical Society B 64, 583-639) for assessing whether the missing data mechanism is ignorable or nonignorable. A novel Markov chain Monte Carlo sampling algorithm is also developed for carrying out posterior computation. Several simulations are given to investigate the performance of the proposed Bayesian criteria as well as the sensitivity of the prior specification. Real datasets from a melanoma cancer clinical trial and a liver cancer study are presented to further illustrate the proposed methods. [source]


Likelihood Methods for Treatment Noncompliance and Subsequent Nonresponse in Randomized Trials

BIOMETRICS, Issue 2 2005
A. James O'Malley
Summary. While several new methods that account for noncompliance or missing data in randomized trials have been proposed, the dual effects of noncompliance and nonresponse are rarely dealt with simultaneously. We construct a maximum likelihood estimator (MLE) of the causal effect of treatment assignment for a two-armed randomized trial assuming all-or-none treatment noncompliance and allowing for subsequent nonresponse. The EM algorithm is used for parameter estimation. Our likelihood procedure relies on a latent compliance state covariate that describes the behavior of a subject under all possible treatment assignments and characterizes the missing data mechanism as in Frangakis and Rubin (1999, Biometrika 86, 365-379). Using simulated data, we show that the MLE for normal outcomes compares favorably to the method-of-moments (MOM) and the standard intention-to-treat (ITT) estimators under (1) both normal and nonnormal data, and (2) departures from the latent ignorability and compound exclusion restriction assumptions. We illustrate the methods using data from a trial to compare the efficacy of two antipsychotics for adults with refractory schizophrenia. [source]
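
A hedged sketch of the two benchmark estimators mentioned in this abstract, the intention-to-treat (ITT) contrast and the method-of-moments (instrumental-variable style) estimator of the complier-average effect, under simulated all-or-none noncompliance. It does not implement the paper's maximum likelihood / EM procedure and ignores subsequent nonresponse; all parameter values are illustrative.

```python
# ITT and MOM estimators under all-or-none noncompliance.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.binomial(1, 0.5, n)                 # randomized assignment
complier = rng.binomial(1, 0.7, n)          # latent compliance state
d = z * complier                            # treatment actually received
effect = 2.0                                # true effect among compliers
y = 1.0 + effect * d + rng.normal(0, 1, n)

itt = y[z == 1].mean() - y[z == 0].mean()
mom = itt / (d[z == 1].mean() - d[z == 0].mean())   # ITT scaled by compliance rate
print(f"ITT estimate                : {itt:.3f}")
print(f"MOM (complier-average) est. : {mom:.3f}   (true {effect})")
```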


Maximum Likelihood Methods for Nonignorable Missing Responses and Covariates in Random Effects Models

BIOMETRICS, Issue 4 2003
Amy L. Stubbendick
Summary. This article analyzes quality of life (QOL) data from an Eastern Cooperative Oncology Group (ECOG) melanoma trial that compared treatment with ganglioside vaccination to treatment with high-dose interferon. The analysis of this data set is challenging due to several difficulties, namely, nonignorable missing longitudinal responses and baseline covariates. Hence, we propose a selection model for estimating parameters in the normal random effects model with nonignorable missing responses and covariates. Parameters are estimated via maximum likelihood using the Gibbs sampler and a Monte Carlo expectation maximization (EM) algorithm. Standard errors are calculated using the bootstrap. The method allows for nonmonotone patterns of missing data in both the response variable and the covariates. We model the missing data mechanism and the missing covariate distribution via a sequence of one-dimensional conditional distributions, allowing the missing covariates to be either categorical or continuous, as well as time-varying. We apply the proposed approach to the ECOG quality-of-life data and conduct a small simulation study evaluating the performance of the maximum likelihood estimates. Our results indicate that a patient treated with the vaccine has a higher QOL score on average at a given time point than a patient treated with high-dose interferon. [source]
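
A small sketch of the nonparametric bootstrap (resampling whole subjects) that the abstract uses for standard errors, here applied to an ordinary random-intercept linear mixed model fitted with statsmodels rather than to the authors' selection model with missing responses and covariates; the data, sample sizes, and number of replicates are assumptions.

```python
# Subject-level bootstrap standard error for a mixed model slope.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
n_subj, n_time = 100, 4
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj).astype(float)
y = 1.0 + 0.5 * time + np.repeat(rng.normal(0, 1, n_subj), n_time) + rng.normal(0, 1, subj.size)
df = pd.DataFrame({"y": y, "time": time, "subj": subj})

def fit_slope(data):
    return sm.MixedLM.from_formula("y ~ time", groups="subj", data=data).fit().params["time"]

point = fit_slope(df)
boot = []
for _ in range(200):                                  # resample subjects with replacement
    ids = rng.choice(n_subj, size=n_subj, replace=True)
    resampled = pd.concat(
        [df[df["subj"] == i].assign(subj=k) for k, i in enumerate(ids)],
        ignore_index=True)
    boot.append(fit_slope(resampled))

print(f"slope estimate: {point:.3f}, bootstrap SE: {np.std(boot, ddof=1):.3f}")
```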


Asymptotic bias in the linear mixed effects model under non-ignorable missing data mechanisms

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2005
Chandan Saha
Summary. In longitudinal studies, missingness of data is often an unavoidable problem. Estimators from the linear mixed effects model assume that missing data are missing at random. However, estimators are biased when this assumption is not met. In the paper, theoretical results for the asymptotic bias are established under non-ignorable drop-out, drop-in and other missing data patterns. The asymptotic bias is large when the drop-out subjects have only one or no observation, especially for slope-related parameters of the linear mixed effects model. In the drop-in case, intercept-related parameter estimators show substantial asymptotic bias when subjects enter late in the study. Eight other missing data patterns are considered and these produce asymptotic biases of a variety of magnitudes. [source]
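
A rough numerical companion to the analytic results above: when dropout is triggered by the value that would have been observed at the missed visit, a standard linear mixed effects fit (which is valid under missing at random) gives a biased slope. This is a simulation sketch with hypothetical parameters, not the paper's asymptotic bias derivation.

```python
# Slope bias in a random-intercept linear mixed model under non-ignorable dropout.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_subj, n_time, true_slope = 500, 5, 1.0
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj).astype(float)
b = np.repeat(rng.normal(0, 1, n_subj), n_time)       # random intercepts
y = 2.0 + true_slope * time + b + rng.normal(0, 1, subj.size)
df = pd.DataFrame({"y": y, "time": time, "subj": subj})

# Non-ignorable dropout: a subject drops out at the first visit whose (never
# observed) response exceeds a threshold, and stays out thereafter.
df["dropped"] = df.groupby("subj")["y"].transform(
    lambda s: np.where((s > 5.5).cumsum() > 0, 1, 0))
kept = df[df["dropped"] == 0]

fit = sm.MixedLM.from_formula("y ~ time", groups="subj", data=kept).fit()
print(f"estimated slope: {fit.params['time']:.3f}   (true {true_slope})")
```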

