Data Augmentation (data + augmentation)
Selected Abstracts

Reducing the length of mental health instruments through structurally incomplete designs
INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, Issue 3 2007
Niels Smits

Abstract: This paper presents structurally incomplete designs as an approach to reduce the length of mental health tests. In structurally incomplete test designs, respondents only fill out a subset of the total item set. The scores on the unadministered items are estimated using methods for missing data. As an illustration, structurally incomplete test designs recording, respectively, two thirds, one half, one third and one quarter of the complete item set were applied to item scores on the Centre of Epidemiological Studies-Depression (CES-D) scale of the respondents in the Longitudinal Aging Study Amsterdam (LASA). The resulting unobserved item scores were estimated with the missing data method Data Augmentation. The complete and reconstructed data yielded very similar total scores and depression classifications. In contrast, the diagnostic accuracy of the incomplete designs decreased as the designs had more unobserved item scores. The discussion addresses the strengths and limitations of the application of incomplete designs in mental health research. Copyright © 2007 John Wiley & Sons, Ltd.

Analysis of Capture–Recapture Models with Individual Covariates Using Data Augmentation
BIOMETRICS, Issue 1 2009
J. Andrew Royle

Summary: I consider the analysis of capture–recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software.
This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.

Using data augmentation to correct for non-ignorable non-response when surrogate data are available: an application to the distribution of hourly pay
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 3 2006
Gabriele B. Durrant

Summary: The paper develops a data augmentation method to estimate the distribution function of a variable, which is partially observed, under a non-ignorable missing data mechanism, and where surrogate data are available. An application to the estimation of hourly pay distributions using UK Labour Force Survey data provides the main motivation. In addition to considering a standard parametric data augmentation method, we consider the use of hot deck imputation methods as part of the data augmentation procedure to improve the robustness of the method. The method proposed is compared with standard methods that are based on an ignorable missing data mechanism, both in a simulation study and in the Labour Force Survey application. The focus is on reducing bias in point estimation, but variance estimation using multiple imputation is also considered briefly.
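
The hot deck imputation mentioned in the abstract above can be illustrated with a minimal sketch. This is a plain random hot deck within classes, which on its own assumes an ignorable missing data mechanism; it is not the paper's data augmentation procedure, and the function name, field names, and toy pay values are all hypothetical.

```python
import random
from collections import defaultdict


def random_hot_deck(records, value_key, class_key, rng):
    """Fill missing values (None) by copying the value of a randomly
    chosen observed donor from the same imputation class."""
    # Collect the observed values (the donor pool) for each class.
    donors = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            donors[rec[class_key]].append(rec[value_key])

    completed = []
    for rec in records:
        filled = dict(rec)  # copy, so the input records are not modified
        if filled[value_key] is None:
            pool = donors.get(rec[class_key])
            if not pool:
                raise ValueError(f"no donor available in class {rec[class_key]!r}")
            filled[value_key] = rng.choice(pool)
        completed.append(filled)
    return completed


# Hypothetical toy records: hourly pay is missing for respondents 3 and 5.
records = [
    {"id": 1, "occupation": "clerical", "hourly_pay": 7.5},
    {"id": 2, "occupation": "clerical", "hourly_pay": 8.0},
    {"id": 3, "occupation": "clerical", "hourly_pay": None},
    {"id": 4, "occupation": "technical", "hourly_pay": 12.0},
    {"id": 5, "occupation": "technical", "hourly_pay": None},
]

completed = random_hot_deck(records, "hourly_pay", "occupation", random.Random(0))

# Repeating the draw with different seeds gives several completed data sets,
# which is the starting point for multiple imputation.
imputations = [
    random_hot_deck(records, "hourly_pay", "occupation", random.Random(seed))
    for seed in range(5)
]
```

Because donors are real observed values, the imputed values always lie in the observed range, which is one reason hot deck methods are regarded as robust to a misspecified parametric model.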

Univariate and multirater ordinal cumulative link regression with covariate specific cutpoints
THE CANADIAN JOURNAL OF STATISTICS, Issue 4 2000
Hemant Ishwaran

Abstract: The author considers a reparameterized version of the Bayesian ordinal cumulative link regression model as a tool for exploring relationships between covariates and "cutpoint" parameters. The use of this parameterization allows one to fit models using the leapfrog hybrid Monte Carlo method, and to bypass latent variable data augmentation and the slow convergence of the cutpoints which it usually entails. The proposed Gibbs sampler is not model specific and can be easily modified to handle different link functions. The approach is illustrated by considering data from a pediatric radiology study.

RÉSUMÉ: The author proposes a new parameterization of the Bayesian ordinal cumulative link regression model and uses it to explore the relationship between covariates and "cutpoints." This reparameterization allows the models to be fitted by a modified leapfrog Monte Carlo method, avoiding the need for latent variable data augmentation and the slow convergence of the cutpoints that often results from it. The proposed Gibbs sampler is not model specific and can easily be adapted to other link functions. The method is illustrated with a pediatric radiology study.

Incorporating Genotype Uncertainty into Mark–Recapture-Type Models For Estimating Abundance Using DNA Samples
BIOMETRICS, Issue 3 2009
Janine A. Wright

Summary: Sampling DNA noninvasively has advantages for identifying animals for uses such as mark–recapture modeling that require unique identification of animals in samples. Although it is possible to generate large amounts of data from noninvasive sources of DNA, a challenge is overcoming genotyping errors that can lead to incorrect identification of individuals.
A major source of error is allelic dropout, which is failure of DNA amplification at one or more loci. This has the effect of heterozygous individuals being scored as homozygotes at those loci as only one allele is detected. If errors go undetected and the genotypes are naively used in mark–recapture models, significant overestimates of population size can occur. To avoid this, it is common to reject low-quality samples, but this may lead to the elimination of large amounts of data. It is preferable to retain these low-quality samples as they still contain usable information in the form of partial genotypes. Rather than trying to minimize error or discarding error-prone samples, we model dropout in our analysis. We describe a method based on data augmentation that allows us to model data from samples that include uncertain genotypes. Application is illustrated using data from the European badger (Meles meles).
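
The effect of allelic dropout described above can be made concrete with a small simulation. This is only a sketch of the problem under simplifying assumptions (dropout affects heterozygous loci only, independently per locus, with a single probability), not the authors' data augmentation method; the function names, allele codes, and dropout probability are hypothetical.

```python
import random


def apply_dropout(genotype, dropout_prob, rng):
    """Return the genotype as it would be scored after allelic dropout.

    genotype is a tuple of loci, each locus a pair of allele codes.
    At each heterozygous locus, with probability dropout_prob one allele
    fails to amplify and the locus is scored as a homozygote of the other.
    """
    observed = []
    for a, b in genotype:
        if a != b and rng.random() < dropout_prob:
            kept = rng.choice((a, b))
            observed.append((kept, kept))
        else:
            observed.append((a, b))
    return tuple(observed)


def count_apparent_individuals(true_genotypes, samples_per_individual, dropout_prob, rng):
    """Count distinct observed genotypes when every distinct genotype is
    naively treated as a separate animal."""
    seen = set()
    for genotype in true_genotypes:
        for _ in range(samples_per_individual):
            seen.add(apply_dropout(genotype, dropout_prob, rng))
    return len(seen)


# Hypothetical toy population: three animals typed at two loci,
# each heterozygous at both loci.
true_genotypes = [
    ((101, 105), (200, 204)),
    ((101, 103), (202, 204)),
    ((103, 105), (200, 202)),
]

rng = random.Random(1)
without_dropout = count_apparent_individuals(true_genotypes, 20, 0.0, rng)
with_dropout = count_apparent_individuals(true_genotypes, 20, 0.3, rng)
print("apparent individuals without dropout:", without_dropout)
print("apparent individuals with dropout:", with_dropout)
```

Without dropout, the count equals the number of animals sampled. With dropout, a single animal can appear under several genotypes, and naive matching treats each of these as a new individual, which is the mechanism behind the overestimates of population size noted in the abstract.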