Imputation


Kinds of Imputation

  • multiple imputation

Terms modified by Imputation

  • imputation methods
  • imputation strategy
  • imputation techniques

Selected Abstracts


    FORMALIZING WIESER's THEORY OF DISTRIBUTION: CONSISTENT IMPUTATION IN ALTERNATIVE THEORETICAL PERSPECTIVES

    METROECONOMICA, Issue 2 2005
    Arrigo Opocher. Article first published online: 18 MAY 200
    ABSTRACT Wieser's theory of value and distribution has been formalized and interpreted mainly in the framework of efficient allocation of scarce resources. To this end, the mathematical techniques of linear programming have been used by such authors as Samuelson and Uzawa. This paper presents briefly what may be called the Knight–Samuelson–Uzawa formalization and supplements it with different proposed formalizations of some further aspects consistently developed in Wieser's works. The formalization that we propose concerns Wieser's theory of interest and his theory of value for 'cost goods'. It is argued that in such cases the produced means of production, and not the endowments of scarce resources, are at the centre of Wieser's analysis. It is shown that some appropriately specified models in the Sraffa–von Neumann–Leontief tradition can very usefully be employed in order to strengthen Wieser's intuitive arguments and give them a sound analytical structure. [source]


    AGENCY AND IMPUTATION: COMMENTS ON REATH

    ANALYTIC PHILOSOPHY, Issue 2 2008
    Jens Timmermann
    First page of article [source]


    GRAPHICAL SENSITIVITY ANALYSIS WITH DIFFERENT METHODS OF IMPUTATION FOR A TRIAL WITH PROBABLE NON-IGNORABLE MISSING DATA

    AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 4 2009
    M. Weatherall
    Summary Graphical sensitivity analyses have recently been recommended for clinical trials with non-ignorable missing outcome. We demonstrate an adaptation of this methodology for a continuous outcome of a trial of three cognitive-behavioural therapies for mild depression in primary care, in which one arm had unexpectedly high levels of missing data. Fixed-value and multiple imputations from a normal distribution (assuming either varying mean and fixed standard deviation, or fixed mean and varying standard deviation) were used to obtain contour plots of the contrast estimates with their P-values superimposed, their confidence intervals, and the root mean square errors. Imputation was based either on the outcome value alone, or on change from baseline. The plots showed fixed-value imputation to be more sensitive than imputing from a normal distribution, but the normally distributed imputations were subject to sampling noise. The contours of the sensitivity plots were close to linear in appearance, with the slope approximately equal to the ratio of the proportions of subjects with missing data in each trial arm. [source]
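
    The following is a minimal sketch, on simulated data, of the kind of fixed-value imputation sensitivity surface the abstract describes: missing outcomes in each arm are replaced by every value on a grid, the treatment contrast is re-estimated at each grid point, and contours of the estimate are drawn with the p = 0.05 boundary overlaid. The trial values, missingness rates and grid are hypothetical, not taken from the paper.

    ```python
    # Fixed-value imputation sensitivity analysis on simulated two-arm data
    # (illustrative only; arm B has unexpectedly high missingness, as in the trial).
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    y_a = rng.normal(10.0, 3.0, 100); miss_a = rng.random(100) < 0.05
    y_b = rng.normal(11.5, 3.0, 100); miss_b = rng.random(100) < 0.30

    grid = np.linspace(5.0, 17.0, 25)            # candidate imputed values
    est = np.empty((grid.size, grid.size))
    pval = np.empty_like(est)
    for i, fill_a in enumerate(grid):
        for j, fill_b in enumerate(grid):
            ya = np.where(miss_a, fill_a, y_a)   # fixed-value imputation, arm A
            yb = np.where(miss_b, fill_b, y_b)   # fixed-value imputation, arm B
            est[i, j] = yb.mean() - ya.mean()
            pval[i, j] = stats.ttest_ind(yb, ya).pvalue

    fig, ax = plt.subplots()
    cs = ax.contour(grid, grid, est.T, colors="black")           # contrast estimate
    ax.clabel(cs, fmt="%.1f")
    ax.contour(grid, grid, pval.T, levels=[0.05], colors="red")  # p = 0.05 boundary
    ax.set_xlabel("imputed value, arm A")
    ax.set_ylabel("imputed value, arm B")
    plt.show()
    ```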


    The Validity of Using Multiple Imputation for Missing Out-of-hospital Data in a State Trauma Registry

    ACADEMIC EMERGENCY MEDICINE, Issue 3 2006
    Craig D. Newgard MD
    Objectives: To assess 1) the agreement of multiply imputed out-of-hospital values previously missing in a state trauma registry compared with known ambulance values and 2) the potential impact of using multiple imputation versus a commonly used method for handling missing data (i.e., complete case analysis) in a typical multivariable injury analysis. Methods: This was a retrospective cohort analysis. Multiply imputed out-of-hospital data from 1998 to 2003 for four variables (intubation attempt, Glasgow Coma Scale score, systolic blood pressure, and respiratory rate) were compared with known values from probabilistically linked ambulance records using measures of agreement (κ, weighted κ, and Bland–Altman plots). Ambulance values were assumed to represent the "true" values for all analyses. A hypothetical multivariable regression model was used to demonstrate the impact (i.e., bias and precision of model results) of handling missing out-of-hospital data with multiple imputation versus complete case analysis. Results: A total of 6,150 matched ambulance and trauma registry records were available for comparison. Multiply imputed values for the four out-of-hospital variables demonstrated fair to good agreement with known ambulance values. When included in typical multivariable analyses, multiple imputation increased precision and reduced bias compared with using complete case analysis for the same data set. Conclusions: Multiply imputed out-of-hospital values for intubation attempt, Glasgow Coma Scale score, systolic blood pressure, and respiratory rate have fair to good agreement with known ambulance values. Multiple imputation also increased precision and reduced bias compared with complete case analysis in a typical multivariable injury model, and it should be considered for studies using out-of-hospital data from a trauma registry, particularly when substantial portions of data are missing. [source]
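
    As an illustration of the comparison made in this abstract, the sketch below fits a simple multivariable model by complete-case analysis and by multiple imputation pooled with Rubin's rules. The data, variable names and missingness mechanism are simulated stand-ins, not the registry data, and scikit-learn's IterativeImputer stands in for whatever imputation engine the study used.

    ```python
    # Complete-case analysis versus multiple imputation pooled with Rubin's rules,
    # on a simulated stand-in for the registry data (illustrative only).
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(1)
    n = 2000
    sbp = rng.normal(120, 20, n)                       # systolic blood pressure
    gcs = rng.integers(3, 16, n).astype(float)         # Glasgow Coma Scale score
    lin = -2 + 0.03 * (120 - sbp) + 0.25 * (15 - gcs)
    died = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)
    df = pd.DataFrame({"sbp": sbp, "gcs": gcs, "died": died})
    df.loc[rng.random(n) < 0.3, "sbp"] = np.nan        # 30% missing out-of-hospital SBP

    # Complete-case analysis simply drops records with any missing predictor.
    cc = df.dropna()
    fit_cc = sm.Logit(cc["died"], sm.add_constant(cc[["sbp", "gcs"]])).fit(disp=0)

    # Multiple imputation: M imputed data sets, combined by Rubin's rules.
    M, coefs, variances = 10, [], []
    for k in range(M):
        imp = IterativeImputer(sample_posterior=True, random_state=k)
        filled = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
        fit_k = sm.Logit(df["died"], sm.add_constant(filled[["sbp", "gcs"]])).fit(disp=0)
        coefs.append(fit_k.params)
        variances.append(fit_k.bse ** 2)

    qbar = pd.concat(coefs, axis=1).mean(axis=1)        # pooled estimates
    ubar = pd.concat(variances, axis=1).mean(axis=1)    # within-imputation variance
    b = pd.concat(coefs, axis=1).var(axis=1)            # between-imputation variance
    total_se = np.sqrt(ubar + (1 + 1 / M) * b)
    print("complete-case:\n", fit_cc.params)
    print("multiple imputation:\n", qbar, "\nSE:\n", total_se)
    ```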


    Genome-wide association studies for discrete traits

    GENETIC EPIDEMIOLOGY, Issue S1 2009
    Duncan C. Thomas
    Abstract Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ2 tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combination. Methods based on sliding windows, wavelets, Bayesian shrinkage, or penalized likelihood methods, among others, were explored by various participants of Genetic Analysis Workshop 16 Group 1 to combine information across multiple markers within a region, while others used Bayesian variable selection methods for genome-wide multivariate analyses of all markers simultaneously. Imputation can be used to fill in missing markers on individual subjects within a study or in a meta-analysis of studies using different panels. Although multiple imputation theoretically should give more robust tests of association, one participant contribution found little difference between results of single and multiple imputation. Careful control of population stratification is essential, and two contributions found that previously reported associations with two genes disappeared after more precise control. Other issues considered by this group included subgroup analysis, gene-gene interactions, and the use of biomarkers. Genet. Epidemiol. 33 (Suppl. 1):S8–S12, 2009. © 2009 Wiley-Liss, Inc. [source]
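
    A small sketch of the "simple methods" used for an initial genome-wide scan, applied to invented genotype data: a chi-square contingency test of genotype counts against case/control status, one SNP at a time.

    ```python
    # Per-SNP chi-square contingency test of genotype counts, simulated under the null.
    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(2)
    n_snps, n_cases, n_controls = 1000, 500, 500
    maf = rng.uniform(0.05, 0.5, n_snps)
    cases = rng.binomial(2, maf, size=(n_cases, n_snps))       # 0/1/2 minor alleles
    controls = rng.binomial(2, maf, size=(n_controls, n_snps))

    pvals = np.ones(n_snps)
    for j in range(n_snps):
        table = np.array([np.bincount(cases[:, j], minlength=3),
                          np.bincount(controls[:, j], minlength=3)])
        table = table[:, table.sum(axis=0) > 0]    # drop unobserved genotype classes
        pvals[j] = chi2_contingency(table)[1]      # p-value of the contingency test
    print("smallest p-value across the scan:", pvals.min())
    ```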


    Imputation of SF-12 Health Scores for Respondents with Partially Missing Data

    HEALTH SERVICES RESEARCH, Issue 3 2005
    Honghu Liu
    Objective. To create an efficient imputation algorithm for imputing the SF-12 physical component summary (PCS) and mental component summary (MCS) scores when patients have one to eleven SF-12 items missing. Study Setting. Primary data collection was performed between 1996 and 1998. Study Design. Multi-pattern regression was conducted to impute the scores using only available SF-12 items (simple model), and then supplemented by demographics, smoking status and comorbidity (enhanced model) to increase the accuracy. A cut point of missing SF-12 items was determined for using the simple or the enhanced model. The algorithm was validated through simulation. Data Collection. Thirty thousand three hundred and eight patients from 63 physician groups were surveyed for a quality of care study in 1996, which collected the SF-12 and other information. The patients were classified as "chronic" patients if they reported that they had diabetes, heart disease, asthma/chronic obstructive pulmonary disease, or low back pain. A follow-up survey was conducted in 1998. Principal Findings. Thirty-one percent of the patients missed at least one SF-12 item. Means of variance of prediction and standard errors of the mean imputed scores increased with the number of missing SF-12 items. Correlations between the observed and the imputed scores derived from the enhanced models were consistently higher than those derived from the simple model and the increments were significant for patients with ≥6 missing SF-12 items (p<.03). Conclusion. Missing SF-12 items are prevalent and lead to reduced analytical power. Regression-based multi-pattern imputation using the available SF-12 items is efficient and can produce good estimates of the scores. The enhancement from the additional patient information can significantly improve the accuracy of the imputed scores for patients with ≥6 items missing, leading to estimated scores that are as accurate as that of patients with <6 missing items. [source]
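
    The sketch below illustrates the multi-pattern regression idea on made-up data (it is not the SF-12 scoring algorithm or the study's enhanced model): for each observed pattern of missing items, the summary score is predicted from the items that are available, using a regression fitted on respondents with complete data.

    ```python
    # Multi-pattern regression imputation of a summary score from available items
    # (toy stand-in items; illustrative only).
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n, p = 1000, 6                                     # six stand-in "items"
    items = pd.DataFrame(rng.normal(size=(n, p)),
                         columns=[f"q{i}" for i in range(p)])
    score = items.sum(axis=1) + rng.normal(0, 0.5, n)  # true summary score
    obs = items.mask(rng.random((n, p)) < 0.15)        # knock out ~15% of items

    complete = obs.dropna().index                      # respondents with all items
    imputed = pd.Series(index=obs.index, dtype=float)
    imputed[complete] = score[complete]                # nothing to impute here

    pattern = obs.notna().apply(tuple, axis=1)         # missingness pattern per row
    for pat, idx in obs.groupby(pattern).groups.items():
        avail = [c for c, ok in zip(obs.columns, pat) if ok]
        if not avail or len(avail) == p:
            continue                                   # all-missing or complete rows
        reg = LinearRegression().fit(obs.loc[complete, avail], score[complete])
        imputed[idx] = reg.predict(obs.loc[idx, avail])

    print("correlation of imputed with true scores:",
          round(imputed.corr(score), 3))
    ```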


    Using Multiple Imputation to Integrate and Disseminate Confidential Microdata

    INTERNATIONAL STATISTICAL REVIEW, Issue 2 2009
    Jerome P. Reiter
    Summary In data integration contexts, two statistical agencies seek to merge their separate databases into one file. The agencies also may seek to disseminate data to the public based on the integrated file. These goals may be complicated by the agencies' need to protect the confidentiality of database subjects, which could be at risk during the integration or dissemination stage. This article proposes several approaches based on multiple imputation for disclosure limitation, usually called synthetic data, that could be used to facilitate data integration and dissemination while protecting data confidentiality. It reviews existing methods for obtaining inferences from synthetic data and points out where new methods are needed to implement the data integration proposals. Résumé Dans les contextes d'intégration de données, deux agences statistiques cherchent à fusionner leurs bases de données séparées en un fichier. Les agences peuvent aussi chercher à diffuser au public les données issues du fichier intégré. Ces objectifs peuvent être compliqués par le besoin de protéger la confidentialité des objets de la base de données, qui pourrait être menacé pendant la phase d'intégration et de diffusion. Cet article propose plusieurs approches basées sur l'imputation multiple pour limiter la divulgation, qu'on appelle habituellement données synthétiques, qui pourraient être utilisées pour faciliter l'intégration et la diffusion des données tout en protégeant leur confidentialité. Il passe en revue les méthodes existantes pour obtenir des inférences à partir de données synthétiques et montre les cas où l'on a besoin de nouvelles méthodes pour mettre en œuvre les propositions d'intégration de données. [source]
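
    A heavily simplified sketch of the partially synthetic data idea discussed in the article: values of a sensitive variable are replaced by draws from a model fitted to the confidential file, and several synthetic copies are released. The variables and the normal-linear synthesiser below are assumptions made for illustration, not the article's proposals.

    ```python
    # Partially synthetic data: the sensitive variable is re-drawn from a model
    # fitted to the confidential file, and m synthetic copies are produced.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    age = rng.integers(20, 65, n)
    income = 20000 + 800 * age + rng.normal(0, 10000, n)   # sensitive variable
    confidential = pd.DataFrame({"age": age, "income": income})

    X = sm.add_constant(confidential["age"])
    fit = sm.OLS(confidential["income"], X).fit()

    m = 5                                                  # number of synthetic copies
    synthetic = []
    for _ in range(m):
        beta = rng.multivariate_normal(np.asarray(fit.params),
                                       np.asarray(fit.cov_params()))
        mu = np.asarray(X) @ beta                          # draw parameters, then values
        syn_income = rng.normal(mu, np.sqrt(fit.scale))
        synthetic.append(confidential.assign(income=syn_income))

    # An analyst would fit their model on each copy and combine the results with
    # the synthetic-data combining rules reviewed in the article.
    print(synthetic[0].head())
    ```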


    Imputation of 10-year osteoporotic fracture rates from hip fractures: A clinical validation study

    JOURNAL OF BONE AND MINERAL RESEARCH, Issue 2 2010
    William D Leslie
    Abstract The World Health Organization (WHO) fracture risk assessment system (FRAX) allows for calibration from country-specific fracture data. The objective of this study was to evaluate the method for imputation of osteoporotic fracture rates from hip fractures alone. A total of 38,784 women aged 47.5 years or older at the time of baseline femoral neck bone mineral density (BMD) measurement were identified in a database containing all clinical dual energy X-ray absorptiometry (DXA) results for the Province of Manitoba, Canada. Health service records were assessed for the presence of nontrauma osteoporotic fracture codes after BMD testing (431 hip, 787 forearm, 336 clinical vertebral, and 431 humerus fractures). Ten-year hip and osteoporotic fracture rates were estimated by the Kaplan-Meier method. The population was stratified by age (50 to 90 years, 5-year width strata) and again by femoral neck T-scores (−4.0 to 0.0, 0.5 SD width strata). Within each stratum, the ratio of hip to osteoporotic fractures was calculated and compared with the predicted ratio from FRAX. Increasing age was associated with greater predicted hip-to-osteoporotic ratios (youngest 0.07 versus oldest 0.41) and observed ratios (youngest 0.10 versus oldest 0.48). Lower T-scores were associated with greater predicted (highest 0.04 versus lowest 0.71) and observed ratios (highest 0.06 versus lowest 0.44). There was a strong positive correlation between predicted and observed ratios (Spearman r = 0.90–0.97, p < .001). For 14 of the 18 strata, the predicted ratio was within the observed 95% confidence interval (CI). Since collection of population-based hip fracture data is considerably easier than collection of non-hip fracture data, this study supports the current emphasis on using hip fractures as the preferred site for FRAX model calibration. © 2010 American Society for Bone and Mineral Research [source]
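
    The stratum-level comparison described here can be sketched as below on invented data: within age strata, the ratio of hip fractures to all osteoporotic fractures is computed and correlated with an external predicted ratio. Simple proportions stand in for the Kaplan-Meier estimates, and the "predicted" ratios are placeholders rather than FRAX output.

    ```python
    # Hip-to-osteoporotic fracture ratios by age stratum, compared with a
    # stand-in set of predicted ratios (all numbers invented).
    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr

    rng = np.random.default_rng(5)
    n = 5000
    age = rng.uniform(50, 90, n)
    p_osteo = 0.02 + 0.004 * (age - 50)              # any osteoporotic fracture
    p_hip = 0.10 + 0.009 * (age - 50)                # hip share, rising with age
    osteo = rng.random(n) < p_osteo
    hip = osteo & (rng.random(n) < p_hip)

    df = pd.DataFrame({"stratum": pd.cut(age, np.arange(50, 95, 5)),
                       "osteo": osteo, "hip": hip})
    counts = df.groupby("stratum", observed=True)[["hip", "osteo"]].sum()
    observed_ratio = counts["hip"] / counts["osteo"].clip(lower=1)
    predicted_ratio = np.linspace(0.07, 0.41, len(observed_ratio))  # placeholder "FRAX"
    rho, pval = spearmanr(predicted_ratio, observed_ratio)
    print(observed_ratio.round(2))
    print(f"Spearman r = {rho:.2f}, p = {pval:.2g}")
    ```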


    Isaac Butt and the Early Development of the Marginal Utility Theory of Imputation

    AMERICAN JOURNAL OF ECONOMICS AND SOCIOLOGY, Issue 1 2010
    Laurence S. Moss. Article first published online: 20 JAN 2010
    First page of article [source]


    Hans Kelsen's Doctrine of Imputation

    RATIO JURIS, Issue 1 2001
    Stanley L. Paulson
    First, the author examines the traditional doctrine of imputation. A look at the traditional doctrine is useful for establishing a point of departure in comparing Kelsen's doctrines of central and peripheral imputation. Second, the author turns to central imputation. Here Kelsen's doctrine follows the traditional doctrine in attributing liability or responsibility to the subject. Kelsen's legal subject, however, has been depersonalized and thus requires radical qualification. Third, the author takes up peripheral imputation, which is the main focus of the paper. It is argued that with respect to the basic form of the law, exhibited by the linking of legal condition with legal consequence, peripheral imputation counts as an austere doctrine, shorn as it is of all references to legal personality or the legal subject. If Kelsen's reconstructed legal norms are empowerments, then the austere doctrine of peripheral imputation captures the rudiments of their form, exactly what would be expected if peripheral imputation does indeed serve as the category of legal cognition. Finally, the author develops the puzzle surrounding the legal "ought" in this context. Although Kelsen talks at one point as though the legal "ought" were the peculiarly legal category, the author submits that this is not the best reading of Kelsen's texts. [source]


    APOE is not Associated with Alzheimer Disease: a Cautionary tale of Genotype Imputation

    ANNALS OF HUMAN GENETICS, Issue 3 2010
    Gary W. Beecham
    Summary With the advent of publicly available genome-wide genotyping data, the use of genotype imputation methods is becoming increasingly common. These methods are of particular use in joint analyses, where data from different genotyping platforms are imputed to a reference set and combined in a single analysis. We show here that such an analysis can miss strong genetic association signals, such as that of the apolipoprotein-e gene in late-onset Alzheimer disease. This can occur in regions of weak to moderate LD; unobserved SNPs are not imputed with confidence so there is no consensus SNP set on which to perform association tests. Both IMPUTE and Mach software are tested, with similar results. Additionally, we show that a meta-analysis that properly accounts for the genotype uncertainty can recover association signals that were lost under a joint analysis. This shows that joint analyses of imputed genotypes, particularly failure to replicate strong signals, should be considered critically and examined on a case-by-case basis. [source]


    Extensions of the Penalized Spline of Propensity Prediction Method of Imputation

    BIOMETRICS, Issue 3 2009
    Guangyu Zhang
    Summary Little and An (2004, Statistica Sinica 14, 949–968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente. [source]
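
    A rough sketch of the flavour of PSPP imputation on simulated data, with an unpenalized regression spline standing in for the penalized spline and without the centering issues the paper analyses: the propensity that y is observed is estimated first, and a spline of its logit enters the prediction model used to impute the missing y values.

    ```python
    # PSPP-style imputation: spline of the logit response propensity as a covariate
    # in the prediction model (unpenalized spline; simulated MAR data).
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import SplineTransformer

    rng = np.random.default_rng(6)
    n = 2000
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    y = 2 + x + 0.5 * z + rng.normal(size=n)
    observed = rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))   # response depends on x
    df = pd.DataFrame({"x": x, "z": z, "y": np.where(observed, y, np.nan)})

    # Step 1: propensity of observing y, and the logit of the fitted propensity.
    ps = sm.Logit(observed.astype(int), sm.add_constant(df[["x", "z"]])).fit(disp=0)
    phat = ps.predict()
    df["logit_p"] = np.log(phat / (1 - phat))

    # Step 2: predict y from a spline of the logit propensity plus other covariates.
    spline = SplineTransformer(n_knots=5, degree=3, extrapolation="linear")
    resp = df[observed]
    X_fit = np.column_stack([spline.fit_transform(resp[["logit_p"]]), resp[["z"]]])
    reg = LinearRegression().fit(X_fit, resp["y"])
    X_mis = np.column_stack([spline.transform(df.loc[~observed, ["logit_p"]]),
                             df.loc[~observed, ["z"]]])
    df.loc[~observed, "y"] = reg.predict(X_mis)

    print("imputed-data mean of y:", round(df["y"].mean(), 3),
          "| respondent-only mean:", round(resp["y"].mean(), 3))
    ```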


    Imputation and Variable Selection in Linear Regression Models with Missing Covariates

    BIOMETRICS, Issue 2 2005
    Xiaowei Yang
    Summary Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS. [source]
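
    The "impute, then select" strategy can be sketched as follows, in spirit only: several imputed data sets are created, a variable-selection procedure is run on each, and selection frequencies are tallied. Here LassoCV stands in for the Bayesian stochastic search variable selection used in the paper, and the data are simulated.

    ```python
    # "Impute, then select" with M imputed data sets and a lasso as the selector.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(7)
    n, p = 400, 8
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)   # only x0 and x1 matter
    X_miss = np.where(rng.random((n, p)) < 0.2, np.nan, X)   # 20% missing covariates

    M = 10
    selected = np.zeros(p)
    for k in range(M):
        imp = IterativeImputer(sample_posterior=True, random_state=k)
        X_imp = imp.fit_transform(X_miss)                    # one imputed data set
        lasso = LassoCV(cv=5, random_state=k).fit(X_imp, y)
        selected += np.abs(lasso.coef_) > 1e-8               # was the variable kept?

    freq = pd.Series(selected / M, index=[f"x{i}" for i in range(p)])
    print(freq.sort_values(ascending=False))
    ```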


    Marginal Analysis of Incomplete Longitudinal Binary Data: A Cautionary Note on LOCF Imputation

    BIOMETRICS, Issue 3 2004
    Richard J. Cook
    Summary In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete data in longitudinal studies. Despite these advances, the methods used in practice have changed relatively little, particularly in the reporting of pharmaceutical trials. In this setting, perhaps the most widely adopted strategy for dealing with incomplete longitudinal data is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. We examine the asymptotic and empirical bias, the empirical type I error rate, and the empirical coverage probability associated with estimators and tests of treatment effect based on the LOCF imputation strategy. We consider a setting involving longitudinal binary data with longitudinal analyses based on generalized estimating equations, and an analysis based simply on the response at the end of the scheduled follow-up. We find that for both of these approaches, imputation by LOCF can lead to substantial biases in estimators of treatment effects, the type I error rates of associated tests can be greatly inflated, and the coverage probability can be far from the nominal level. Alternative analyses based on all available data lead to estimators with comparatively small bias, and inverse probability weighted analyses yield consistent estimators subject to correct specification of the missing data process. We illustrate the differences between various methods of dealing with drop-outs using data from a study of smoking behavior. [source]
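
    A minimal illustration of the LOCF imputation the abstract cautions against, on simulated longitudinal binary data (not the paper's analysis): within each subject, a missing visit is filled with the most recent observed response, and the end-of-study treatment comparison is contrasted with an available-data analysis.

    ```python
    # LOCF versus available-data comparison at the final visit (simulated data).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(8)
    n, visits = 300, 4
    subj = np.repeat(np.arange(n), visits)
    visit = np.tile(np.arange(visits), n)
    trt = np.repeat(rng.integers(0, 2, n), visits)
    p_resp = 0.3 + 0.1 * visit * trt                 # treated arm improves over time
    resp = (rng.random(n * visits) < p_resp).astype(float)
    df = pd.DataFrame({"subj": subj, "visit": visit, "trt": trt, "resp": resp})

    # Intermittent missingness, heavier at later visits and in the treated arm.
    missing = rng.random(n * visits) < 0.10 * visit * (1 + trt)
    df.loc[missing, "resp"] = np.nan

    df["locf"] = df.groupby("subj")["resp"].ffill()  # last observation carried forward
    last = df[df["visit"] == visits - 1]
    print("LOCF effect at last visit:",
          round(last.groupby("trt")["locf"].mean().diff().iloc[-1], 3))
    print("available-data effect:   ",
          round(last.groupby("trt")["resp"].mean().diff().iloc[-1], 3))
    ```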


    Advanced Statistics: Missing Data in Clinical Research,Part 2: Multiple Imputation

    ACADEMIC EMERGENCY MEDICINE, Issue 7 2007
    Craig D. Newgard MD
    In part 1 of this series, the authors describe the importance of incomplete data in clinical research, and provide a conceptual framework for handling incomplete data by describing typical mechanisms and patterns of censoring, and detailing a variety of relatively simple methods and their limitations. In part 2, the authors will explore multiple imputation (MI), a more sophisticated and valid method for handling incomplete data in clinical research. This article will provide a detailed conceptual framework for MI, comparative examples of MI versus naive methods for handling incomplete data (and how different methods may impact subsequent study results), plus a practical user's guide to implementing MI, including sample statistical software MI code and a deidentified precoded database for use with the sample code. [source]


    Breastfeeding duration related to practised contraception in the Netherlands

    ACTA PAEDIATRICA, Issue 1 2009
    Jacobus P Van Wouwe
    Abstract Aim: The aim of this study was to gain insight into contraception practised and related to breastfeeding duration. Methods: Mothers with infants up to 6 months received a questionnaire on infant feeding (breast or formula feeding) and contraception (hormonal or non-hormonal methods). Estimates of the time interval between resuming contraception and cessation of lactation were calculated by Chained Equations Multiple Imputation. Results: Of all women (n = 2710), 30% chose condoms, 22% the combined oral contraceptive pill (OCP) and few other methods. Breastfeeding was started by 80%, and 18% continued up to 6 months. Of the breastfeeding mothers, 5% used hormonal contraception; 7% of women who used hormonal contraception practised breastfeeding. After adjustment for background variables, the use of OCP is strongly associated with formula feeding: after delivery to the third month postpartum, the crude OR being 17.5 (95% CI: 11.3–27.0), the adjusted OR 14.5 (9.3–22.5); between the third and sixth month postpartum, respectively, 13.1 (95% CI: 8.6–19.9) and 11.7 (7.6–17.9). Of all breastfeeding women, 20–27% resumed OCP at 25 weeks postpartum and 80% introduced formula feeding. The time lag between these events is 6 weeks. Hormonal contraception was resumed after formula introduction. Conclusion: Mothers avoid hormonal contraception during lactation; they change to formula feeding 6 weeks before they resume the OCP. To effectively promote longer duration of breastfeeding, the BFHI needs to address contraception as practised. [source]
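
    Chained-equations multiple imputation of this kind can be sketched with the MICE implementation in statsmodels; the variables, effect sizes and missingness below are invented, loosely echoing the 6-week lag described in the abstract, and the linear analysis model is an assumption for illustration only.

    ```python
    # Chained-equations multiple imputation with statsmodels MICE (toy data).
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.imputation import mice

    rng = np.random.default_rng(9)
    n = 800
    bf_weeks = rng.gamma(4, 5, n)                      # breastfeeding duration (weeks)
    ocp_week = bf_weeks + rng.normal(6, 3, n)          # week the pill is resumed
    age = rng.normal(30, 4, n)
    df = pd.DataFrame({"ocp_week": ocp_week, "bf_weeks": bf_weeks, "age": age})
    for col in ["bf_weeks", "age"]:                    # questionnaire non-response
        df.loc[rng.random(n) < 0.25, col] = np.nan

    imp = mice.MICEData(df)                            # chained-equations engine
    model = mice.MICE("ocp_week ~ bf_weeks + age", sm.OLS, imp)
    results = model.fit(n_burnin=5, n_imputations=20)  # pooled across imputations
    print(results.summary())
    ```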


    Comparison of missing value imputation methods for crop yield data

    ENVIRONMETRICS, Issue 4 2006
    Ravindra S. Lokupitiya
    Abstract Most ecological data sets contain missing values, a fact which can cause problems in the analysis and limit the utility of resulting inference. However, ecological data also tend to be spatially correlated, which can aid in estimating and imputing missing values. We compared four existing methods of estimating missing values: regression, kernel smoothing, universal kriging, and multiple imputation. Data on crop yields from the National Agricultural Statistical Survey (NASS) and the Census of Agriculture (Ag Census) were the basis for our analysis. Our goal was to find the best method to impute missing values in the NASS datasets. For this comparison, we selected the NASS data for barley crop yield in 1997 as our reference dataset. We found in this case that multiple imputation and regression were superior to methods based on spatial correlation. Universal kriging was found to be the third best method. Kernel smoothing seemed to perform very poorly. Copyright © 2005 John Wiley & Sons, Ltd. [source]
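
    The evaluation design implied by the abstract can be sketched as follows: known values are masked, each imputation method fills them back in, and methods are ranked by RMSE. The spatial field below is simulated rather than NASS data, a distance-weighted nearest-neighbour smoother stands in for kernel smoothing, and kriging is omitted because it needs a geostatistics package.

    ```python
    # Mask known yields, fill them back in with each method, and compare RMSE.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(10)
    n = 1000
    lon, lat = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
    crop = 50 + 2.0 * lon - 1.5 * lat + rng.normal(0, 3, n)   # smooth spatial trend
    mask = rng.random(n) < 0.2                                 # hold out 20% as "missing"

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - crop[mask]) ** 2)))

    coords = np.column_stack([lon, lat])
    reg = LinearRegression().fit(coords[~mask], crop[~mask])
    knn = KNeighborsRegressor(n_neighbors=10, weights="distance").fit(
        coords[~mask], crop[~mask])
    mi = IterativeImputer(random_state=0).fit_transform(
        np.column_stack([lon, lat, np.where(mask, np.nan, crop)]))[:, 2]

    print("regression RMSE:        ", rmse(reg.predict(coords[mask])))
    print("neighbour-smoother RMSE:", rmse(knn.predict(coords[mask])))
    print("iterative-imputer RMSE: ", rmse(mi[mask]))
    ```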


    Imputation aware meta-analysis of genome-wide association studies

    GENETIC EPIDEMIOLOGY, Issue 6 2010
    Noah Zaitlen
    Abstract Genome-wide association studies have recently identified many new loci associated with human complex diseases. These newly discovered variants typically have weak effects requiring studies with large numbers of individuals to achieve the statistical power necessary to identify them. Likely, there exist even more associated variants, which remain to be found if even larger association studies can be assembled. Meta-analysis provides a straightforward means of increasing study sample sizes without collecting new samples by combining existing data sets. One obstacle to combining studies is that they are often performed on platforms with different marker sets. Current studies overcome this issue by imputing genotypes missing from each of the studies and then performing standard meta-analysis techniques. We show that this approach may result in a loss of power since errors in imputation are not accounted for. We present a new method for performing meta-analysis over imputed single nucleotide polymorphisms, show that it is optimal with respect to power, and discuss practical implementation issues. Through simulation experiments, we show that our imputation aware meta-analysis approach outperforms or matches standard meta-analysis approaches. Genet. Epidemiol. 34: 537–542, 2010. © 2010 Wiley-Liss, Inc. [source]
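
    A toy sketch in the spirit of this abstract (not the authors' exact estimator): a sample-size-weighted z-score meta-analysis in which each study's weight is deflated by the imputation quality (r-squared) of the SNP in that study, compared with the standard combination that ignores imputation error. All numbers are invented.

    ```python
    # Sample-size-weighted z-score meta-analysis, with and without deflating each
    # study's weight by its imputation quality r^2 (toy numbers).
    import numpy as np
    from scipy.stats import norm

    z = np.array([2.1, 1.4, 2.6])          # per-study association z-scores
    n = np.array([4000, 2500, 6000])       # per-study sample sizes
    r2 = np.array([1.00, 0.45, 0.80])      # imputation quality (1.0 = genotyped)

    def combine(weights):
        z_meta = np.sum(np.sqrt(weights) * z) / np.sqrt(np.sum(weights))
        return z_meta, 2 * norm.sf(abs(z_meta))

    print("standard meta-analysis:         z = %.2f, p = %.3g" % combine(n))
    print("imputation-aware meta-analysis: z = %.2f, p = %.3g" % combine(n * r2))
    ```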


    Simple estimates of haplotype relative risks in case-control data

    GENETIC EPIDEMIOLOGY, Issue 6 2006
    Benjamin French
    Abstract Methods of varying complexity have been proposed to efficiently estimate haplotype relative risks in case-control data. Our goal was to compare methods that estimate associations between disease conditions and common haplotypes in large case-control studies such that haplotype imputation is done once as a simple data-processing step. We performed a simulation study based on haplotype frequencies for two renin-angiotensin system genes. The iterative and noniterative methods we compared involved fitting a weighted logistic regression, but differed in how the probability weights were specified. We also quantified the amount of ambiguity in the simulated genes. For one gene, there was essentially no uncertainty in the imputed diplotypes and every method performed well. For the other, ~60% of individuals had an unambiguous diplotype, and ~90% had a highest posterior probability greater than 0.75. For this gene, all methods performed well under no genetic effects, moderate effects, and strong effects tagged by a single nucleotide polymorphism (SNP). Noniterative methods produced biased estimates under strong effects not tagged by an SNP. For the most likely diplotype, median bias of the log-relative risks ranged between −0.49 and 0.22 over all haplotypes. For all possible diplotypes, median bias ranged between −0.73 and 0.08. Results were similar under interaction with a binary covariate. Noniterative weighted logistic regression provides valid tests for genetic associations and reliable estimates of modest effects of common haplotypes, and can be implemented in standard software. The potential for phase ambiguity does not necessarily imply uncertainty in imputed diplotypes, especially in large studies of common haplotypes. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc. [source]
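
    The noniterative weighted logistic regression described here can be sketched in standard software as below: every plausible diplotype for a subject enters the model as a separate row, weighted by its posterior probability from a one-off haplotype imputation step. The expanded rows and probabilities are invented for illustration.

    ```python
    # Weighted logistic regression over all plausible diplotypes (toy data).
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Expanded data: one row per (subject, candidate risk-haplotype count) pair.
    rows = pd.DataFrame({
        "subject": [1, 1, 2, 3, 3, 4],
        "h_count": [0, 1, 1, 1, 2, 0],              # copies of the risk haplotype
        "post_pr": [0.7, 0.3, 1.0, 0.6, 0.4, 1.0],  # posterior probability weight
        "case":    [1, 1, 0, 1, 1, 0],
    })

    model = LogisticRegression(C=1e6)               # large C: essentially unpenalised
    model.fit(rows[["h_count"]], rows["case"], sample_weight=rows["post_pr"])
    print("log odds ratio per haplotype copy:", round(model.coef_[0][0], 3))
    ```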


    Linkage analysis with sequential imputation

    GENETIC EPIDEMIOLOGY, Issue 1 2003
    Zachary Skrivanek
    Abstract Multilocus calculations, using all available information on all pedigree members, are important for linkage analysis. Exact calculation methods in linkage analysis are limited in either the number of loci or the number of pedigree members they can handle. In this article, we propose a Monte Carlo method for linkage analysis based on sequential imputation. Unlike exact methods, sequential imputation can handle large pedigrees with a moderate number of loci in its current implementation. This Monte Carlo method is an application of importance sampling, in which we sequentially impute ordered genotypes locus by locus, and then impute inheritance vectors conditioned on these genotypes. The resulting inheritance vectors, together with the importance sampling weights, are used to derive a consistent estimator of any linkage statistic of interest. The linkage statistic can be parametric or nonparametric; we focus on nonparametric linkage statistics. We demonstrate that accurate estimates can be achieved within a reasonable computing time. A simulation study illustrates the potential gain in power using our method for multilocus linkage analysis with large pedigrees. We simulated data at six markers under three models. We analyzed them using both sequential imputation and GENEHUNTER. GENEHUNTER had to drop between 38% and 54% of pedigree members, whereas our method was able to use all pedigree members. The power gains of using all pedigree members were substantial under 2 of the 3 models. We implemented sequential imputation for multilocus linkage analysis in a user-friendly software package called SIMPLE. Genet Epidemiol 25:25–35, 2003. © 2003 Wiley-Liss, Inc. [source]


    Enabling regional management in a changing climate through Bayesian meta-analysis of a large-scale disturbance

    GLOBAL ECOLOGY, Issue 3 2010
    M Aaron MacNeil
    ABSTRACT Aim: Quantifying and predicting change in large ecosystems is an important research objective for applied ecologists as human disturbance effects become increasingly evident at regional and global scales. However, studies used to make inferences about large-scale change are frequently of uneven quality and few in number, having been undertaken to study local, rather than global, change. Our aim is to improve the quality of inferences that can be made in meta-analyses of large-scale disturbance by integrating studies of varying quality in a unified modelling framework that is informative for both local and regional management. Innovation: Here we improve conventionally structured meta-analysis methods by including imputation of unknown study variances and the use of Bayesian factor potentials. The approach is a coherent framework for integrating data of varying quality across multiple studies while facilitating belief statements about the uncertainty in parameter estimates and the probable outcome of future events. The approach is applied to a regional meta-analysis of the effects of loss of coral cover on species richness and the abundance of coral-dependent fishes in the western Indian Ocean (WIO) before and after a mass bleaching event in 1998. Main conclusions: Our Bayesian approach to meta-analysis provided greater precision of parameter estimates than conventional weighted linear regression meta-analytical techniques, allowing us to integrate all available data from 66 available study locations in the WIO across multiple scales. The approach thereby: (1) estimated uncertainty in site-level estimates of change, (2) provided a regional estimate for future change at any given site in the WIO, and (3) provided a probabilistic belief framework for future management of reef resources at both local and regional scales. [source]


    Capital gains tax and the capital asset pricing model

    ACCOUNTING & FINANCE, Issue 2 2003
    Martin Lally
    Abstract This paper develops a version of the Capital Asset Pricing Model that views dividend imputation as affecting company tax and assumes differential taxation of capital gains and ordinary income. These taxation issues aside, the model otherwise rests on the standard assumptions including full segmentation of national capital markets. It also treats dividend policy as exogenously determined. Estimates of the cost of equity based on this model are then compared with estimates based on the version of the CAPM typically applied in Australia, which differs only in assuming equality of the tax rates on capital gains and ordinary income. The differences between the estimates can be material. In particular, with a high dividend yield, allowance for differential taxation can result in an increase of two to three percentage points in the estimated cost of equity. The overall result obtained here carries over to a dividend equilibrium, in which firms choose a dividend policy that is optimal relative to the assumed tax structure. [source]


    Recalibration methods to enhance information on prevalence rates from large mental health surveys

    INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, Issue 1 2005
    N. A. Taub
    Abstract Comparisons between self-report and clinical psychiatric measures have revealed considerable disagreement. It is unsafe to consider these measures as directly equivalent, so it would be valuable to have a reliable recalibration of one measure in terms of the other. We evaluated multiple imputation incorporating a Bayesian approach, and a fully Bayesian method, to recalibrate diagnoses from a self-report survey interview in terms of those from a clinical interview with data from a two-phase national household survey for a practical application, and artificial data for simulation studies. The most important factors in obtaining a precise and accurate 'clinical' prevalence estimate from self-report data were (a) good agreement between the two diagnostic measures and (b) a sufficiently large set of calibration data with diagnoses based on both kinds of interview from the same group of subjects. From the case study, calibration data on 612 subjects were sufficient to yield estimates of the total prevalence of anxiety, depression or neurosis with a precision in the region of ±2%. The limitations of the calibration method demonstrate the need to increase agreement between survey and reference measures by improving lay interviews and their diagnostic algorithms. Copyright © 2005 Whurr Publishers Ltd. [source]


    Multiple imputation for combining confidential data owned by two agencies

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 2 2009
    Christine N. Kohnen
    Summary. Statistical agencies that own different databases on overlapping subjects can benefit greatly from combining their data. These benefits are passed on to secondary data analysts when the combined data are disseminated to the public. Sometimes combining data across agencies or sharing these data with the public is not possible: one or both of these actions may break promises of confidentiality that have been given to data subjects. We describe an approach that is based on two stages of multiple imputation that facilitates data sharing and dissemination under restrictions of confidentiality. We present new inferential methods that properly account for the uncertainty that is caused by the two stages of imputation. We illustrate the approach by using artificial and genuine data. [source]