Frequency Estimates (frequency + estimate)


Selected Abstracts


Quantifying bias due to allele misclassification in case-control studies of haplotypes

GENETIC EPIDEMIOLOGY, Issue 7 2006
Usha S. Govindarajulu
Abstract Objectives: Genotyping errors can induce biases in frequency estimates for haplotypes of single nucleotide polymorphisms (SNPs). Here, we considered the impact of SNP allele misclassification on haplotype odds ratio estimates from case-control studies of unrelated individuals. Methods: We calculated bias analytically, using the haplotype counts expected in cases and controls under genotype misclassification. We evaluated the bias due to allele misclassification across a range of haplotype distributions using empirical haplotype frequencies within blocks of limited haplotype diversity. We also considered simple two- and three-locus haplotype distributions to understand the impact of haplotype frequency and number of SNPs on misclassification bias. Results: We found that for common haplotypes (>5% frequency), realistic genotyping error rates (0.1–1% chance of miscalling an allele), and moderate relative risks (2–4), the bias was always towards the null and increased in magnitude with increasing error rate and increasing odds ratio. For common haplotypes, bias generally increased with increasing haplotype frequency, while for rare haplotypes, bias generally increased with decreasing frequency. When the chance of miscalling an allele was 0.5%, the median bias in haplotype-specific odds ratios for common haplotypes was generally small (<4% on the log odds ratio scale), but the bias for some individual haplotypes was larger (10–20%). Bias towards the null leads to a loss in power; the relative efficiency of a test statistic based upon misclassified haplotype data compared to a test based on the unobserved true haplotypes ranged from roughly 60% to 80%, and worsened with increasing haplotype frequency. Conclusions: The cumulative effect of small allele-calling errors across multiple loci can induce noticeable bias and reduce power in realistic scenarios. This has implications for the design of candidate gene association studies that utilize multi-marker haplotypes.
Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
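
The direction of the bias described in this abstract can be illustrated with a small numerical sketch. It is not the authors' calculation: it assumes that errors act directly on phase-known haplotypes, that each allele is miscalled independently with the same probability `e`, and that case haplotype frequencies are proportional to control frequency times a haplotype-specific odds ratio. All function names and the example frequencies are hypothetical.

```python
import numpy as np

def allele_error_matrix(e):
    # Rows: true allele (0 or 1); columns: observed allele.
    return np.array([[1.0 - e, e], [e, 1.0 - e]])

def haplotype_error_matrix(e, n_loci):
    # Independent errors per locus give a Kronecker product of per-allele matrices.
    # Haplotypes are indexed in binary order, first locus most significant (00, 01, 10, 11).
    m = np.ones((1, 1))
    for _ in range(n_loci):
        m = np.kron(m, allele_error_matrix(e))
    return m

def case_frequencies(control_freqs, odds_ratios):
    # Assumed model: case frequency proportional to control frequency times odds ratio.
    w = np.asarray(control_freqs, float) * np.asarray(odds_ratios, float)
    return w / w.sum()

def odds_ratio(case, control, idx, ref=0):
    # Odds ratio of haplotype idx relative to the reference haplotype ref.
    return (case[idx] / case[ref]) / (control[idx] / control[ref])

def observed_odds_ratio(control_freqs, odds_ratios, e, idx, ref=0):
    control = np.asarray(control_freqs, float)
    case = case_frequencies(control, odds_ratios)
    n_loci = len(control).bit_length() - 1  # 4 haplotypes -> 2 loci, 8 -> 3 loci
    m = haplotype_error_matrix(e, n_loci)
    # Expected observed frequencies are the true frequencies pushed through the error matrix.
    return odds_ratio(case @ m, control @ m, idx, ref)

if __name__ == "__main__":
    control = [0.4, 0.3, 0.2, 0.1]   # hypothetical two-locus haplotype frequencies
    true_or = [1.0, 1.0, 1.0, 2.0]   # haplotype 11 carries the risk
    for e in (0.0, 0.001, 0.005, 0.01):
        print(e, observed_odds_ratio(control, true_or, e, idx=3))
```

Under these assumptions the observed odds ratio for the risk haplotype equals the true value when `e` is zero and moves towards 1 as `e` grows, which matches the qualitative pattern the abstract reports.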


Haplotype analysis in the presence of informatively missing genotype data

GENETIC EPIDEMIOLOGY, Issue 4 2006
Nianjun Liu
Abstract It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. These include methods for haplotype frequency estimation and haplotype association analysis. However, it is likely that this simple assumption does not hold in practice, yet few studies to date have examined the magnitude of the effects when this simplifying assumption is violated. In this study, we demonstrate that the violation of this assumption may lead to serious bias in haplotype frequency estimates, and haplotype association analysis based on this assumption can induce both false-positive and false-negative evidence of association. To address this limitation in the current methods, we propose a general missing data model to characterize missing data patterns across a set of two or more markers simultaneously. We prove that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under our general missing data model. Simulation studies on the analysis of haplotypes consisting of two single nucleotide polymorphisms illustrate that our proposed model can reduce the bias, both for haplotype frequency estimates and for association analysis, that arises from an incorrect assumption about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
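
A one-marker, allele-level simplification shows why informative missingness biases frequency estimates. This is only a sketch under stated assumptions, not the multi-marker model proposed in the abstract: each copy of allele A is assumed to go missing with probability `miss_A` and each copy of allele a with probability `miss_a`, independently. The function name and parameters are hypothetical.

```python
def observed_allele_frequency(p, miss_A, miss_a):
    """Expected frequency of allele A among the non-missing alleles.

    p      : true frequency of allele A
    miss_A : assumed probability that a copy of A is missing
    miss_a : assumed probability that a copy of a is missing
    """
    kept_A = p * (1.0 - miss_A)          # share of all alleles that are observed A
    kept_a = (1.0 - p) * (1.0 - miss_a)  # share of all alleles that are observed a
    return kept_A / (kept_A + kept_a)

if __name__ == "__main__":
    # Equal missingness leaves the estimate unbiased; unequal missingness does not.
    print(observed_allele_frequency(0.3, 0.05, 0.05))
    print(observed_allele_frequency(0.3, 0.20, 0.05))
```

When the two missingness probabilities are equal they cancel and the naive estimate recovers `p`; when allele A is missing more often, the estimate falls below `p`.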


Maximum-likelihood estimation of haplotype frequencies in nuclear families

GENETIC EPIDEMIOLOGY, Issue 1 2004
Tim Becker
Abstract The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over recent years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction techniques and many implementations of the expectation-maximization (EM) algorithm for haplotype frequency estimation exist. Recent research has shown that incorporating available genotype information of related individuals largely increases the precision of haplotype frequency estimates. We, therefore, implemented a highly flexible program written in C, called FAMHAP, which calculates maximum likelihood estimates (MLEs) of haplotype frequencies from general nuclear families with an arbitrary number of children via the EM-algorithm for up to 20 SNPs. For more loci, we have implemented a locus-iterative mode of the EM-algorithm, which gives reliable approximations of the MLEs for up to 63 SNP loci, or fewer when multi-allelic markers are incorporated into the analysis. Missing genotypes can be handled as well. The program is able to distinguish cases (haplotypes transmitted to the first affected child of a family) from pseudo-controls (non-transmitted haplotypes with respect to the child). We tested the performance of FAMHAP and the accuracy of the obtained haplotype frequencies on a variety of simulated data sets. The implementation proved to work well when many markers were considered, and no significant differences between the estimates obtained with the usual EM-algorithm and those obtained in its locus-iterative mode were observed. We conclude from the simulations that the accuracy of haplotype frequency estimation and reconstruction in nuclear families is very reliable in general and robust against missing genotypes. © 2004 Wiley-Liss, Inc.
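
For orientation, the EM idea mentioned in this abstract can be sketched for the simplest setting: two biallelic SNPs typed in unrelated individuals. This is not FAMHAP and does not use family information or handle missing genotypes; it is a minimal illustration in which genotypes are coded as counts of allele 1 at each locus, and all names are hypothetical.

```python
from collections import Counter

# Haplotypes over two SNPs, as (allele at locus 1, allele at locus 2).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def compatible_pairs(genotype):
    """Unordered haplotype index pairs whose allele counts add up to the genotype."""
    pairs = []
    for i in range(4):
        for j in range(i, 4):
            h1, h2 = HAPS[i], HAPS[j]
            if (h1[0] + h2[0], h1[1] + h2[1]) == genotype:
                pairs.append((i, j))
    return pairs

def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-SNP genotypes by EM."""
    counts = Counter(genotypes)
    n = sum(counts.values())
    freqs = [0.25] * 4  # uniform starting values
    for _ in range(n_iter):
        expected = [0.0] * 4
        # E-step: split each genotype over its compatible haplotype pairs.
        for g, c in counts.items():
            pairs = compatible_pairs(g)
            weights = [(1 if i == j else 2) * freqs[i] * freqs[j] for i, j in pairs]
            total = sum(weights)
            for (i, j), w in zip(pairs, weights):
                expected[i] += c * w / total
                expected[j] += c * w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = [x / (2 * n) for x in expected]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs

if __name__ == "__main__":
    sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
    print(em_haplotype_freqs(sample))
```

Only the double heterozygote `(1, 1)` is phase-ambiguous here. In the toy sample, the unambiguous homozygotes pull the ambiguous individuals towards the 00/11 resolution.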


Impact and Quantification of the Sources of Error in DNA Pooling Designs

ANNALS OF HUMAN GENETICS, Issue 1 2009
A. Jawaid
Summary The analysis of genome wide variation offers the possibility of unravelling the genes involved in the pathogenesis of disease. Genome wide association studies are also particularly useful for identifying and validating targets for therapeutic intervention as well as for detecting markers for drug efficacy and side effects. The cost of such large-scale genetic association studies may be reduced substantially by the analysis of pooled DNA from multiple individuals. However, experimental errors inherent in pooling studies lead to a potential increase in the false positive rate and a loss in power compared to individual genotyping. Here we quantify various sources of experimental error using empirical data from typical pooling experiments and corresponding individual genotyping counts using two statistical methods. We provide analytical formulas for calculating these different errors in the absence of complete information, such as replicate pool formation, and for adjusting for the errors in the statistical analysis. We demonstrate that DNA pooling has the potential of estimating allele frequencies accurately, and adjusting the pooled allele frequency estimates for differential allelic amplification considerably improves accuracy. Estimates of the components of error show that differential allelic amplification is the most important contributor to the error variance in absolute allele frequency estimation, followed by allele frequency measurement and pool formation errors. Our results emphasise the importance of minimising experimental errors and obtaining correct error estimates in genetic association studies.
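
The abstract does not state the adjustment formula, so the following sketch assumes a commonly used correction for differential allelic amplification: a factor `k` is estimated as the mean ratio of allele A to allele B signal in known heterozygotes (where the true ratio is 1:1), and the pooled signal for B is rescaled by `k`. Function names, parameters, and the example signals are hypothetical.

```python
def amplification_factor(het_signal_a, het_signal_b):
    """Mean A/B signal ratio across heterozygous individuals (true ratio 1:1)."""
    ratios = [a / b for a, b in zip(het_signal_a, het_signal_b)]
    return sum(ratios) / len(ratios)

def raw_pool_frequency(pool_a, pool_b):
    """Unadjusted allele A frequency estimate from pooled signal intensities."""
    return pool_a / (pool_a + pool_b)

def corrected_pool_frequency(pool_a, pool_b, k):
    """Allele A frequency estimate after rescaling the B signal by k."""
    return pool_a / (pool_a + k * pool_b)

if __name__ == "__main__":
    # Hypothetical example: allele A amplifies 1.5 times as strongly as allele B.
    k = amplification_factor([1.5, 3.0], [1.0, 2.0])
    true_f = 0.3
    pool_a, pool_b = 1.5 * true_f, 1.0 * (1.0 - true_f)
    print(raw_pool_frequency(pool_a, pool_b))
    print(corrected_pool_frequency(pool_a, pool_b, k))
```

If the pooled signals are proportional to the true allele counts times allele-specific amplification efficiencies, rescaling by `k` cancels the efficiency ratio and the corrected estimate recovers the underlying frequency, while the raw estimate overstates the better-amplified allele.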