Frequency Estimation (frequency + estimation)


Selected Abstracts


Haplotype analysis in the presence of informatively missing genotype data

GENETIC EPIDEMIOLOGY, Issue 4 2006
Nianjun Liu
Abstract It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. These include those methods on haplotype frequency estimation and haplotype association analysis. However, it is likely that this simple assumption does not hold in practice, yet few studies to date have examined the magnitude of the effects when this simplifying assumption is violated. In this study, we demonstrate that the violation of this assumption may lead to serious bias in haplotype frequency estimates, and haplotype association analysis based on this assumption can induce both false-positive and false-negative evidence of association. To address this limitation in the current methods, we propose a general missing data model to characterize missing data patterns across a set of two or more markers simultaneously. We prove that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under our general missing data model. Simulation studies on the analysis of haplotypes consisting of two single nucleotide polymorphisms illustrate that our proposed model can reduce the bias both for haplotype frequency estimates and association analysis due to incorrect assumption on the missing data mechanism. Finally, we illustrate the utilities of our method through its application to a real data set. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
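
Illustrative sketch (not taken from the paper): the bias described above can be seen even at a single marker. The function below computes the allele frequency one would expect to observe when each genotype is missing with its own probability. Hardy-Weinberg proportions, the function name and the missing-data probabilities are all assumptions made for this example; the paper's own model covers two or more markers jointly and is not reproduced here.

```python
def observed_allele_freq(p, miss_AA, miss_Aa, miss_aa):
    """Expected frequency of allele A among non-missing genotypes.

    p is the true frequency of allele A. Genotype frequencies are assumed
    to follow Hardy-Weinberg proportions (an assumption of this sketch).
    miss_* are hypothetical genotype-specific missing probabilities.
    """
    q = 1.0 - p
    # Probability that each genotype occurs AND is observed.
    obs_AA = p * p * (1.0 - miss_AA)
    obs_Aa = 2.0 * p * q * (1.0 - miss_Aa)
    obs_aa = q * q * (1.0 - miss_aa)
    total = obs_AA + obs_Aa + obs_aa
    # Each AA carries two copies of A, each Aa carries one.
    return (obs_AA + 0.5 * obs_Aa) / total


# Same missing rate for every genotype: the naive estimate is unbiased.
unbiased = observed_allele_freq(0.3, 0.1, 0.1, 0.1)
# AA genotypes go missing more often: allele A is under-represented.
biased = observed_allele_freq(0.3, 0.5, 0.0, 0.0)
```

With equal missing probabilities the observed frequency equals the true one; once the AA genotype is dropped more often, the observed frequency of A falls below the true value, which is the kind of bias the abstract warns about.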


Haplotype association analysis for late onset diseases using nuclear family data

GENETIC EPIDEMIOLOGY, Issue 3 2006
Chun Li
Abstract In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949-958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1-38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
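
Illustrative sketch (not the authors' method): the "standard EM algorithm" for unrelated individuals can be shown for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. Genotypes are coded as counts (0, 1, 2) of allele 1 at each SNP; the function name, the coding and the convergence settings are assumptions of this example. The offspring-based apportioning proposed in the paper is not implemented.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype count -> allele pair


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-12):
    """EM estimates of two-SNP haplotype frequencies from unphased genotypes."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    counts = Counter(genotypes)
    n_hap = 2 * len(genotypes)
    n_dh = counts.get((1, 1), 0)  # double heterozygotes: phase unknown

    # Haplotype counts from individuals whose phase is unambiguous
    # (at least one of the two SNPs is homozygous).
    base = dict.fromkeys(HAPLOTYPES, 0.0)
    for (g1, g2), c in counts.items():
        if (g1, g2) == (1, 1):
            continue
        for a, b in zip(ALLELES[g1], ALLELES[g2]):
            base[f"{a}{b}"] += c

    freqs = dict.fromkeys(HAPLOTYPES, 0.25)
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequency estimates.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: apportion the ambiguous individuals and renormalise.
        new = {
            "00": (base["00"] + w * n_dh) / n_hap,
            "11": (base["11"] + w * n_dh) / n_hap,
            "01": (base["01"] + (1.0 - w) * n_dh) / n_hap,
            "10": (base["10"] + (1.0 - w) * n_dh) / n_hap,
        }
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical sample: 3 + 3 double homozygotes and 4 double heterozygotes.
sample = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 4
estimates = em_haplotype_freqs(sample)
```

In this sample the unambiguous individuals carry only haplotypes 00 and 11, so the EM iterations assign the double heterozygotes almost entirely to the 00/11 phase.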


Maximum-likelihood estimation of haplotype frequencies in nuclear families

GENETIC EPIDEMIOLOGY, Issue 1 2004
Tim Becker
Abstract The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over the last years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction techniques and many implementations of the expectation-maximization (EM) algorithm for haplotype frequency estimation exist. Recent research work has shown that incorporating available genotype information of related individuals largely increases the precision of haplotype frequency estimates. We, therefore, implemented a highly flexible program written in C, called FAMHAP, which calculates maximum likelihood estimates (MLEs) of haplotype frequencies from general nuclear families with an arbitrary number of children via the EM-algorithm for up to 20 SNPs. For more loci, we have implemented a locus-iterative mode of the EM-algorithm, which gives reliable approximations of the MLEs for up to 63 SNP loci, or less when multi-allelic markers are incorporated into the analysis. Missing genotypes can be handled as well. The program is able to distinguish cases (haplotypes transmitted to the first affected child of a family) from pseudo-controls (non-transmitted haplotypes with respect to the child). We tested the performance of FAMHAP and the accuracy of the obtained haplotype frequencies on a variety of simulated data sets. The implementation proved to work well when many markers were considered and no significant differences between the estimates obtained with the usual EM-algorithm and those obtained in its locus-iterative mode were observed. We conclude from the simulations that the accuracy of haplotype frequency estimation and reconstruction in nuclear families is very reliable in general and robust against missing genotypes. © 2004 Wiley-Liss, Inc.


OPTIMAL AND ADAPTIVE SEMI-PARAMETRIC NARROWBAND AND BROADBAND AND MAXIMUM LIKELIHOOD ESTIMATION OF THE LONG-MEMORY PARAMETER FOR REAL EXCHANGE RATES

THE MANCHESTER SCHOOL, Issue 2 2005
SAEED HERAVI
The nature of the time series properties of real exchange rates remains a contentious issue primarily because of the implications for purchasing power parity. In particular, are real exchange rates best characterized as stationary and non-persistent; nonstationary but non-persistent; or nonstationary and persistent? Most assessments of this issue use the I(0)/I(1) paradigm, which only allows the first and last of these options. In contrast, in the I(d) paradigm, d fractional, all three are possible, with the crucial parameter d determining the long-run properties of the process. This study includes estimation of d by three methods of semi-parametric estimation in the frequency domain, using both local and global (Fourier) frequency estimation, and maximum likelihood estimation of ARFIMA models in the time domain. We give a transparent assessment of the key selection parameters in each method, particularly estimation of the truncation parameters for the semi-parametric methods. Two other important developments are also included. We implement Tanaka's locally best invariant parametric tests based on maximum likelihood estimation of the long-memory parameter and include a recent extension of the Dickey-Fuller approach, referred to as fractional Dickey-Fuller (FD-F), to fractionally integrated series, which allows a much wider range of generating processes under the alternative hypothesis. With this more general approach, we find very little evidence of stationarity for 10 real exchange rates for developed countries and some very limited evidence of nonstationarity but non-persistence, and none of the FD-F tests leads to rejection of the null of a unit root.
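
Illustrative sketch (an assumption, not the paper's procedure): one widely used local (narrowband) semi-parametric estimator of d is the log-periodogram regression of Geweke and Porter-Hudak, which regresses the log periodogram at the lowest m Fourier frequencies on log(4 sin^2(lambda/2)) and takes minus the slope as the estimate. The abstract does not say whether this estimator is one of the three it uses. The function names and the default truncation m = floor(sqrt(n)) are choices made for this example only.

```python
import cmath
import math


def periodogram(series, m):
    """Periodogram of a series at the first m non-zero Fourier frequencies."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    lams, pgram = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        # Direct DFT at frequency lam (O(n) per frequency, fine for small m).
        dft = sum(x * cmath.exp(-1j * lam * t) for t, x in enumerate(centered))
        lams.append(lam)
        pgram.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return lams, pgram


def log_periodogram_d(lams, pgram):
    """Minus the OLS slope of log I(lam) on log(4 sin^2(lam / 2))."""
    xs = [math.log(4.0 * math.sin(lam / 2.0) ** 2) for lam in lams]
    ys = [math.log(v) for v in pgram]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return -sxy / sxx


def estimate_d(series, m=None):
    """Log-periodogram estimate of the long-memory parameter d."""
    if m is None:
        m = int(math.sqrt(len(series)))  # hypothetical default truncation
    return log_periodogram_d(*periodogram(series, m))
```

The truncation parameter m controls the usual trade-off: a small m uses only frequencies near zero, where the power-law shape holds best, but gives a noisy estimate, while a large m reduces variance at the cost of bias from short-run dynamics.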


Impact and Quantification of the Sources of Error in DNA Pooling Designs

ANNALS OF HUMAN GENETICS, Issue 1 2009
A. Jawaid
Summary The analysis of genome wide variation offers the possibility of unravelling the genes involved in the pathogenesis of disease. Genome wide association studies are also particularly useful for identifying and validating targets for therapeutic intervention as well as for detecting markers for drug efficacy and side effects. The cost of such large-scale genetic association studies may be reduced substantially by the analysis of pooled DNA from multiple individuals. However, experimental errors inherent in pooling studies lead to a potential increase in the false positive rate and a loss in power compared to individual genotyping. Here we quantify various sources of experimental error using empirical data from typical pooling experiments and corresponding individual genotyping counts using two statistical methods. We provide analytical formulas for calculating these different errors in the absence of complete information, such as replicate pool formation, and for adjusting for the errors in the statistical analysis. We demonstrate that DNA pooling has the potential of estimating allele frequencies accurately, and adjusting the pooled allele frequency estimates for differential allelic amplification considerably improves accuracy. Estimates of the components of error show that differential allelic amplification is the most important contributor to the error variance in absolute allele frequency estimation, followed by allele frequency measurement and pool formation errors. Our results emphasise the importance of minimising experimental errors and obtaining correct error estimates in genetic association studies.
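
Illustrative sketch (not the paper's formulas, which the abstract does not give): a commonly used adjustment for differential allelic amplification estimates a factor k as the mean ratio of the two allele signals in individually genotyped heterozygotes, where the true ratio is 1:1, and then rescales the second allele's pooled signal by k. The function names and the signal values below are hypothetical.

```python
def estimate_k(het_signals):
    """Mean A/B signal ratio across heterozygous individuals.

    het_signals is a list of (signal_a, signal_b) pairs. In a heterozygote
    both alleles are present in equal amounts, so any departure of the
    ratio from 1 reflects unequal amplification or detection.
    """
    ratios = [a / b for a, b in het_signals]
    return sum(ratios) / len(ratios)


def pooled_allele_freq(signal_a, signal_b, k=1.0):
    """Frequency of allele A in a pool, corrected by the factor k.

    With k = 1.0 this is the uncorrected estimate A / (A + B).
    """
    return signal_a / (signal_a + k * signal_b)


# Hypothetical heterozygote measurements: allele A amplifies about 20% better.
k = estimate_k([(1.2, 1.0), (1.8, 1.5)])

# Hypothetical pool with a true A frequency of 0.3, measured with that bias.
raw = pooled_allele_freq(0.36, 0.70)           # ignores unequal amplification
corrected = pooled_allele_freq(0.36, 0.70, k)  # applies the k correction
```

In this made-up example the raw estimate overstates the frequency of the better-amplifying allele, and the corrected estimate recovers the value used to construct the signals.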


A new perturbation solution for systems with strong quadratic and cubic nonlinearities

MATHEMATICAL METHODS IN THE APPLIED SCIENCES, Issue 6 2010
Mehmet Pakdemirli
Abstract The new perturbation algorithm combining the method of multiple scales (MS) and Lindstedt-Poincaré techniques is applied to an equation with quadratic and cubic nonlinearities. Approximate analytical solutions are found using the classical MS method and the new method. Both solutions are contrasted with the direct numerical solutions of the original equation. For the case of strong nonlinearities, solutions of the new method are in good agreement with the numerical results, whereas the amplitude and frequency estimations of classical MS yield high errors. For strongly nonlinear systems, exact periods match well with the new technique while there are large discrepancies between the exact and classical MS periods. Copyright © 2009 John Wiley & Sons, Ltd.