EM Algorithm



Selected Abstracts


Maximum Likelihood Estimation of VARMA Models Using a State-Space EM Algorithm

JOURNAL OF TIME SERIES ANALYSIS, Issue 5 2007
Konstantinos Metaxoglou
Abstract. We introduce a state-space representation for vector autoregressive moving-average models that enables maximum likelihood estimation using the EM algorithm. We obtain closed-form expressions for both the E- and M-steps; the former requires the Kalman filter and a fixed-interval smoother, and the latter requires least squares-type regression. We show via simulations that our algorithm converges reliably to the maximum, whereas gradient-based methods often fail because of the highly nonlinear nature of the likelihood function. Moreover, our algorithm converges in a smaller number of function evaluations than commonly used direct-search routines. Overall, our approach achieves its largest performance gains when applied to models of high dimension. We illustrate our technique by estimating a high-dimensional vector moving-average model for an efficiency test of California's wholesale electricity market. [source]


A Version of the EM Algorithm for Proportional Hazard Model with Random Effects

BIOMETRICAL JOURNAL, Issue 6 2005
José Cortiñas Abrahantes
Abstract Proportional hazard models with multivariate random effects (frailties) acting multiplicatively on the baseline hazard have recently become a topic of intensive research. One of the main practical problems related to these models is the estimation of parameters. To this end, several approaches based on the EM algorithm have been proposed. The major difference between these approaches is the method of computing the conditional expectations required at the E-step. In this paper an alternative implementation of the EM algorithm is proposed, in which the expected values are computed using the Laplace approximation. The method is computationally less demanding than previously developed approaches. Its performance is assessed through a simulation study and compared to a non-EM based estimation approach proposed by Ripatti and Palmgren (2000). (© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


Presence-Only Data and the EM Algorithm

BIOMETRICS, Issue 2 2009
Gill Ward
Summary In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation–maximization algorithm to estimate the underlying presence–absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence–absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both the deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is identifiable only under some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided. [source]
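
The interleaved E- and M-steps described above can be sketched in a few lines. The following is our own minimal illustration, not the authors' implementation: it assumes the background points are a simple random sample of the landscape and splits each one into a weighted pseudo-presence and pseudo-absence, refitting a weighted logistic regression at each M-step (all function names are ours; the paper's method additionally handles the case-control sampling correction and prevalence constraint).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def weighted_logistic(X, y, w, iters=25):
    """Weighted logistic regression fitted by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        H = X.T @ (X * (w * p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, grad)
    return beta

def presence_only_em(X_pres, X_back, n_iter=30):
    """Simplified EM for presence-only data: each unlabelled background
    point contributes a pseudo-presence and a pseudo-absence whose weights
    are the current responsibilities."""
    Xp = np.column_stack([np.ones(len(X_pres)), X_pres])
    Xb = np.column_stack([np.ones(len(X_back)), X_back])
    X = np.vstack([Xp, Xb, Xb])
    y = np.concatenate([np.ones(len(Xp) + len(Xb)), np.zeros(len(Xb))])
    p = np.full(len(Xb), 0.5)                      # initial responsibilities
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.concatenate([np.ones(len(Xp)), p, 1.0 - p])  # M-step weights
        beta = weighted_logistic(X, y, w)
        p = sigmoid(Xb @ beta)                     # E-step: update responsibilities
    return beta
```

On simulated data where presence probability increases with a covariate, the fitted slope recovers the correct sign even though no absences are observed.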


Estimating common trends in multivariate time series using dynamic factor analysis

ENVIRONMETRICS, Issue 7 2003
A. F. Zuur
Abstract This article discusses dynamic factor analysis, a technique for estimating common trends in multivariate time series. Unlike more common time series techniques such as spectral analysis and ARIMA models, dynamic factor analysis can analyse short, non-stationary time series containing missing values. Typically, the parameters in dynamic factor analysis are estimated by direct optimization, which restricts the analysis to small data sets if computing time is to remain manageable and sub-optimal estimates are to be avoided. This article shows how the parameters of dynamic factor analysis can be estimated using the EM algorithm, allowing larger data sets to be analysed. The technique is illustrated on a marine environmental data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Haplotype association analysis for late onset diseases using nuclear family data

GENETIC EPIDEMIOLOGY, Issue 3 2006
Chun Li
Abstract In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949–958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1–38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc. [source]
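
The gene-counting EM that this method builds on is easiest to see in the simplest case: two biallelic loci, where only double heterozygotes have ambiguous phase. The sketch below is our own generic version, ignoring the offspring and case-control extensions the paper develops; haplotypes are indexed 0–3 by their alleles at the two loci.

```python
import numpy as np
from itertools import product

def haplotype_em(genotypes, n_iter=100):
    """Gene-counting EM for 2-locus haplotype frequencies from unphased
    genotypes. genotypes: sequence of (g1, g2), each a 0/1/2 minor-allele
    count. Haplotype h in {0,1,2,3}: h//2 is its allele at locus 1,
    h%2 its allele at locus 2."""
    f = np.full(4, 0.25)                       # uniform starting frequencies
    gen = np.asarray(genotypes)
    for _ in range(n_iter):
        counts = np.zeros(4)
        for g1, g2 in gen:
            # E-step: enumerate ordered haplotype pairs consistent with the genotype
            pairs = [(h1, h2) for h1, h2 in product(range(4), repeat=2)
                     if h1 // 2 + h2 // 2 == g1 and h1 % 2 + h2 % 2 == g2]
            probs = np.array([f[h1] * f[h2] for h1, h2 in pairs])
            probs /= probs.sum()
            for (h1, h2), p in zip(pairs, probs):
                counts[h1] += p                # fractional haplotype counts
                counts[h2] += p
        f = counts / counts.sum()              # M-step: renormalize
    return f
```

For unambiguous data (all homozygotes) the algorithm reduces to simple counting; double heterozygotes are apportioned between the two compatible phase resolutions in proportion to the current frequency estimates.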


An EM-like reconstruction method for diffuse optical tomography

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 9 2010
Caifang Wang (article first published online: 28 JUN 2010)
Abstract Diffuse optical tomography (DOT) is an optical imaging modality which provides the spatial distributions of optical parameters inside an object. The forward model of DOT is described by the diffusion approximation of the radiative transfer equation, while the inverse problem of DOT is to reconstruct the optical parameters from boundary measurements. In this paper, an EM-like iterative reconstruction method specifically for the steady-state DOT problem is developed. Previous iterative reconstruction methods are mostly based on the assumption that the measurement noise is Gaussian, and are of least-squares type. In this paper, with the assumption that the boundary measurements have independent and identical Poisson distributions, the inverse problem of DOT is solved by maximizing a log-likelihood functional with inequality constraints, and an EM-like reconstruction algorithm is then developed according to the Kuhn–Tucker condition. The proposed algorithm is a variant of the well-known EM algorithm. The performance of the proposed algorithm is tested with three-dimensional numerical simulation. Copyright © 2010 John Wiley & Sons, Ltd. [source]
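
The multiplicative EM update that a Poisson likelihood leads to is easiest to see in the linear analogue of this inverse problem, i.e. the classic MLEM update used in emission tomography. The sketch below assumes a linear forward model y ~ Poisson(Ax) with nonnegative A and x (the actual DOT forward model is nonlinear, so this is an illustration of the update's form, not the paper's algorithm).

```python
import numpy as np

def mlem(A, y, n_iter=500):
    """Multiplicative EM (MLEM) update for a linear Poisson inverse problem
    y ~ Poisson(A x). Each iteration monotonically increases the Poisson
    log-likelihood and preserves nonnegativity of x."""
    x = np.ones(A.shape[1])
    sens = A.sum(axis=0)                           # sensitivity A^T 1
    for _ in range(n_iter):
        ratio = y / np.maximum(A @ x, 1e-12)       # data / current prediction
        x *= (A.T @ ratio) / np.maximum(sens, 1e-12)
    return x
```

On noiseless consistent data the iteration converges to the exact solution; with Poisson noise it converges to the maximum likelihood estimate.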


Mixture model equations for marker-assisted genetic evaluation

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 4 2005
Y. Liu
Summary Marker-assisted genetic evaluation needs to infer genotypes at quantitative trait loci (QTL) based on the information of linked markers. As the inference usually provides the probability distribution of QTL genotypes rather than a specific genotype, marker-assisted genetic evaluation is characterized by the mixture model because of the uncertainty of QTL genotypes. It is, therefore, necessary to develop a statistical procedure useful for mixture model analyses. In this study, a set of mixture model equations was derived based on the normal mixture model and the EM algorithm for evaluating linear models with uncertain independent variables. The derived equations can be seen as an extension of Henderson's mixed model equations to mixture models and provide a general framework to deal with the issues of uncertain incidence matrices in linear models. The mixture model equations were applied to marker-assisted genetic evaluation with different parameterizations of QTL effects. A sire-QTL-effect model and a founder-QTL-effect model were used to illustrate the application of the mixture model equations. The potential advantages of the mixture model equations for marker-assisted genetic evaluation were discussed. The mixed-effect mixture model equations are flexible in modelling QTL effects and show desirable properties in estimating QTL effects, compared with Henderson's mixed model equations. [source]


Robust identification of piecewise/switching autoregressive exogenous process

AICHE JOURNAL, Issue 7 2010
Xing Jin
Abstract A robust identification approach for a class of switching processes known as PWARX (piecewise autoregressive exogenous) processes is developed in this article. The identification problem is formulated and solved within the EM (expectation-maximization) algorithm framework. However, unlike the regular EM algorithm, in which the objective function of the maximization step is built on the assumption that the noise comes from a single distribution, a contaminated Gaussian distribution is used in constructing the objective function, which makes the revised EM algorithm robust to latent outliers. Issues associated with the EM algorithm in PWARX system identification, such as sensitivity to the starting point and inability to accurately classify "un-decidable" data points, are examined and a solution strategy is proposed. Data sets with and without outliers are both considered, and the robust EM algorithm and the regular EM algorithm are compared in terms of parameter estimation performance. Finally, a modified version of the MRLP (multi-category robust linear programming) region-partition method is proposed by assigning different weights to different data points. In this way, the negative influence of outliers on region partitioning of PWARX systems can be minimized. Simulations as well as an application to a pilot-scale switched process control system verify the efficiency of the proposed identification algorithm. © 2009 American Institute of Chemical Engineers AIChE J, 2010 [source]


A latent Markov model for detecting patterns of criminal activity

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2007
Francesco Bartolucci
Summary. The paper investigates the problem of determining patterns of criminal behaviour from official criminal histories, concentrating on the variety and type of offending convictions. The analysis is carried out on the basis of a multivariate latent Markov model which allows for discrete covariates affecting the initial and the transition probabilities of the latent process. We also show some simplifications which reduce the number of parameters substantially; we include a Rasch-like parameterization of the conditional distribution of the response variables given the latent process and a constraint of partial homogeneity of the latent Markov chain. For the maximum likelihood estimation of the model we outline an EM algorithm based on recursions known in the hidden Markov literature, which make the estimation feasible also when the number of time occasions is large. Through this model, we analyse the conviction histories of a cohort of offenders who were born in England and Wales in 1953. The final model identifies five latent classes and specifies common transition probabilities for males and females between 5-year age periods, but with different initial probabilities. [source]


On-line expectation,maximization algorithm for latent data models

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2009
Olivier Cappé
Summary. We propose a generic on-line (also sometimes called adaptive or recursive) version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations. Compared with the algorithm of Titterington, this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete-data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback–Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e. that of the maximum likelihood estimator. In addition, the approach proposed is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model. [source]
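
The idea of replacing the E-step by a stochastic-approximation update of the running sufficient statistics can be sketched for a one-dimensional Gaussian mixture. This is our own minimal illustration in the spirit of the approach, not the paper's algorithm; the step-size exponent, burn-in, and quantile initialization are arbitrary choices of ours.

```python
import numpy as np

def online_em_gmm(y, K=2, gamma_pow=0.6, burn_in=100):
    """One-pass online EM for a 1-D Gaussian mixture: the sufficient
    statistics (s0, s1, s2) are updated by stochastic approximation with
    step size gamma_n = n**(-gamma_pow); the M-step maps them back to
    (weights, means, variances) after a short burn-in."""
    y = np.asarray(y, dtype=float)
    mu = np.quantile(y[:200], (np.arange(K) + 0.5) / K)  # crude spread init
    var = np.full(K, np.var(y[:200]))
    w = np.full(K, 1.0 / K)
    s0, s1, s2 = w.copy(), w * mu, w * (var + mu ** 2)
    for n, yn in enumerate(y, start=1):
        gamma = n ** (-gamma_pow)
        # E-step on the new point: responsibilities under current parameters
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (yn - mu) ** 2 / var)
        r = np.exp(logp - logp.max())
        r /= r.sum()
        # stochastic-approximation update of the sufficient statistics
        s0 = (1 - gamma) * s0 + gamma * r
        s1 = (1 - gamma) * s1 + gamma * r * yn
        s2 = (1 - gamma) * s2 + gamma * r * yn ** 2
        if n > burn_in:                      # M-step: statistics -> parameters
            w = s0 / s0.sum()
            mu = s1 / s0
            var = np.maximum(s2 / s0 - mu ** 2, 1e-6)
    return w, mu, var
```

Unlike batch EM, each observation is visited once, so memory and per-step cost are constant in the stream length.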


Likelihood inference for a class of latent Markov models under linear hypotheses on the transition probabilities

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2006
Francesco Bartolucci
Summary. For a class of latent Markov models for discrete variables having a longitudinal structure, we introduce an approach for formulating and testing linear hypotheses on the transition probabilities of the latent process. For the maximum likelihood estimation of a latent Markov model under hypotheses of this type, we outline an EM algorithm that is based on well-known recursions in the hidden Markov literature. We also show that, under certain assumptions, the asymptotic null distribution of the likelihood ratio statistic for testing a linear hypothesis on the transition probabilities of a latent Markov model, against a less stringent linear hypothesis on the transition probabilities of the same model, is of chi-bar-squared type. As a particular case, we derive the asymptotic distribution of the likelihood ratio statistic between a latent class model and its latent Markov version, which may be used to test the hypothesis of absence of transition between latent states. The approach is illustrated through a series of simulations and two applications, the first of which is based on educational testing data collected within the National Assessment of Educational Progress 1996, and the second on data concerning the use of marijuana collected within the National Youth Survey 1976–1980. [source]


Standard errors for EM estimation

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2000
M. Jamshidian
The EM algorithm is a popular method for computing maximum likelihood estimates. One of its drawbacks is that it does not produce standard errors as a by-product. We consider obtaining standard errors by numerical differentiation. Two approaches are considered. The first differentiates the Fisher score vector to yield the Hessian of the log-likelihood. The second differentiates the EM operator and uses an identity that relates its derivative to the Hessian of the log-likelihood. The well-known SEM algorithm uses the second approach. We consider three additional algorithms: one that uses the first approach and two that use the second. We evaluate the complexity and precision of these three algorithms and the SEM algorithm in seven examples. The first is a single-parameter example used to give insight. The others are three examples in each of two areas of EM application: Poisson mixture models and the estimation of covariance from incomplete data. The examples show that there are algorithms that are much simpler and more accurate than the SEM algorithm. Hopefully their simplicity will increase the availability of standard error estimates in EM applications. It is shown that, as previously conjectured, a symmetry diagnostic can accurately estimate errors arising from numerical differentiation. Some issues related to the speed of the EM algorithm and algorithms that differentiate the EM operator are identified. [source]
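
The basic recipe behind all of these approaches, approximate the Hessian of the log-likelihood numerically, then invert the observed information to get a covariance matrix, can be sketched in a crude variant that second-differences the log-likelihood directly (the paper's algorithms instead differentiate the score vector or the EM operator; function names here are ours).

```python
import numpy as np

def num_hessian(f, theta, h=1e-5):
    """Central-difference Hessian of a scalar function f at theta."""
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            def f_shift(si, sj):
                t = np.array(theta, dtype=float)
                t[i] += si * h
                t[j] += sj * h
                return f(t)
            H[i, j] = (f_shift(1, 1) - f_shift(1, -1)
                       - f_shift(-1, 1) + f_shift(-1, -1)) / (4.0 * h * h)
    return H

def numeric_standard_errors(loglik, theta_hat):
    """Standard errors from the observed information: the square roots of
    the diagonal of the inverse negative Hessian at the MLE."""
    H = num_hessian(loglik, theta_hat)
    cov = np.linalg.inv(-H)
    return np.sqrt(np.diag(cov))
```

For a normal sample with parameters (mu, sigma), the numeric standard error of the mean should reproduce the textbook value sigma_hat / sqrt(n).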


Modelling longitudinal semicontinuous emesis volume data with serial correlation in an acupuncture clinical trial

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 4 2005
Paul S. Albert
Summary. In longitudinal studies, we are often interested in modelling repeated assessments of volume over time. Our motivating example is an acupuncture clinical trial in which we compare the effects of active acupuncture, sham acupuncture and standard medical care on chemotherapy-induced nausea in patients being treated for advanced stage breast cancer. An important end point for this study was the daily measurement of the volume of emesis over a 14-day follow-up period. The repeated volume data contained many 0s, had apparent serial correlation and had missing observations, making analysis challenging. The paper proposes a two-part latent process model for analysing the emesis volume data which addresses these challenges. We propose a Monte Carlo EM algorithm for parameter estimation and we use this methodology to show the beneficial effects of acupuncture on reducing the volume of emesis in women being treated for breast cancer with chemotherapy. Through simulations, we demonstrate the importance of correctly modelling the serial correlation for making conditional inference. Further, we show that the correct model for the correlation structure is less important for making correct inference on marginal means. [source]


Two-part regression models for longitudinal zero-inflated count data

THE CANADIAN JOURNAL OF STATISTICS, Issue 2 2010
Marco Alfò
Abstract Two-part models are quite well established in the economic literature, since they accurately capture a principal-agent type model in which homogeneous, observable, counted outcomes are subject to a (prior, exogenous) selection choice. The first decision can be represented by a binary choice model, modeled using a probit or a logit link; the second can be analyzed through a truncated discrete distribution such as a truncated Poisson, negative binomial, and so on. Only recently has particular attention been devoted to extending two-part models to handle longitudinal data. The authors discuss a semi-parametric estimation method for dynamic two-part models and propose a comparison with other, well-established alternatives. Heterogeneity sources that influence the first-level decision process, that is, the decision to use a certain service, are assumed also to influence the (truncated) distribution of the positive outcomes. Estimation is carried out through an EM algorithm without parametric assumptions on the random effects distribution. Furthermore, the authors investigate the extension of the finite mixture representation to allow for unobservable transition between components in each of these parts. The proposed models are discussed using empirical as well as simulated data. The Canadian Journal of Statistics 38: 197–216; 2010 © 2010 Statistical Society of Canada [source]


Moment based regression algorithms for drift and volatility estimation in continuous-time Markov switching models

THE ECONOMETRICS JOURNAL, Issue 2 2008
Robert J. Elliott
Summary. We consider a continuous time Markov switching model (MSM) which is widely used in mathematical finance. The aim is to estimate the parameters given observations in discrete time. Since there is no finite dimensional filter for estimating the underlying state of the MSM, it is not possible to compute numerically the maximum likelihood parameter estimate via the well-known expectation–maximization (EM) algorithm. Therefore, in this paper, we propose a method of moments based parameter estimator. The moments of the observed process are computed explicitly as a function of the time discretization interval of the discrete time observation process. We then propose two algorithms for parameter estimation of the MSM. The first algorithm is based on a least-squares fit to the exact moments over different time lags, while the second algorithm is based on estimating the coefficients of the expansion (with respect to time) of the moments. Extensive numerical results comparing the algorithms with the EM algorithm for the discretized model are presented. [source]


Efficient Calculation of P-value and Power for Quadratic Form Statistics in Multilocus Association Testing

ANNALS OF HUMAN GENETICS, Issue 3 2010
Liping Tong
Summary We address the asymptotic and approximate distributions of a large class of test statistics with quadratic forms used in association studies. The statistics of interest take the general form D = XᵀAX, where A is a general similarity matrix which may or may not be positive semi-definite, and X follows the multivariate normal distribution with mean μ and variance matrix Σ, where Σ may or may not be singular. We show that D can be written as a linear combination of independent χ² random variables with a shift. Furthermore, its distribution can be approximated by a χ² or the difference of two χ² distributions. In the setting of association testing, our methods are especially useful in two situations. First, when the required significance level is much smaller than 0.05 such as in a genome scan, the estimation of p-values using permutation procedures can be challenging. Second, when an EM algorithm is required to infer haplotype frequencies from un-phased genotype data, the computation can be intensive for a permutation procedure. In either situation, an efficient and accurate estimation procedure would be useful. Our method can be applied to any quadratic form statistic and therefore should be of general interest. [source]
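
The decomposition of D into a weighted sum of χ² variables is a short computation in the central case μ = 0: the weights are the eigenvalues of Σ^(1/2) A Σ^(1/2), so E[D] = Σₖ λₖ = tr(AΣ) and Var[D] = 2 Σₖ λₖ². A sketch of the weight computation (function name is ours):

```python
import numpy as np

def quadform_chi2_weights(A, Sigma):
    """For X ~ N(0, Sigma), D = X^T A X is distributed as sum_k lam_k Z_k^2
    with independent standard normals Z_k; the weights lam_k are the
    eigenvalues of Sigma^{1/2} A Sigma^{1/2}."""
    w, V = np.linalg.eigh(Sigma)
    # symmetric square root of Sigma (eigenvalues clipped at 0 for safety)
    S_half = (V * np.sqrt(np.maximum(w, 0.0))) @ V.T
    return np.linalg.eigvalsh(S_half @ A @ S_half)
```

The trace identity sum(lam) = tr(AΣ) gives a cheap correctness check, and a Monte Carlo draw of D confirms the implied mean.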


Wavelets in state space models

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 3 2003
Eliana Zandonade
Abstract In this paper, we consider the utilization of wavelets in conjunction with state space models. Specifically, the parameters in the system matrix are expanded in wavelet series and estimated via the Kalman Filter and the EM algorithm. In particular this approach is used for switching models. Two applications are given, one to the problem of detecting the paths of targets using an array of sensors, and the other to a series of daily spreads between two Brazilian bonds. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Using Multinomial Mixture Models to Cluster Internet Traffic

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2004
Murray Jorgensen
Summary The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet-length distributions. The clustering uses the EM algorithm, and the data-analytic and computational details are given. [source]
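
A multinomial mixture fitted by EM, as used above to cluster flows by their packet-length distributions, can be sketched directly. This is a generic implementation of the model, not the authors' code; each observation is a vector of category counts, and the multinomial coefficient cancels from the responsibilities.

```python
import numpy as np

def multinomial_mixture_em(counts, K, n_iter=200, seed=0):
    """EM for a K-component multinomial mixture. counts: (n, m) array of
    category counts per observation. Returns mixing weights pi (K,),
    component probability vectors theta (K, m), and responsibilities r (n, K)."""
    rng = np.random.default_rng(seed)
    n, m = counts.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(m), size=K)      # random initial probabilities
    for _ in range(n_iter):
        # E-step: log responsibilities up to the shared multinomial coefficient
        logr = np.log(pi)[None, :] + counts @ np.log(theta).T  # (n, K)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted category counts, floored to avoid log(0)
        pi = r.mean(axis=0)
        theta = r.T @ counts + 1e-10
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, r
```

Assigning each observation to its highest-responsibility component gives the clustering; with well-separated packet-length profiles the recovered labels match the generating components up to permutation.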


Robust Joint Modeling of Longitudinal Measurements and Competing Risks Failure Time Data

BIOMETRICAL JOURNAL, Issue 1 2009
Ning Li
Abstract Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow-up. Our model consists of a linear mixed effects sub-model for the longitudinal outcome and a proportional cause-specific hazards frailty sub-model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub-model, we adopt a t-distribution, which has longer tails and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
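
Why a t error distribution yields outlier-robust EM estimates is visible already in the toy problem of estimating the location and scale of a univariate t with fixed degrees of freedom: writing the t as a normal scale mixture, the E-step produces per-observation weights that shrink automatically for outlying points. A minimal sketch (ours, far simpler than the joint model in the paper):

```python
import numpy as np

def t_location_scale_em(y, nu=4.0, n_iter=100):
    """EM for the location mu and squared scale sig2 of a univariate
    Student-t with fixed degrees of freedom nu, via the normal
    scale-mixture representation."""
    y = np.asarray(y, dtype=float)
    mu, sig2 = np.median(y), np.var(y)
    n = len(y)
    for _ in range(n_iter):
        d2 = (y - mu) ** 2 / sig2
        w = (nu + 1.0) / (nu + d2)          # E-step: expected precisions;
                                            # large residual => small weight
        mu = np.sum(w * y) / np.sum(w)      # M-step: weighted mean
        sig2 = np.sum(w * (y - mu) ** 2) / n
    return mu, sig2
```

On a standard normal sample contaminated with gross outliers, the t-based location estimate stays near zero while the ordinary sample mean is pulled far off.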


Finite Mixture Models for Mapping Spatially Dependent Disease Counts

BIOMETRICAL JOURNAL, Issue 1 2009
Marco Alfó
Abstract A vast literature has recently been concerned with the analysis of variation in disease counts recorded across geographical areas with the aim of detecting clusters of regions with homogeneous behavior. Most of the proposed modeling approaches have been discussed for the univariate case and only very recently spatial models have been extended to predict more than one outcome simultaneously. In this paper we extend the standard finite mixture models to the analysis of multiple, spatially correlated, counts. Dependence among outcomes is modeled using a set of correlated random effects and estimation is carried out by numerical integration through an EM algorithm without assuming any specific parametric distribution for the random effects. The spatial structure is captured by the use of a Gibbs representation for the prior probabilities of component membership through a Strauss-like model. The proposed model is illustrated using real data. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


An Adaptive Single-step FDR Procedure with Applications to DNA Microarray Analysis

BIOMETRICAL JOURNAL, Issue 1 2007
Vishwanath Iyer
Abstract The use of multiple hypothesis testing procedures has been receiving a lot of attention recently by statisticians in DNA microarray analysis. The traditional FWER controlling procedures are not very useful in this situation since the experiments are exploratory by nature and researchers are more interested in controlling the rate of false positives rather than controlling the probability of making a single erroneous decision. This has led to increased use of FDR (False Discovery Rate) controlling procedures. Genovese and Wasserman proposed a single-step FDR procedure that is an asymptotic approximation to the original Benjamini and Hochberg stepwise procedure. In this paper, we modify the Genovese-Wasserman procedure to force the FDR control closer to the level alpha in the independence setting. Assuming that the data come from a mixture of two normals, we also propose to make this procedure adaptive by first estimating the parameters using the EM algorithm and then plugging these estimated parameters into the above modification of the Genovese-Wasserman procedure. We compare this procedure with the original Benjamini-Hochberg and the SAM thresholding procedures. The FDR control and other properties of this adaptive procedure are verified numerically. (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
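
The two ingredients, an EM fit of a two-normal mixture to the test statistics and a step-up FDR procedure, can each be sketched in a few lines. These are generic textbook versions (null component fixed at N(0,1), plain Benjamini-Hochberg step-up), not the modified adaptive procedure the paper proposes; function names are ours.

```python
import numpy as np

def phi(z, mu=0.0, sd=1.0):
    """Normal density."""
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def two_group_em(z, n_iter=200):
    """EM for z ~ pi0*N(0,1) + (1-pi0)*N(mu1, s1^2) with the null
    component held fixed; returns (pi0, mu1, s1)."""
    z = np.asarray(z, dtype=float)
    pi0, mu1, s1 = 0.8, np.quantile(z, 0.95), 1.0
    for _ in range(n_iter):
        f0 = pi0 * phi(z)
        f1 = (1.0 - pi0) * phi(z, mu1, s1)
        r1 = f1 / (f0 + f1)                     # E-step: P(non-null | z)
        pi0 = 1.0 - r1.mean()                   # M-step
        mu1 = np.sum(r1 * z) / np.sum(r1)
        s1 = np.sqrt(np.sum(r1 * (z - mu1) ** 2) / np.sum(r1))
    return pi0, mu1, s1

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: boolean rejection vector controlling FDR at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

An adaptive variant along the paper's lines would use the EM estimate of pi0 (the null proportion) to relax the testing level, e.g. running the step-up procedure at alpha / pi0.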


Joint Inference on HIV Viral Dynamics and Immune Suppression in Presence of Measurement Errors

BIOMETRICS, Issue 2 2010
L. Wu
Summary. In an attempt to provide a tool to assess antiretroviral therapy and to monitor disease progression, this article studies the association of human immunodeficiency virus (HIV) viral suppression and immune restoration. The data from a recent acquired immune deficiency syndrome (AIDS) study are used for illustration. We jointly model HIV viral dynamics and time to decrease in the CD4/CD8 ratio in the presence of a CD4 process with measurement errors, and estimate the model parameters simultaneously via a method based on a Laplace approximation and the commonly used Monte Carlo EM algorithm. The approaches, and many of the points presented, apply generally. [source]


Haplotype-Based Regression Analysis and Inference of Case-Control Studies with Unphased Genotypes and Measurement Errors in Environmental Exposures

BIOMETRICS, Issue 3 2008
Iryna Lobach
Summary It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399-418) developed an efficient retrospective maximum-likelihood method for the analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108-127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations where some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have K + 1 levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. Likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We correct standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function. The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arise from haplotype-phase ambiguity.
An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma. [source]


Analysis of Matched Case-Control Data in Presence of Nonignorable Missing Exposure

BIOMETRICS, Issue 1 2008
Samiran Sinha
Summary. The present article deals with informative missing (IM) exposure data in matched case-control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing-data mechanism is unavoidable. We therefore propose a full likelihood-based approach for handling IM data, positing a model for the selection probability and a parametric model for the partially missing exposure variable in the control population, along with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases are discussed in detail: (a) a binary exposure variable, (b) a normally distributed exposure variable, and (c) a lognormally distributed exposure variable. The method is illustrated by analyzing a real matched case-control dataset with a missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and its robustness to violations of different types of model assumptions is examined. [source]


Functional Hierarchical Models for Identifying Genes with Different Time-Course Expression Profiles

BIOMETRICS, Issue 2 2006
F. Hong
Summary Time-course studies of gene expression are essential in biomedical research for understanding biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions in cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm is developed for estimating both the gene-specific parameters and the hyperparameters in the second level of the model. We use a direct posterior probability approach to bound the false discovery rate at a pre-specified level, and evaluate the methods by simulation and by application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggest that the procedure outperforms two-way ANOVA in identifying TDE genes, with both higher sensitivity and higher specificity. Genes identified from the C. elegans developmental data set show clear patterns of change between the two experimental conditions. [source]
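At the first level of such a model, each gene's profile is represented by basis coefficients obtained from a least-squares fit. A minimal sketch using a polynomial basis (the paper's basis functions may differ, e.g. B-splines; the function name is illustrative):

```python
import numpy as np

def fit_basis_expansion(t, y, degree=3):
    """Least-squares fit of an expression profile y observed at times t
    using a polynomial basis expansion: y(t) ~ sum_j beta_j * t**j."""
    X = np.vander(t, degree + 1, increasing=True)  # basis matrix [1, t, t^2, ...]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return coef, fitted
```

The fitted coefficients (one vector per gene) are then what the hierarchical second level models across genes, so that "temporally differentially expressed" becomes a statement about differences in coefficient vectors between conditions.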


Small-Sample Inference for Incomplete Longitudinal Data with Truncation and Censoring in Tumor Xenograft Models

BIOMETRICS, Issue 3 2002
Ming Tan
Summary. In cancer drug development, demonstrating activity in xenograft models, where mice are grafted with human cancer cells, is an important step in bringing a promising compound to humans. A key outcome variable is the tumor volume measured over a given period of time for groups of mice given different doses of a single or combination anticancer regimen. However, a mouse may die before the end of a study or may be sacrificed when its tumor volume quadruples, and its tumor may be suppressed for some time and then grow back. Thus, incomplete repeated measurements arise. The incompleteness, or missingness, is also caused by drastic tumor shrinkage (<0.01 cm³) or random truncation. Because of the small sample sizes in these models, asymptotic inferences are usually not appropriate. We propose two parametric test procedures, based on the EM algorithm and the Bayesian method, to compare treatment effects among different groups while accounting for informative censoring. A real xenograft study on a new antitumor agent, temozolomide, combined with irinotecan is analyzed using the proposed methods. [source]


Estimating the Frequency Distribution of Crossovers during Meiosis from Recombination Data

BIOMETRICS, Issue 2 2001
Kai Yu
Summary. Estimation of tetrad crossover frequency distributions from genetic recombination data is a classic problem dating back to Weinstein (1936, Genetics 21, 155-199). But a number of important issues, such as how to specify the maximum number of crossovers, how to construct confidence intervals for crossover probabilities, and how to obtain correct p-values for hypothesis tests, have never been adequately addressed. In this article, we obtain some properties of the maximum likelihood estimate (MLE) for crossover probabilities that imply guidelines for choosing the maximum number of crossovers. We give these results for both normal meiosis and meiosis with nondisjunction. We also develop an accelerated EM algorithm to find the MLE more efficiently. We propose bootstrap-based methods to find confidence intervals and p-values and conduct simulation studies to check the validity of the bootstrap approach. [source]
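The bootstrap intervals proposed here follow the standard nonparametric percentile recipe, which is easy to sketch in generic form (this is the textbook version, not the paper's crossover-specific implementation; names are illustrative):

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap: resample the data with
    replacement, recompute the statistic on each resample, and take
    the empirical alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(stat([rng.choice(data) for _ in range(n)])
                   for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

In the paper's setting, `stat` would map a resampled set of tetrads to the MLE of a crossover probability; the same resampling also yields bootstrap p-values.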


Standard Errors for EM Estimates in Generalized Linear Models with Random Effects

BIOMETRICS, Issue 3 2000
Herwig Friedl
Summary. A procedure is derived for computing standard errors of EM estimates in generalized linear models with random effects. Quadrature formulas are used to approximate the integrals in the EM algorithm, where two different approaches are pursued, i.e., Gauss-Hermite quadrature in the case of Gaussian random effects and nonparametric maximum likelihood estimation for an unspecified random effect distribution. An approximation of the expected Fisher information matrix is derived from an expansion of the EM estimating equations. This allows for inferential arguments based on EM estimates, as demonstrated by an example and simulations. [source]
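For the Gaussian-random-effects case, the quadrature step amounts to approximating an expectation over a normal variable with Gauss-Hermite nodes and weights. A minimal sketch (the helper name and standard-normal scaling are illustrative; the change of variable z = √2·x converts the Hermite weight exp(−x²) into the N(0, 1) density):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_expectation(f, n_points=20):
    """Approximate E[f(Z)] for Z ~ N(0, 1) by Gauss-Hermite quadrature:
    E[f(Z)] ~ (1/sqrt(pi)) * sum_k w_k * f(sqrt(2) * x_k)."""
    x, w = hermgauss(n_points)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)
```

With 20 nodes this recovers, e.g., E[Z²] = 1 essentially exactly; in an EM context, f would be the integrand appearing in the E-step for a cluster's random effect.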