Mixture Models (mixture + models)
Selected Abstracts

Using Multinomial Mixture Models to Cluster Internet Traffic
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2004
Murray Jorgensen
Summary: The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet-length distributions. The clustering uses the EM algorithm, and the data-analytic and computational details are given.

Finite Mixture Models for Mapping Spatially Dependent Disease Counts
BIOMETRICAL JOURNAL, Issue 1 2009
Marco Alfó
Abstract: A vast literature has recently been concerned with the analysis of variation in disease counts recorded across geographical areas with the aim of detecting clusters of regions with homogeneous behavior. Most of the proposed modeling approaches have been discussed for the univariate case and only very recently spatial models have been extended to predict more than one outcome simultaneously. In this paper we extend the standard finite mixture models to the analysis of multiple, spatially correlated, counts. Dependence among outcomes is modeled using a set of correlated random effects and estimation is carried out by numerical integration through an EM algorithm without assuming any specific parametric distribution for the random effects. The spatial structure is captured by the use of a Gibbs representation for the prior probabilities of component membership through a Strauss-like model. The proposed model is illustrated using real data. (© 2009 WILEY-VCH Verlag GmbH & Co.
KGaA, Weinheim)

Population Size Estimation Using Individual Level Mixture Models
BIOMETRICAL JOURNAL, Issue 6 2008
Daniel Manrique-Vallier
Abstract: We revisit the heterogeneous closed population multiple recapture problem, modeling individual-level heterogeneity using the Grade of Membership model (Woodbury et al., 1978). This strategy allows us to postulate the existence of homogeneous latent "ideal" or "pure" classes within the population, and construct a soft clustering of the individuals, where each one is allowed partial or mixed membership in all of these classes. We propose a full hierarchical Bayes specification and an MCMC algorithm to obtain samples from the posterior distribution. We apply the method to simulated data and to three real-life examples. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Variable Selection for Clustering with Gaussian Mixture Models
BIOMETRICS, Issue 3 2009
Cathy Maugis
Summary: This article is concerned with variable selection for cluster analysis. The problem is regarded as a model selection problem in the model-based cluster analysis context. A model generalizing the model of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168-178) is proposed to specify the role of each variable. This model does not need any prior assumptions about the linear link between the selected and discarded variables. Models are compared with the Bayesian information criterion. Variable role is obtained through an algorithm embedding two backward stepwise algorithms for variable selection for clustering and linear regression. The model identifiability is established and the consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application highlight the interest of the procedure.
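As a rough illustration of comparing mixture models with the Bayesian information criterion, the sketch below fits univariate Gaussian mixtures by EM and scores each by BIC. It is not the variable selection procedure of the abstract above: the data are synthetic, the function names `normal_pdf` and `fit_gmm_1d` are hypothetical, and the convention BIC = -2 log L + p log n (lower is better) is an assumption, since the abstract does not state one.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (log-likelihood, BIC), with BIC = -2 log L + p log n.
    """
    n = len(data)
    srt = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    var0 = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [var0] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid degenerate components

    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing proportions
    bic = -2.0 * loglik + n_params * math.log(n)
    return loglik, bic


# Synthetic data (an assumption for illustration): two well-separated groups.
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(150)]
        + [random.gauss(6.0, 1.0) for _ in range(150)])

bic_by_k = {k: fit_gmm_1d(data, k)[1] for k in (1, 2, 3)}
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, "-> selected number of components:", best_k)
```

On data like these, the single-component model should receive a clearly worse (higher) BIC than the two-component model; whether two or three components score best depends on the sample.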
On Comparison of Mixture Models for Closed Population Capture-Recapture Studies
BIOMETRICS, Issue 2 2009
Chang Xuan Mao
Summary: A mixture model is a natural choice to deal with individual heterogeneity in capture-recapture studies. Pledger (2000, Biometrics 56, 434-442; 2005, Biometrics 61, 868-876) advertised the use of the two-point mixture model. Dorazio and Royle (2003, Biometrics 59, 351-364; 2005, Biometrics 61, 874-876) suggested that the beta-binomial model has advantages. The controversy is related to the nonidentifiability of the population size (Link, 2003, Biometrics 59, 1123-1130) and certain boundary problems. The total bias is decomposed into an intrinsic bias, an approximation bias, and an estimation bias. We propose to assess the approximation bias, the estimation bias, and the variance, with the intrinsic bias excluded when comparing different estimators. The boundary problems in both models and their impacts are investigated. Real epidemiological and ecological examples are analyzed.

A General Class of Pattern Mixture Models for Nonignorable Dropout with Many Possible Dropout Times
BIOMETRICS, Issue 2 2008
Jason Roy
Summary: In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities.
In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

Latent Pattern Mixture Models for Informative Intermittent Missing Data in Longitudinal Studies
BIOMETRICS, Issue 2 2004
Haiqun Lin
Summary: A frequently encountered problem in longitudinal studies is data that are missing due to missed visits or dropouts. In the statistical literature, interest has primarily focused on monotone missing data (dropout) with much less work on intermittent missing data in which a subject may return after one or more missed visits. Intermittent missing data have broader applicability that can include the frequent situation in which subjects do not have common sets of visit times or they visit at nonprescheduled times. In this article, we propose a latent pattern mixture model (LPMM), where the mixture patterns are formed from latent classes that link the longitudinal response and the missingness process. This allows us to handle arbitrary patterns of missing data embodied by subjects' visit process, and avoids the need to specify the mixture patterns a priori. Our model assumes that the missingness process is conditionally independent of the longitudinal outcomes given the latent classes. We propose a noniterative approach to assess this key assumption. The LPMM is illustrated with a data set from a health service research study in which homeless people with mental illness were randomized to three different service packages and measures of homelessness were recorded at multiple time points. Our model suggests the presence of four latent classes linking subject visit patterns to homeless outcomes.
Using Mixtures to Model Heterogeneity in Ecological Capture-Recapture Studies
BIOMETRICAL JOURNAL, Issue 6 2008
Shirley Pledger
Abstract: Modelling heterogeneity of capture is an important problem in estimating animal abundance from capture-recapture data, with underestimation of abundance occurring if different animals have intrinsically high or low capture probabilities. Mixture models are useful in many cases to model the heterogeneity. We summarise mixture model results for closed populations, using a skink data set for illustration. New mixture models for heterogeneous open populations are discussed, and a closed population model is shown to have new and potentially effective applications in community analysis. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Physical foundations, models, and methods of diffusion magnetic resonance imaging of the brain: A review
CONCEPTS IN MAGNETIC RESONANCE, Issue 5 2007
Ludovico Minati
Abstract: The foundations and characteristics of models and methods used in diffusion magnetic resonance imaging, with particular reference to in vivo brain imaging, are reviewed. The first section introduces Fick's laws, propagators, and the relationship between tissue microstructure and the statistical properties of diffusion of water molecules. The second section introduces the diffusion-weighted signal in terms of diffusion of magnetization (Bloch-Torrey equation) and of spin-bearing particles (cumulant expansion). The third section is dedicated to the rank-2 tensor model, the b-matrix, and the derivation of indexes of anisotropy and shape. The fourth section introduces diffusion in multiple compartments: Gaussian mixture models, relationship between fiber layout, displacement probability and diffusivity, and effect of the b-value.
The fifth section is devoted to higher-order generalizations of the tensor model: singular value decompositions (SVD), representation of angular diffusivity patterns and derivation of generalized anisotropy (GA) and scaled entropy (SE), and modeling of non-Gaussian diffusion by means of series expansion of Fick's laws. The sixth section covers spherical harmonic decomposition (SHD) and determination of fiber orientation by means of spherical deconvolution. The seventh section presents the Fourier relationship between signal and displacement probability (Q-space imaging, QSI, or diffusion-spectrum imaging, DSI), and reconstruction of orientation-distribution functions (ODF) by means of the Funk-Radon transform (Q-ball imaging, QBI). © 2007 Wiley Periodicals, Inc. Concepts Magn Reson Part A 30A: 278-307, 2007.

Continuous, categorical and mixture models of DSM-IV alcohol and cannabis use disorders in the Australian community
ADDICTION, Issue 7 2010
Andrew J. Baillie
Abstract
Aims: To apply item response mixture modelling (IRMM) to investigate the viability of the dimensional and categorical approaches to conceptualizing alcohol and cannabis use disorders.
Design: A cross-sectional survey assessing substance use and DSM-IV substance use disorders.
Setting and participants: A household survey of a nationally representative sample of 10 641 Australian adults (aged 18 years or older).
Measurements: Trained survey interviewers administered a structured interview based on the Composite International Diagnostic Interview (CIDI).
Findings: Of the 10 641 Australian adults interviewed, 7746 had drunk alcohol in the past 12 months and 722 had used cannabis. Neither categorical latent class models nor mixture models combining continuous and categorical parameters improved fit compared to continuous factor analysis models.
The results indicated that both alcohol and cannabis problems can be considered as dimensional, with those with the disorder arrayed along a dimension of severity.
Conclusions: A single factor accounts for more variance in the DSM-IV alcohol and cannabis use criteria than latent class or mixture models, so the disorders can be explained most effectively by a dimensional score.

Sample Splitting and Threshold Estimation
ECONOMETRICA, Issue 3 2000
Bruce E. Hansen
Threshold models have a wide variety of applications in economics. Direct applications include models of separating and multiple equilibria. Other applications include empirical sample splitting when the sample split is based on a continuously-distributed variable such as firm size. In addition, threshold models may be used as a parsimonious strategy for nonparametric function estimation. For example, the threshold autoregressive model (TAR) is popular in the nonlinear time series literature. Threshold models also emerge as special cases of more complex statistical frameworks, such as mixture models, switching models, Markov switching models, and smooth transition threshold models. It may be important to understand the statistical properties of threshold models as a preliminary step in the development of statistical tools to handle these more complicated structures. Despite the large number of potential applications, the statistical theory of threshold estimation is undeveloped. It is known that threshold estimates are super-consistent, but a distribution theory useful for testing and inference has yet to be provided. This paper develops a statistical theory for threshold estimation in the regression context. We allow for either cross-section or time series observations. Least squares estimation of the regression parameters is considered. An asymptotic distribution theory for the regression estimates (the threshold and the regression slopes) is developed.
It is found that the distribution of the threshold estimate is nonstandard. A method to construct asymptotic confidence intervals is developed by inverting the likelihood ratio statistic. It is shown that this yields asymptotically conservative confidence regions. Monte Carlo simulations are presented to assess the accuracy of the asymptotic approximations. The empirical relevance of the theory is illustrated through an application to the multiple equilibria growth model of Durlauf and Johnson (1995).

Alcohol use trajectories among adults in an urban area after a disaster: evidence from a population-based cohort study
ADDICTION, Issue 8 2008
Magdalena Cerda
Abstract: Alcohol use increased in the New York City (NYC) metropolitan area in the first months after the 11 September 2001 terrorist attacks.
Aims: To investigate alcohol use trajectories in the NYC metropolitan area in the 3 years after 11 September and examine the relative contributions of acute exposure to the attacks and ongoing stressors to these trajectories.
Design: We used a population-based cohort of adults recruited through a random-digit-dial telephone survey in 2002; participants completed three follow-up interviews over 30 months.
Setting: The NYC metropolitan area.
Participants: A total of 2752 non-institutionalized adult residents of NYC.
Measurements: We used growth mixture models to assess trajectories in levels of total alcohol consumption and bingeing in the past 30 days, and predictors of these trajectories.
Findings: We identified five trajectories of alcohol consumption levels and three bingeing trajectories. Predictors of higher levels of use over time included ongoing stressors, traumatic events and lower income. Ongoing exposure to stressors and low income also play a central role in bingeing trajectories.
Conclusions: While point-in-time mass traumatic events may matter in the short term, their contribution subsides over time.
Accumulated stressors and traumatic events, in contrast, lead to higher levels of consumption among respondents already vulnerable to high alcohol use. Interventions to mitigate post-disaster stressors may have substantial benefit in reducing alcohol abuse in the medium to long term.

Mixture toxicity and gene inductions: Can we predict the outcome?
ENVIRONMENTAL TOXICOLOGY & CHEMISTRY, Issue 3 2008
Freddy Dardenne
Abstract: As a consequence of the nature of most real-life exposure scenarios, the last decade of ecotoxicological research has seen increasing interest in the assessment of mixture ecotoxicology. Often, mixtures are considered to follow one of two models, concentration addition (CA) or response addition (RA), both of which have been described in the literature. Nevertheless, mixtures that deviate from either or both models exist; they typically exhibit phenomena like synergism, ratio or concentration dependency, or inhibition. Moreover, both CA and RA have been challenged and evaluated mainly for acute responses at relatively high levels of biological organization (e.g., whole-organism mortality), and applicability to genetic responses has not received much attention. Genetic responses are considered to be the primary reaction in case of toxicant exposure and carry valuable mechanistic information. Effects at the gene-expression level are at the heart of the mode of action by toxicants and mixtures. The ability to predict mixture responses at this primary response level is an important asset in predicting and understanding mixture effects at different levels of biological organization. The present study evaluated the applicability of mixture models to stress gene inductions in Escherichia coli employing model toxicants with known modes of action in binary combinations.
The results showed that even if the maximum of the dose-response curve is not known, making a classical ECx (concentration causing x% effect) approach impossible, mixture models can predict responses to the binary mixtures based on the single-toxicant response curves. In most cases, the mode of action of the toxicants does not determine the optimal choice of model (i.e., CA, RA, or a deviation thereof).

Empirical Bayes estimators and non-parametric mixture models for space and time-space disease mapping and surveillance
ENVIRONMETRICS, Issue 5 2003
Dankmar Böhning
Abstract: The analysis of the geographic variation of disease and its representation on a map is an important topic in epidemiological research and in public health in general. Identification of spatial heterogeneity of relative risk using morbidity and mortality data is required. Frequently, interest is also in the analysis of space data with respect to time, where typically data are used which are aggregated in certain time windows like 5 or 10 years. The occurrence measure of interest is usually the standardized mortality (morbidity) ratio (SMR). It is well known that disease maps in space or in space and time should not solely be based upon the crude SMR but rather some smoothed version of it. This fact has led to a tremendous amount of theoretical developments in spatial methodology, in particular in the area of hierarchical modeling in connection with fully Bayesian estimation techniques like Markov chain Monte Carlo. It seems, however, that while these theoretical developments took place, only very few of them have found their way into the daily practice of epidemiological work and surveillance routines. In this article we focus on developments that avoid the pitfalls of the crude SMR and simultaneously retain the simplicity and, at least approximately, the validity of more complex models.
After an illustration of the typical pitfalls of the crude SMR the article is centered around three issues: (a) the separation of spatial random variation from spatial structural variation; (b) a simple mixture model for capturing spatial heterogeneity; (c) an extension of this model for capturing temporal information. The techniques are illustrated by numerous examples. Public domain software like Dismap is mentioned that enables easy mixture modeling in the context of disease mapping. Copyright © 2003 John Wiley & Sons, Ltd.

Perspectives on ecological risk assessment of chiral compounds
INTEGRATED ENVIRONMENTAL ASSESSMENT AND MANAGEMENT, Issue 3 2009
Jacob K Stanley
Abstract: Enantiomers of chiral contaminants can significantly differ in environmental fate as well as in effects. Despite this fact, such differences are often ignored in regulation and in practice, injecting uncertainty into the estimation of risk of chiral compounds. We review the unique challenges posed by stereochemistry to the ecological risk assessment of chiral contaminants and existing regulatory guidance for chiral pharmaceuticals and pesticides in the United States. We identify the advantages of obtaining data on fate and effects of each individual enantiomer of chiral contaminants that are either distributed as or may end up as enantiomer mixtures in the environment due to enantiomerization. Because enantiomers of the same compound are highly likely to coexist in the environment with each other and can result in nonadditive effects, we recommend treatment of enantiomers as components of a mixture using widely accepted mixture models from achiral risk assessment. We further propose the enantiomer hazard ratio for retrospectively characterizing relative enantiomer risk and examine uncertainty factor magnitudes for effects analysis.
Mixture-based adaptive probabilistic control
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 2 2003
Miroslav Kárný
Abstract: The quasi-Bayes algorithm, combined with stabilized forgetting, provides a tool for efficient recursive estimation of dynamic probabilistic mixture models. They can be interpreted either as models of closed loops with switching modes and controllers or as a universal approximation of a wide class of non-linear control loops. Fully probabilistic control design extended to mixture models forms the basis of a powerful class of adaptive controllers based on the receding-horizon certainty equivalence strategy. The paper summarizes the basic elements mentioned above, classifies possible types of control problems and provides a solution of the key one, referred to as 'simultaneous' design. Results are illustrated on mixtures with components formed by normal auto-regression models with external variable (ARX). Copyright © 2003 John Wiley & Sons, Ltd.

Mixture model equations for marker-assisted genetic evaluation
JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 4 2005
Y. Liu
Summary: Marker-assisted genetic evaluation needs to infer genotypes at quantitative trait loci (QTL) based on the information of linked markers. As the inference usually provides the probability distribution of QTL genotypes rather than a specific genotype, marker-assisted genetic evaluation is characterized by the mixture model because of the uncertainty of QTL genotypes. It is, therefore, necessary to develop a statistical procedure useful for mixture model analyses. In this study, a set of mixture model equations was derived based on the normal mixture model and the EM algorithm for evaluating linear models with uncertain independent variables. The derived equations can be seen as an extension of Henderson's mixed model equations to mixture models and provide a general framework to deal with the issues of uncertain incidence matrices in linear models.
The mixture model equations were applied to marker-assisted genetic evaluation with different parameterizations of QTL effects. A sire-QTL-effect model and a founder-QTL-effect model were used to illustrate the application of the mixture model equations. The potential advantages of the mixture model equations for marker-assisted genetic evaluation were discussed. The mixed-effect mixture model equations are flexible in modelling QTL effects and show desirable properties in estimating QTL effects, compared with Henderson's mixed model equations.

Multidimensional patterns of change in outpatient psychotherapy: The phase model revisited
JOURNAL OF CLINICAL PSYCHOLOGY, Issue 9 2007
Niklaus Stulz
In this study, groups of psychotherapy outpatients were identified on the basis of shared change patterns in the three dimensions of the phase model of psychotherapeutic outcome: well-being, symptom distress, and life functioning. Treatment courses provided by a national provider network of a managed care company in the United States (N = 1128) were analyzed using growth mixture models. Several initial patient characteristics (treatment expectations, amount of prior psychotherapy, and global assessment of functioning) allowed for the discrimination between three patient groups of shared change patterns. Those patterns can be classified into three groups as phase model consistent, partial rapid responders, or symptomatically highly impaired patients with each having typical change patterns. © 2007 Wiley Periodicals, Inc. J Clin Psychol 63: 817-833, 2007.

Sprouting ability across diverse disturbances and vegetation types worldwide
JOURNAL OF ECOLOGY, Issue 2 2004
Peter A. Vesk
Summary
1. A widely used classification of plant response to fire divides species into two groups, sprouters and non-sprouters. In contrast, regeneration responses to catastrophic wind throw and small gap disturbance are more often considered a continuum.
2. We determined general patterns in the distribution of sprouting ability across species with respect to disturbance type and intensity, vegetation type and phylogeny and assessed the adequacy of a dichotomy for describing species' sprouting responses. These are important steps if sprouting is to be adopted widely and consistently as a functional trait.
3. Quantitative data were compiled from the literature and differences in species' sprouting proportions between disturbance classes were assessed using simple sprouting categorizations, visually using histograms and with mixture models.
4. The sprouter/non-sprouter dichotomy effectively characterized intense disturbances, such as fires resulting in stem-kill (peaks at 13%, 79% probability of sprouting). But there was a continuum of responses following less intense disturbances. Where substantial above-ground tissue was retained, as for wind throw, localized gap disturbances and low intensity fires, there were fewer non-sprouters and more intermediate sprouters.
5. Comparisons across diverse vegetation types and disturbances require quantitative records of sprouting, although the simple sprouter/non-sprouter dichotomy was sufficient for comparisons within fire. Patterns appeared consistent across broad vegetation types. Sprouting ability showed little phylogenetic conservatism.

Hybrid Dirichlet mixture models for functional data
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2009
Sonia Petrone
Summary: In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. 'damaged' areas of irregular shape on an otherwise smooth surface).
Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to infinity, and their connection with Dirichlet process mixtures.

Standard errors for EM estimation
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2000
M. Jamshidian
The EM algorithm is a popular method for computing maximum likelihood estimates. One of its drawbacks is that it does not produce standard errors as a by-product. We consider obtaining standard errors by numerical differentiation. Two approaches are considered. The first differentiates the Fisher score vector to yield the Hessian of the log-likelihood. The second differentiates the EM operator and uses an identity that relates its derivative to the Hessian of the log-likelihood. The well-known SEM algorithm uses the second approach. We consider three additional algorithms: one that uses the first approach and two that use the second. We evaluate the complexity and precision of these three and the SEM algorithm in seven examples. The first is a single-parameter example used to give insight.
The others are three examples in each of two areas of EM application: Poisson mixture models and the estimation of covariance from incomplete data. The examples show that there are algorithms that are much simpler and more accurate than the SEM algorithm. Hopefully their simplicity will increase the availability of standard error estimates in EM applications. It is shown that, as previously conjectured, a symmetry diagnostic can accurately estimate errors arising from numerical differentiation. Some issues related to the speed of the EM algorithm and algorithms that differentiate the EM operator are identified.

Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2000
C. P. Robert
Hidden Markov models form an extension of mixture models which provides a flexible class of models exhibiting dependence and a possibly large degree of variability. We show how reversible jump Markov chain Monte Carlo techniques can be used to estimate the parameters as well as the number of components of a hidden Markov model in a Bayesian framework. We employ a mixture of zero-mean normal distributions as our main example and apply this model to three sets of data from finance, meteorology and geomagnetism.

A Bayesian model for longitudinal count data with non-ignorable dropout
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 5 2008
Niko A. Kaciroti
Summary: Asthma is an important chronic disease of childhood. An intervention programme for managing asthma was designed on principles of self-regulation and was evaluated by a randomized longitudinal study. The study focused on several outcomes, and, typically, missing data remained a pervasive problem. We develop a pattern-mixture model to evaluate the outcome of intervention on the number of hospitalizations with non-ignorable dropouts.
Pattern-mixture models are not generally identifiable as no data may be available to estimate a number of model parameters. Sensitivity analyses are performed by imposing structures on the unidentified parameters. We propose a parameterization which permits sensitivity analyses on clustered longitudinal count data that have missing values due to non-ignorable missing data mechanisms. This parameterization is expressed as ratios between event rates across missing data patterns and the observed data pattern and thus measures departures from an ignorable missing data mechanism. Sensitivity analyses are performed within a Bayesian framework by averaging over different prior distributions on the event ratios. This model has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis.

Bayesian mixture models for complex high dimensional count data in phage display experiments
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 2 2007
Yuan Ji
Summary: Phage display is a biological process that is used to screen random peptide libraries for ligands that bind to a target of interest with high affinity. On the basis of a count data set from an innovative multistage phage display experiment, we propose a class of Bayesian mixture models to cluster peptide counts into three groups that exhibit different display patterns across stages. Among the three groups, the investigators are particularly interested in that with an ascending display pattern in the counts, which implies that the peptides are likely to bind to the target with strong affinity. We apply a Bayesian false discovery rate approach to identify the peptides with the strongest affinity within the group. A list of peptides is obtained, among which important ones with meaningful functions are further validated by biologists.
To examine the performance of the Bayesian model, we conduct a simulation study and obtain desirable results.

Incorporating gene functional annotations in detecting differential gene expression
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2006
Wei Pan
Summary. The importance of incorporating existing biological knowledge, such as gene functional annotations in gene ontology, in analysing high throughput genomic and proteomic data is being increasingly recognized. In the context of detecting differential gene expression, however, the current practice of using gene annotations is limited primarily to validations. Here we take a direct approach to incorporating gene annotations into mixture models for analysis. First, in contrast with a standard mixture model assuming that each gene of the genome has the same distribution, we study stratified mixture models allowing genes with different annotations to have different distributions, such as prior probabilities. Second, rather than treating parameters in stratified mixture models independently, we propose a hierarchical model to take advantage of the hierarchical structure of most gene annotation systems, such as gene ontology. We consider a simplified implementation for the proof of concept. An application to a mouse microarray data set and a simulation study demonstrate the improvement of the two new approaches over the standard mixture model.

Maximum likelihood estimation of bivariate logistic models for incomplete responses with indicators of ignorable and non-ignorable missingness
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2002
Nicholas J. Horton
Summary. Missing observations are a common problem that complicate the analysis of clustered data. In the Connecticut child surveys of childhood psychopathology, it was possible to identify reasons why outcomes were not observed.
Of note, some of these causes of missingness may be assumed to be ignorable, whereas others may be non-ignorable. We consider logistic regression models for incomplete bivariate binary outcomes and propose mixture models that permit estimation assuming that there are two distinct types of missingness mechanisms: one that is ignorable; the other non-ignorable. A feature of the mixture modelling approach is that additional analyses to assess the sensitivity to assumptions about the missingness are relatively straightforward to incorporate. The methods were developed for analysing data from the Connecticut child surveys, where there are missing informant reports of child psychopathology and different reasons for missingness can be distinguished.

Empirical Modeling of Butyl Acrylate/Vinyl Acetate/Acrylic Acid Emulsion-Based Pressure-Sensitive Adhesives
MACROMOLECULAR MATERIALS & ENGINEERING, Issue 5 2004
Renata Jovanovic
Summary: Butyl acrylate/vinyl acetate/acrylic acid (BA/VAc/AA) emulsion latexes were produced in a semi-batch mode. The objective was to generate polymers with properties favoring their application as pressure-sensitive adhesives. The influence of the individual monomer concentrations on final properties such as glass transition temperature (Tg), peel strength, shear strength and tack was investigated. To obtain the maximum amount of information in a reasonable number of runs, a constrained three-component mixture design was used to define the experimental conditions. Latexes were coated onto a polyethylene terephthalate carrier and dried. Different empirical models (e.g. linear, quadratic and cubic mixture models) governing the individual properties (i.e. Tg, peel adhesion, shear resistance and tack) were developed and evaluated. In the given experimental region, no single model was found to fit all of the responses (i.e. the final properties).
However, in all models the most significant factor affecting the final properties was the AA concentration, followed by the VAc concentration. [Figure: Shear strength contour lines over the investigated region.]

Capture-recapture models with heterogeneity to study survival senescence in the wild
OIKOS, Issue 3 2010
Guillaume Péron
Detecting senescence in wild populations and estimating its strength raise three challenges. First, in the presence of individual heterogeneity in survival probability, the proportion of high-survival individuals increases with age. This increase can mask a senescence-related decrease in survival probability when the probability is estimated at the population level. To accommodate individual heterogeneity we use a mixture model structure (discrete classes of individuals). Second, the study individuals can elude the observers in the field, and their detection rate can be heterogeneous. To account for detectability issues we use capture-mark-recapture (CMR) methodology, mixture models and data that provide information on individuals' detectability. Last, emigration to non-monitored sites can bias survival estimates, because it can occur at the end of the individuals' histories and mimic earlier death. To model emigration we use Markovian transitions to and from an unobservable state. These different model structures are merged together using hidden Markov chain CMR models, or multievent models. Simulation studies illustrate that reliable evidence for survival senescence can be obtained using highly heterogeneous data from non site-faithful individuals. We then design a tailored application for a dataset from a colony of black-headed gull Chroicocephalus ridibundus. Survival probabilities do not appear individually variable, but evidence for survival senescence becomes significant only when accounting for other sources of heterogeneity.
This result suggests that not accounting for heterogeneity leads to flawed inference and/or that emigration heterogeneity mimics survival heterogeneity and biases senescence estimates.

Peak quantification in surface-enhanced laser desorption/ionization by using mixture models
PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 19 2006
Martijn Dijkstra
Abstract Surface-enhanced laser desorption/ionization (SELDI) time of flight (TOF) is a mass spectrometry technology for measuring the composition of a sampled protein mixture. A mass spectrum contains peaks corresponding to proteins in the sample. The peak areas are proportional to the measured concentrations of the corresponding proteins. Quantifying peak areas is difficult for existing methods because peak shapes are not constant across a spectrum and because peaks often overlap. We present a new method for quantifying peak areas. Our method decomposes a spectrum into peaks and a baseline using so-called statistical finite mixture models. We illustrate our method in detail on 8 samples from culture media of adipose tissue and globally on 64 samples from serum to compare our method to the standard Ciphergen method. Both methods give similar estimates for singleton peaks, but not for overlapping peaks. The Ciphergen method overestimates the heights of such peaks while our method still gives appropriate estimates. Peak quantification is an important step in pre-processing SELDI-TOF data and improvements therein will pay off in the later biomarker discovery phase.

The likelihood ratio test for homogeneity in finite mixture models
THE CANADIAN JOURNAL OF STATISTICS, Issue 2 2001
Hanfeng Chen
Abstract The authors study the asymptotic behaviour of the likelihood ratio statistic for testing homogeneity in the finite mixture models of a general parametric distribution family. They prove that the limiting distribution of this statistic is the squared supremum of a truncated standard Gaussian process.
The autocorrelation function of the Gaussian process is explicitly presented. A re-sampling procedure is recommended to obtain the asymptotic p-value. Three kernel functions, normal, binomial and Poisson, are used in a simulation study which illustrates the procedure.
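
Illustrative sketch (not taken from the paper above): the abstract does not say which re-sampling procedure the authors recommend, so the code below assumes a generic parametric bootstrap with a Poisson kernel. It fits a single Poisson (homogeneity) and a two-component Poisson mixture by EM, forms the likelihood ratio statistic, and estimates a p-value by simulating from the fitted null model. All function names, the starting values, and the iteration counts are hypothetical choices made for this example.

```python
import math
import random


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass function at count x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def loglik_homogeneous(data):
    """Maximised log-likelihood of a single Poisson (the null model)."""
    lam = max(sum(data) / len(data), 1e-9)
    return sum(poisson_logpmf(x, lam) for x in data)


def loglik_mixture(data, n_iter=50):
    """Approximate maximised log-likelihood of a two-component Poisson
    mixture, fitted by EM from two deterministic starting points."""
    n = len(data)
    mean = max(sum(data) / n, 1e-9)
    best = -math.inf
    for spread in (0.3, 0.7):
        w, lam1, lam2 = 0.5, mean * (1 - spread), mean * (1 + spread)
        for _ in range(n_iter):
            # E-step: posterior probability that each count is from component 1.
            resp = []
            for x in data:
                a = w * math.exp(poisson_logpmf(x, lam1))
                b = (1 - w) * math.exp(poisson_logpmf(x, lam2))
                resp.append(a / max(a + b, 1e-300))
            # M-step: update the weight and both rates, guarded away from 0.
            s = min(max(sum(resp), 1e-9), n - 1e-9)
            w = s / n
            lam1 = max(sum(r * x for r, x in zip(resp, data)) / s, 1e-9)
            lam2 = max(sum((1 - r) * x for r, x in zip(resp, data)) / (n - s), 1e-9)
        ll = 0.0
        for x in data:
            dens = (w * math.exp(poisson_logpmf(x, lam1))
                    + (1 - w) * math.exp(poisson_logpmf(x, lam2)))
            ll += math.log(max(dens, 1e-300))
        best = max(best, ll)
    return best


def lrt_statistic(data):
    """Likelihood ratio statistic, clamped at 0 because EM may stop
    slightly below the null fit even though the models are nested."""
    return max(0.0, 2.0 * (loglik_mixture(data) - loglik_homogeneous(data)))


def sample_poisson(lam, rng):
    """Knuth's multiplication method; only suitable for small rates."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def bootstrap_p_value(data, n_boot=30, seed=0):
    """Parametric bootstrap: simulate from the fitted single Poisson and
    count how often the simulated statistic reaches the observed one."""
    rng = random.Random(seed)
    lam_hat = max(sum(data) / len(data), 1e-9)
    observed = lrt_statistic(data)
    exceed = 0
    for _ in range(n_boot):
        fake = [sample_poisson(lam_hat, rng) for _ in data]
        if lrt_statistic(fake) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_boot)


if __name__ == "__main__":
    # Hand-made counts with two clearly different rates (illustrative data).
    counts = [0, 1, 1, 2, 0, 1] * 5 + [9, 10, 11, 12, 8, 10] * 5
    stat, p = bootstrap_p_value(counts)
    print(f"LRT statistic = {stat:.2f}, bootstrap p-value = {p:.3f}")
```

The bootstrap is used here because, as the abstract notes, the limiting distribution of the statistic is non-standard, so a chi-squared reference is not appropriate. In practice far more than 30 bootstrap replicates would be needed for a stable p-value; the small default only keeps the sketch quick to run.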