Mixture Distributions


Selected Abstracts


SCALE MIXTURES DISTRIBUTIONS IN STATISTICAL MODELLING

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2008
S.T. Boris Choy
Summary This paper presents two types of symmetric scale mixture probability distributions which include the normal, Student t, Pearson Type VII, variance gamma, exponential power, uniform power and generalized t (GT) distributions. Expressing a symmetric distribution in a scale mixture form enables efficient Bayesian Markov chain Monte Carlo (MCMC) algorithms in the implementation of complicated statistical models. Moreover, the mixing parameters, a by-product of the scale mixture representation, can be used to identify possible outliers. This paper also proposes a uniform scale mixture representation for the GT density, and demonstrates how this density representation alleviates the computational burden of the Gibbs sampler. [source]
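
A minimal sketch of the scale-mixture idea behind this abstract: the Student t distribution arises as a normal whose precision is mixed over a gamma distribution, which is the representation that makes conditional updates Gaussian in MCMC. The code is illustrative only; the degrees of freedom `nu` and the sample size are arbitrary choices, not values from the paper.

```python
# Illustrative sketch: Student-t as a scale mixture of normals.
# If lambda ~ Gamma(shape=nu/2, rate=nu/2) and X | lambda ~ N(0, 1/lambda),
# then marginally X ~ t_nu.  (nu and n are arbitrary illustration values.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 5.0, 200_000

lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # rate nu/2 -> scale 2/nu
x = rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam))     # N(0, 1/lambda) given the mixing draw

# Compare a few quantiles of the mixture draws with the exact t_nu quantiles.
qs = [0.5, 0.9, 0.99]
print(np.quantile(x, qs))          # empirical quantiles of the scale mixture
print(stats.t.ppf(qs, df=nu))      # exact Student-t quantiles
```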


Modeling and Forecasting Realized Volatility

ECONOMETRICA, Issue 2 2003
Torben G. Andersen
We provide a framework for integration of high-frequency intraday data into the measurement, modeling, and forecasting of daily and lower frequency return volatilities and return distributions. Building on the theory of continuous-time arbitrage-free price processes and the theory of quadratic variation, we develop formal links between realized volatility and the conditional covariance matrix. Next, using continuously recorded observations for the Deutschemark/Dollar and Yen/Dollar spot exchange rates, we find that forecasts from a simple long-memory Gaussian vector autoregression for the logarithmic daily realized volatilities perform admirably. Moreover, the vector autoregressive volatility forecast, coupled with a parametric lognormal-normal mixture distribution, produces well-calibrated density forecasts of future returns, and correspondingly accurate quantile predictions. Our results hold promise for practical modeling and forecasting of the large covariance matrices relevant in asset pricing, asset allocation, and financial risk management applications. [source]
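
A toy sketch of the realized-volatility pipeline the abstract describes: sum squared intraday returns into a daily realized variance, take logs, and fit a simple autoregression. The simulated returns, the number of intraday intervals, and the plain AR(1) are stand-ins; they are not the authors' exchange-rate data or their long-memory vector autoregression.

```python
# Toy sketch: daily realized volatility from intraday returns, then an AR(1)
# forecast of log realized variance.  Simulated data; the paper's long-memory
# vector autoregression is replaced here by a plain AR(1) for brevity.
import numpy as np

rng = np.random.default_rng(1)
n_days, m = 500, 288                                  # 288 five-minute returns per day (illustrative)
sigma = np.exp(rng.normal(-4.5, 0.3, size=n_days))    # slowly varying daily volatility
r = rng.normal(0.0, sigma[:, None] / np.sqrt(m), size=(n_days, m))

rv = (r ** 2).sum(axis=1)                             # daily realized variance
y = np.log(rv)                                        # log realized variance

# OLS fit of y_t = c + phi * y_{t-1} + e_t, then a one-step-ahead forecast.
X = np.column_stack([np.ones(n_days - 1), y[:-1]])
c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
forecast_log_rv = c + phi * y[-1]
print("one-day-ahead realized-volatility forecast:", np.sqrt(np.exp(forecast_log_rv)))
```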


The multi-clump finite mixture distribution and model selection

ENVIRONMETRICS, Issue 2 2010
Sudhir R. Paul
Abstract In practical data analysis, an important problem is often to determine the number of clumps in discrete data recorded as proportions. This can be done through model selection in a multi-clump finite mixture model. In this paper, we propose bootstrap likelihood ratio tests to test the fit of a multinomial model against the single-clump finite mixture distribution and to determine the number of clumps in the data, that is, to select a model with an appropriate number of clumps. Shortcomings of some traditional large-sample procedures are also shown. Three datasets are analyzed. Copyright © 2009 John Wiley & Sons, Ltd. [source]
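
The bootstrap likelihood ratio test can be sketched in a simplified analogue: a single binomial versus a two-component binomial mixture, standing in for the paper's multinomial/clump models. The data, the crude EM routine, and the number of bootstrap replicates below are invented for illustration and do not reproduce the authors' procedure.

```python
# Simplified sketch of a parametric bootstrap LRT for the number of mixture
# components: one binomial (no clumps) vs. a two-component binomial mixture.
import numpy as np
from scipy import stats

def fit_mixture(y, n, iters=200):
    """Crude EM for a two-component binomial mixture; returns (w, p1, p2, loglik)."""
    w, p1, p2 = 0.5, 0.25, 0.75
    for _ in range(iters):
        f1 = stats.binom.pmf(y, n, p1)
        f2 = stats.binom.pmf(y, n, p2)
        r = w * f1 / (w * f1 + (1 - w) * f2)          # responsibilities of component 1
        w = r.mean()
        p1 = (r * y).sum() / (r * n).sum()
        p2 = ((1 - r) * y).sum() / ((1 - r) * n).sum()
    ll = np.log(w * stats.binom.pmf(y, n, p1) + (1 - w) * stats.binom.pmf(y, n, p2)).sum()
    return w, p1, p2, ll

def lrt_stat(y, n):
    p0 = y.sum() / n.sum()
    ll0 = stats.binom.logpmf(y, n, p0).sum()           # single-binomial (null) fit
    ll1 = fit_mixture(y, n)[3]                         # two-component (alternative) fit
    return 2 * (ll1 - ll0), p0

rng = np.random.default_rng(2)
n = np.full(50, 20)
y = rng.binomial(n, np.where(rng.random(50) < 0.4, 0.2, 0.7))   # "clumped" proportions

obs, p0 = lrt_stat(y, n)
# Parametric bootstrap: simulate from the fitted null and recompute the LRT.
boot = np.array([lrt_stat(rng.binomial(n, p0), n)[0] for _ in range(199)])
print("bootstrap p-value:", (1 + (boot >= obs).sum()) / (1 + len(boot)))
```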


INAR(1) modeling of overdispersed count series with an environmental application

ENVIRONMETRICS, Issue 4 2008
Harry Pavlopoulos
Abstract This paper is concerned with a novel version of the INAR(1) model, a non-linear auto-regressive Markov chain on the non-negative integers, with innovations following a finite mixture distribution of m Poisson laws. For m ≥ 2, the stationary marginal probability distribution of the chain is overdispersed relative to a Poisson, thus making INAR(1) suitable for modeling time series of counts with arbitrary overdispersion. The one-step transition probability function of the chain is also a finite mixture, of m Poisson-Binomial laws, facilitating likelihood-based inference for model parameters. An explicit EM-algorithm is devised for inference by maximization of a conditional likelihood. Alternative options for inference are discussed along with criteria for selecting m. Integer-valued prediction (IP) is developed by a parametric bootstrap approach to 'coherent' forecasting, and a certain test statistic based on predictions is introduced for assessing performance of the fitted model. The proposed model is fitted to time series of counts of pixels where spatially averaged rain rate exceeds a given threshold level, illustrating its capabilities in challenging cases of highly overdispersed count data. Copyright © 2007 John Wiley & Sons, Ltd. [source]
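
A rough simulation sketch of the model class described here: binomial thinning of the previous count plus an innovation drawn from a finite mixture of Poisson distributions. The thinning probability, mixture weights, and component means are arbitrary illustration values, not parameters from the paper.

```python
# Sketch: simulate an INAR(1) series X_t = alpha o X_{t-1} + eps_t, where
# "o" is binomial thinning and eps_t follows a two-component Poisson mixture.
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.5                                 # thinning (survival) probability
weights, means = [0.7, 0.3], [1.0, 8.0]     # mixture of Poisson(1) and Poisson(8) innovations

T = 5000
x = np.zeros(T, dtype=int)
for t in range(1, T):
    k = rng.choice(len(weights), p=weights)        # pick a mixture component
    eps = rng.poisson(means[k])                    # mixed-Poisson innovation
    x[t] = rng.binomial(x[t - 1], alpha) + eps     # binomial thinning + innovation

# The marginal distribution is overdispersed relative to Poisson (index > 1).
print("dispersion index var/mean:", x[500:].var() / x[500:].mean())
```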


Incorporating covariates in mapping heterogeneous traits: a hierarchical model using empirical Bayes estimation

GENETIC EPIDEMIOLOGY, Issue 7 2007
Swati Biswas
Abstract Complex genetic traits are inherently heterogeneous, i.e., they may be caused by different genes, or by non-genetic factors, in different individuals. Thus, when mapping the genes responsible for these diseases using linkage analysis, heterogeneity must be accounted for in the model. Heterogeneity across different families can be modeled using a mixture distribution by letting each family have its own heterogeneity parameter denoting the probability that its disease-causing gene is linked to the marker map under consideration. A substantial gain in power is expected if covariates that can discriminate between families of the linked and unlinked types are incorporated in this modeling framework. To this end, we propose a hierarchical Bayesian model in which the families are grouped according to various (categorized) levels of covariate(s). The heterogeneity parameters of families within each group are assigned a common prior, whose parameters are further assigned hyper-priors. The hyper-parameters are obtained by utilizing empirical Bayes estimates. We also address related issues such as evaluating whether the covariate(s) under consideration are informative and the grouping of families. We compare the proposed approach with one that does not utilize covariates and show that our approach leads to considerable gains in power to detect linkage and in precision of interval estimates through various simulation scenarios. An application to the asthma datasets of Genetic Analysis Workshop 12 also illustrates this gain in a real data analysis. Additionally, we compare the performance of microsatellite markers and single nucleotide polymorphisms for our approach and find that the latter clearly outperforms the former. Genet. Epidemiol. 2007. © 2007 Wiley-Liss, Inc. [source]
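
To fix ideas about the heterogeneity (admixture) likelihood underlying this framework, the sketch below maximizes, separately within each covariate group, a mixture likelihood in which each family is linked with probability alpha_g. This is only the basic admixture idea with group-specific parameters; it is not the authors' hierarchical empirical-Bayes model, and the per-family likelihood ratios are simulated placeholders.

```python
# Simplified sketch of the admixture (heterogeneity) likelihood: within each
# covariate group g, a family contributes alpha_g * LR_i + (1 - alpha_g),
# where LR_i is the family's likelihood ratio of "linked" vs. "unlinked".
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)

def group_alpha_mle(lr):
    """MLE of the within-group linkage proportion alpha on simulated LRs."""
    neg_loglik = lambda a: -np.log(a * lr + (1 - a)).sum()
    return minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded").x

# Two covariate groups with different (unknown) proportions of linked families.
lr_group1 = np.exp(rng.normal(1.0, 1.0, size=60))    # mostly linked families
lr_group2 = np.exp(rng.normal(-0.5, 1.0, size=60))   # mostly unlinked families

print("alpha-hat, group 1:", group_alpha_mle(lr_group1))
print("alpha-hat, group 2:", group_alpha_mle(lr_group2))
```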


A Note on Comparing Exposure Data to a Regulatory Limit in the Presence of Unexposed and a Limit of Detection

BIOMETRICAL JOURNAL, Issue 6 2005
Haitao Chu
Abstract In some occupational health studies, observations occur in both exposed and unexposed individuals. If the levels of all exposed individuals have been detected, a two-part zero-inflated log-normal model is usually recommended, which assumes that the data have a probability mass at zero for unexposed individuals and a continuous response greater than zero for exposed individuals. However, many quantitative exposure measurements are subject to left censoring due to values falling below assay detection limits. A zero-inflated log-normal mixture model is suggested in this situation, since unexposed zeros are not distinguishable from exposed values below the detection limit. In the context of this mixture distribution, the information contributed by values falling below a fixed detection limit is used only to estimate the probability of being unexposed. We consider sample size and statistical power calculations when comparing the median of exposed measurements to a regulatory limit. We calculate the required sample size for the data presented in a recent paper comparing benzene time-weighted average (TWA) exposure data to a regulatory occupational exposure limit. A simulation study is conducted to investigate the performance of the proposed sample size calculation methods. (© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
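
A compact sketch of the mixture likelihood this note builds on: a point mass at zero for the unexposed plus a lognormal for the exposed, with observations below the detection limit contributing only the probability of falling there. All numerical values (detection limit, true parameters, sample size) are made up, and none of the paper's power or sample-size machinery is reproduced.

```python
# Sketch: zero-inflated lognormal likelihood with a limit of detection (LOD).
# Values below the LOD contribute p0 + (1 - p0) * Phi((log LOD - mu) / sigma);
# detected values contribute (1 - p0) times the lognormal density.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
LOD, mu_true, sd_true, p0_true = 0.5, 0.2, 0.8, 0.3

n = 400
exposed = rng.random(n) >= p0_true
x = np.where(exposed, rng.lognormal(mu_true, sd_true, n), 0.0)
detected = x >= LOD                      # below-LOD values are only known to be below LOD

def neg_loglik(theta):
    logit_p0, mu, log_sd = theta
    p0, sd = 1 / (1 + np.exp(-logit_p0)), np.exp(log_sd)
    below = p0 + (1 - p0) * stats.norm.cdf((np.log(LOD) - mu) / sd)
    ll_det = np.log(1 - p0) + stats.norm.logpdf(np.log(x[detected]), mu, sd) - np.log(x[detected])
    return -(ll_det.sum() + (~detected).sum() * np.log(below))

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
logit_p0_hat, mu_hat, _ = fit.x
print("p0-hat:", 1 / (1 + np.exp(-logit_p0_hat)), "median of exposed:", np.exp(mu_hat))
```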


Inference for two-stage adaptive treatment strategies using mixture distributions

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 1 2010
Abdus S. Wahed
Summary Treatment of complex diseases such as cancer, leukaemia, acquired immune deficiency syndrome and depression usually follows complex treatment regimes consisting of multiple, time-varying courses of the same or different treatments. The goal is to achieve the largest overall benefit defined by a common end point such as survival. An adaptive treatment strategy refers to a sequence of treatments that are applied at different stages of therapy based on the individual's history of covariates and intermediate responses to earlier treatments. However, in many cases treatment assignment depends only on the intermediate response and prior treatments. Clinical trials are often designed to compare two or more adaptive treatment strategies. A common approach used in these trials is sequential randomization: patients are randomized on entry into available first-stage treatments and then, on the basis of their response to the initial treatments, are randomized to second-stage treatments, and so on. The analysis often ignores this feature of the randomization and frequently conducts separate analyses for each stage. Recent literature has suggested several semiparametric and Bayesian methods for inference related to adaptive treatment strategies from sequentially randomized trials. We develop a parametric approach using mixture distributions to model the survival times under different adaptive treatment strategies. We show that the proposed estimators are asymptotically unbiased and can be easily implemented by using existing routines in statistical software packages. [source]
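
A stripped-down sketch of the mixture idea for sequentially randomized trials: survival under a strategy is treated as a mixture of the responders' and non-responders' survival distributions, weighted by the response probability. The exponential components, the randomization scheme, and every numerical value below are illustrative inventions, not the authors' model or estimator.

```python
# Sketch: survival under the strategy "A1, then maintenance B1 if response,
# salvage B2 if not", modeled as the mixture
#   S(t) = pi * S_responders(t) + (1 - pi) * S_nonresponders(t).
import numpy as np

rng = np.random.default_rng(6)
n = 2000
pi_true = 0.4                                  # response probability to first-stage A1
resp = rng.random(n) < pi_true

# Second-stage randomization: arm 1 = (B1 for responders, B2 for non-responders).
arm = np.where(rng.random(n) < 0.5, 1, 2)

# Simulated survival times (exponential, longer for responders); illustration only.
t = np.where(resp, rng.exponential(24.0, n), rng.exponential(8.0, n))

# Mixture estimator for the strategy: combine the two arms consistent with it,
# weighted by the estimated response proportion.
pi_hat = resp.mean()
mean_resp = t[resp & (arm == 1)].mean()        # responders who received B1
mean_nonresp = t[~resp & (arm == 1)].mean()    # non-responders who received B2
print("estimated mean survival under the strategy:",
      pi_hat * mean_resp + (1 - pi_hat) * mean_nonresp)
```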


Correlating two continuous variables subject to detection limits in the context of mixture distributions

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 5 2005
Haitao Chu
Summary In individuals who are infected with human immunodeficiency virus (HIV), distributions of quantitative HIV ribonucleic acid measurements may be highly left censored with an extra spike below the limit of detection LD of the assay. A two-component mixture model with the lower component entirely supported on [0, LD] is recommended to better model the extra spike in univariate analysis. Let LD1 and LD2 be the limits of detection for two HIV viral load measurements. When estimating the correlation coefficient between two different measures of viral load obtained from each of a sample of patients, a bivariate Gaussian mixture model is recommended to better model the extra spike on [0, LD1] and [0, LD2] when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. When the proportion of both variables falling below LD is very large, the parameters of the lower component may not be estimable, since almost all observations from the lower component fall below LD. A partial solution is to assume that the lower component's entire support is on [0, LD1]×[0, LD2]. Maximum likelihood is used to estimate the parameters of the lower and higher components. To evaluate whether there is a lower component, we apply a Monte Carlo approach to assess the p-value of the likelihood ratio test and two information criteria: a bootstrap-based information criterion and a cross-validation-based information criterion. We provide simulation results to evaluate the performance of the approach and compare it with two ad hoc estimators and a single-component bivariate Gaussian likelihood estimator. These methods are applied to data from a cohort study of HIV-infected men in Rio de Janeiro, Brazil, and data from the Women's Interagency HIV oral study. These results emphasize the need for caution when estimating correlation coefficients from data with a large proportion of non-detectable values in which the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. [source]
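
As one concrete ingredient, the sketch below writes out the likelihood for a single bivariate Gaussian on the log scale when either coordinate can fall below its detection limit, i.e. the single-component comparator mentioned in the abstract rather than the full two-component mixture. The data, detection limits, and starting values are simulated placeholders.

```python
# Sketch: maximum likelihood for a censored bivariate normal on log viral load,
# with detection limits d1 = log(LD1) and d2 = log(LD2).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
n, rho_true = 500, 0.6
z = rng.multivariate_normal([2.5, 2.0], [[1.0, rho_true], [rho_true, 1.0]], size=n)
d1, d2 = 2.0, 1.7                                      # log-scale detection limits
obs1, obs2 = z[:, 0] > d1, z[:, 1] > d2                # which coordinates are detected

def neg_loglik(theta):
    m1, m2, ls1, ls2, arho = theta
    s1, s2, rho = np.exp(ls1), np.exp(ls2), np.tanh(arho)
    mvn = stats.multivariate_normal([m1, m2],
                                    [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]])
    ll = mvn.logpdf(z[obs1 & obs2]).sum()              # both coordinates observed
    # One coordinate observed, the other below its limit: marginal density times
    # the conditional normal CDF evaluated at the limit.
    for (o, d, zo, mo, so, mc, sc) in [
        (obs1 & ~obs2, d2, z[:, 0], m1, s1, m2, s2),
        (~obs1 & obs2, d1, z[:, 1], m2, s2, m1, s1),
    ]:
        cm = mc + rho * sc / so * (zo[o] - mo)          # conditional mean
        cs = sc * np.sqrt(1 - rho**2)                   # conditional standard deviation
        ll += stats.norm.logpdf(zo[o], mo, so).sum()
        ll += stats.norm.logcdf((d - cm) / cs).sum()
    # Both coordinates below their limits: bivariate normal CDF at (d1, d2).
    ll += (~obs1 & ~obs2).sum() * np.log(mvn.cdf([d1, d2]))
    return -ll

fit = optimize.minimize(neg_loglik, x0=[2.0, 2.0, 0.0, 0.0, 0.0], method="Nelder-Mead")
print("estimated correlation:", np.tanh(fit.x[-1]))
```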


Within-individual discrimination on the Concealed Information Test using dynamic mixture modeling

PSYCHOPHYSIOLOGY, Issue 2 2009
Izumi Matsuda
Abstract Whether an examinee has information about a crime is determined with the Concealed Information Test on the basis of autonomic differences between the crime-related item and other control items. Multivariate quantitative statistical methods have been proposed for this determination. However, these require specific databases of responses, which are problematic for field application. Alternative methods, using only an individual's data, are preferable, but such within-individual approaches have traditionally been limited by small sample sizes. The present study proposes a new within-individual judgment method, the hidden Markov discrimination method, in which time-series data are modeled with dynamic mixture distributions. This method was applied to experimental data and, compared with previous methods, showed sufficient potential for discriminating guilty from innocent examinees in a mock theft experiment. [source]
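
To illustrate the kind of within-individual computation involved, the sketch below evaluates one examinee's response series under two candidate two-state hidden Markov models with Gaussian emissions (via the forward algorithm) and compares the log-likelihoods. The states, parameters, and data are invented; this is not the paper's hidden Markov discrimination method.

```python
# Sketch: forward-algorithm log-likelihood of a single examinee's time series
# under two candidate 2-state Gaussian-emission HMMs, then a simple comparison.
import numpy as np
from scipy import stats

def hmm_loglik(y, start, trans, means, sds):
    """Log-likelihood of observations y under a Gaussian-emission HMM (scaled forward pass)."""
    alpha = start * stats.norm.pdf(y[0], means, sds)
    loglik = 0.0
    for t in range(1, len(y)):
        c = alpha.sum()
        loglik += np.log(c)
        alpha = (alpha / c) @ trans * stats.norm.pdf(y[t], means, sds)
    return loglik + np.log(alpha.sum())

start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])

# "Guilty" model: responses around the critical item shift to a higher-mean state.
guilty = dict(means=np.array([0.0, 1.5]), sds=np.array([1.0, 1.0]))
innocent = dict(means=np.array([0.0, 0.2]), sds=np.array([1.0, 1.0]))

rng = np.random.default_rng(8)
y = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(1.5, 1.0, 10)])  # toy response series

ll_g = hmm_loglik(y, start, trans, **guilty)
ll_i = hmm_loglik(y, start, trans, **innocent)
print("log-likelihood ratio (guilty vs. innocent model):", ll_g - ll_i)
```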


A fast distance-based approach for determining the number of components in mixtures

THE CANADIAN JOURNAL OF STATISTICS, Issue 1 2003
Sujit K. Sahu
Abstract The authors propose a procedure for determining the unknown number of components in mixtures by generalizing a Bayesian testing method proposed by Mengersen & Robert (1996). The testing criterion they propose involves a Kullback-Leibler distance, which may be weighted or not. They give explicit formulas for the weighted distance for a number of mixture distributions and propose a stepwise testing procedure to select the minimum number of components adequate for the data. Their procedure, which is implemented using the BUGS software, exploits a fast collapsing approach which accelerates the search for the minimum number of components by avoiding full refitting at each step. The performance of their method is compared, using both distances, to the Bayes factor approach. [source]
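
The distance criterion can be illustrated with a crude Monte Carlo version: fit mixtures with k and k-1 components and approximate the Kullback-Leibler divergence between them by averaging the log-density ratio over draws from the larger model. This plain, unweighted divergence on simulated one-dimensional data is only a stand-in for the authors' (possibly weighted) criterion and their collapsing BUGS implementation.

```python
# Sketch: Monte Carlo Kullback-Leibler distance between a k-component and a
# (k-1)-component Gaussian mixture fit, as a stand-in for the stepwise
# distance-based selection of the number of components.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)]).reshape(-1, 1)

def mc_kl(k, n_mc=20_000):
    """Approximate KL( f_k || f_{k-1} ) by sampling from the k-component fit."""
    gm_k = GaussianMixture(n_components=k, random_state=0).fit(x)
    gm_km1 = GaussianMixture(n_components=k - 1, random_state=0).fit(x)
    s, _ = gm_k.sample(n_mc)
    return np.mean(gm_k.score_samples(s) - gm_km1.score_samples(s))

# A small KL means dropping a component loses little; stop at the smallest such k.
for k in (4, 3, 2):
    print("KL from", k, "to", k - 1, "components:", round(mc_kl(k), 4))
```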


Site Occupancy Models with Heterogeneous Detection Probabilities

BIOMETRICS, Issue 1 2006
J. Andrew Royle
Summary Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these "site occupancy" models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link (2003, Biometrics 59, 1123-1130) demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs. [source]
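
A minimal sketch of the likelihood form described here: with a beta mixing distribution for detection probability, each site's contribution is a zero-inflated beta-binomial, psi * BetaBin(y_i) + (1 - psi) * 1{y_i = 0}. The simulated detection histories and the particular beta mixing choice are illustrative; the article considers several mixing classes and real avian survey data.

```python
# Sketch: zero-inflated beta-binomial likelihood for site occupancy with
# heterogeneous detection probability (beta-mixed p).  Simulated data only.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(10)
n_sites, J = 200, 5
psi_true = 0.6
occupied = rng.random(n_sites) < psi_true
p_site = rng.beta(2, 3, size=n_sites)                  # site-level heterogeneous detection
y = np.where(occupied, rng.binomial(J, p_site), 0)     # detections out of J visits

def neg_loglik(theta):
    psi = 1 / (1 + np.exp(-theta[0]))                  # occupancy probability
    a, b = np.exp(theta[1]), np.exp(theta[2])          # beta mixing parameters
    bb = stats.betabinom.pmf(y, J, a, b)               # integrated (beta-binomial) detection term
    lik = psi * bb + (1 - psi) * (y == 0)              # zero inflation for unoccupied sites
    return -np.log(lik).sum()

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
psi_hat = 1 / (1 + np.exp(-fit.x[0]))
a_hat, b_hat = np.exp(fit.x[1]), np.exp(fit.x[2])
print("psi-hat:", psi_hat, "mean detection probability:", a_hat / (a_hat + b_hat))
```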