Multinomial Distribution


Selected Abstracts


An empirical method for inferring species richness from samples

ENVIRONMETRICS, Issue 2 2006
Paul A. Murtaugh
Abstract We introduce an empirical method of estimating the number of species in a community based on a random sample. The numbers of sampled individuals of different species are modeled as a multinomial random vector, with cell probabilities estimated by the relative abundances of species in the sample and, for hypothetical species missing from the sample, by linear extrapolation from the abundance of the rarest observed species. Inference is then based on likelihoods derived from the multinomial distribution, conditioning on a range of possible values of the true richness in the community. The method is shown to work well in simulations based on a variety of real data sets. Copyright © 2005 John Wiley & Sons, Ltd.
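The multinomial likelihood that this kind of inference builds on can be sketched in a few lines of Python. This is a minimal illustration, not the author's estimator: the function name `multinomial_log_pmf` and the example counts are hypothetical, and the extrapolation step for unobserved species is left out because the abstract does not give its details.

```python
import math

def multinomial_log_pmf(counts, probs):
    """Log-probability of a count vector under a multinomial distribution."""
    n = sum(counts)
    # log of the multinomial coefficient n! / (x_1! ... x_k!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_kernel = 0.0
    for c, p in zip(counts, probs):
        if c == 0:
            continue  # a cell with zero count contributes nothing
        if p == 0.0:
            return float("-inf")  # observed a cell that has zero probability
        log_kernel += c * math.log(p)
    return log_coef + log_kernel

# Hypothetical sample: individuals counted per observed species.
counts = [12, 7, 3, 1]
total = sum(counts)
# Cell probabilities estimated by relative abundance in the sample.
probs = [c / total for c in counts]
log_lik = multinomial_log_pmf(counts, probs)
print(f"log-likelihood at the relative abundances: {log_lik:.3f}")
```

Evaluating the same function over a range of candidate richness values, each with its own probability vector, would give the kind of likelihood comparison the abstract describes.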


Uncovering a Latent Multinomial: Analysis of Mark-Recapture Data with Misidentification

BIOMETRICS, Issue 1 2010
William A. Link
Summary Natural tags based on DNA fingerprints or natural features of animals are now becoming very widely used in wildlife population biology. However, classic capture-recapture models do not allow for misidentification of animals, which is a potentially very serious problem with natural tags. Statistical analysis of misidentification processes is extremely difficult using traditional likelihood methods but is easily handled using Bayesian methods. We present a general framework for Bayesian analysis of categorical data arising from a latent multinomial distribution. Although our work is motivated by a specific model for misidentification in closed population capture-recapture analyses, with crucial assumptions which may not always be appropriate, the methods we develop extend naturally to a variety of other models with similar structure. Suppose that observed frequencies $f$ are a known linear transformation $f = A'x$ of a latent multinomial variable $x$ with cell probability vector $\pi = \pi(\theta)$. Given that full conditional distributions $[\theta \mid x]$ can be sampled, implementation of Gibbs sampling requires only that we can sample from the full conditional distribution $[x \mid f, \theta]$, which is made possible by knowledge of the null space of $A'$. We illustrate the approach using two data sets with individual misidentification, one simulated, the other summarizing recapture data for salamanders based on natural marks.
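The structure described here, observed frequencies obtained as a known linear transformation of latent multinomial counts, can be illustrated with a small Python sketch. The matrix `A`, the cell probabilities `cell_probs`, and both helper functions are hypothetical choices made for illustration, not taken from the paper. The sketch also shows why the null space matters: adding a null-space vector to the latent counts leaves the observed frequencies unchanged.

```python
import random

def sample_multinomial(n, probs, rng):
    """Draw one multinomial count vector of size n (standard library only)."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts

def transform(A, x):
    """Return f = A'x, where A has one row per latent cell."""
    n_obs = len(A[0])
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(n_obs)]

rng = random.Random(0)
cell_probs = [0.5, 0.3, 0.2]      # hypothetical cell probabilities
A = [[1, 0], [0, 1], [1, 1]]      # hypothetical known matrix
x = sample_multinomial(100, cell_probs, rng)   # latent counts
f = transform(A, x)                            # observed frequencies

# v lies in the null space of A' (A'v = 0), so x + v yields the same f.
v = [1, 1, -1]
x_alt = [xi + vi for xi, vi in zip(x, v)]
print(f, transform(A, x_alt))
```

A sampler for the latent counts given the observed frequencies could, under these assumptions, propose moves along such null-space vectors while keeping every count non-negative.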


Bayesian Shrinkage Estimation of the Relative Abundance of mRNA Transcripts Using SAGE

BIOMETRICS, Issue 3 2003
Jeffrey S. Morris
Summary. Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species' relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species comprising a vast majority of the total. We have developed a new Bayesian estimation procedure that quantifies our prior information about these characteristics, yielding a nonlinear shrinkage estimator with efficiency advantages over the MLE. Our prior is a mixture of Dirichlets, whereby species are stochastically partitioned into abundant and scarce classes, each with its own multivariate prior. Simulation studies reveal our estimator has lower integrated mean squared error (IMSE) than the MLE for the SAGE scenarios simulated, and yields relative abundance profiles closer in Euclidean distance to the truth for all samples simulated. We apply our method to a SAGE library of normal colon tissue, and discuss its implications for assessing differential expression.
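As a point of reference, the sketch below contrasts the multinomial MLE with the posterior mean under a single symmetric Dirichlet prior, the conjugate building block of a Dirichlet mixture. It is deliberately simpler than the estimator in the abstract: it uses one Dirichlet rather than a mixture, assumes the total number of species is known, and does not reproduce the authors' nonlinear shrinkage. The counts, the value of `alpha`, and the function names are hypothetical.

```python
def mle(counts):
    """Relative abundances: the multinomial maximum likelihood estimate."""
    n = sum(counts)
    return [c / n for c in counts]

def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of the cell probabilities under a symmetric Dirichlet(alpha) prior."""
    n = sum(counts)
    k = len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]

# Hypothetical tag counts; the last two species are missing from the sample.
counts = [40, 5, 1, 0, 0]
print(mle(counts))                                  # missing species get exactly zero
print(dirichlet_posterior_mean(counts, alpha=0.5))  # missing species get small positive mass
```

The only point illustrated is that a Dirichlet prior assigns nonzero abundance to species absent from the sample, which the MLE cannot do.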


Profile-Likelihood Inference for Highly Accurate Diagnostic Tests

BIOMETRICS, Issue 4 2002
John V. Tsimikas
Summary. We consider profile-likelihood inference based on the multinomial distribution for assessing the accuracy of a diagnostic test. The methods apply to ordinal rating data when accuracy is assessed using the area under the receiver operating characteristic (ROC) curve. Simulation results suggest that the derived confidence intervals have acceptable coverage probabilities, even when sample sizes are small and the diagnostic tests have high accuracies. The methods extend to stratified settings and situations in which the ratings are correlated. We illustrate the methods using data from a clinical trial on the detection of ovarian cancer.


Discrete dynamic Bayesian network analysis of fMRI data

HUMAN BRAIN MAPPING, Issue 1 2009
John Burge
Abstract We examine the efficacy of using discrete Dynamic Bayesian Networks (dDBNs), a data-driven modeling technique employed in machine learning, to identify functional correlations among neuroanatomical regions of interest. Unlike many neuroimaging analysis techniques, this method is not limited by linear and/or Gaussian noise assumptions. It achieves this by modeling the time series of neuroanatomical regions as discrete, as opposed to continuous, random variables with multinomial distributions. We demonstrate this method using an fMRI dataset collected from healthy and demented elderly subjects (Buckner, et al., 2000: J Cogn Neurosci 12:24-34) and identify correlates based on a diagnosis of dementia. The results are validated in three ways. First, the elicited correlates are shown to be robust over leave-one-out cross-validation and, via a Fourier bootstrapping method, unlikely to be due to random chance. Second, the dDBNs identified correlates that would be expected given the experimental paradigm. Third, the dDBN's ability to predict dementia is competitive with two commonly employed machine-learning classifiers: the support vector machine and the Gaussian naïve Bayesian network. We also verify that the dDBN selects correlates based on non-linear criteria. Finally, we provide a brief analysis of the correlates elicited from Buckner et al.'s data that suggests that demented elderly subjects have reduced involvement of entorhinal and occipital cortex and greater involvement of the parietal lobe and amygdala in brain activity compared with healthy elderly (as measured via functional correlations among BOLD measurements). Limitations and extensions to the dDBN method are discussed. Hum Brain Mapp, 2009. © 2007 Wiley-Liss, Inc.


Using Multinomial Mixture Models to Cluster Internet Traffic

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2004
Murray Jorgensen
Summary The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet-length distributions. The clustering uses the EM algorithm, and the data-analytic and computational details are given.
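The general technique named here, fitting a finite mixture of multinomials with the EM algorithm, can be sketched as follows. This is a generic textbook-style version under stated assumptions, not the paper's implementation: the function `em_multinomial_mixture`, its parameters (`smooth`, `init_probs`, `seed`), and the toy `flows` data with three packet-length bins are all hypothetical.

```python
import math
import random

def em_multinomial_mixture(X, k, n_iter=50, seed=0, smooth=1e-6, init_probs=None):
    """Fit a k-component multinomial mixture to count vectors X with EM.

    Returns (weights, probs, resp): mixing weights, per-component cell
    probabilities, and per-row responsibilities.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    if init_probs is None:
        # Random, strictly positive starting probabilities for each component.
        probs = []
        for _ in range(k):
            raw = [rng.random() + 0.5 for _ in range(d)]
            probs.append([r / sum(raw) for r in raw])
    else:
        probs = [list(row) for row in init_probs]

    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        # The multinomial coefficient is the same for every component and cancels.
        resp = []
        for x in X:
            logp = [
                math.log(weights[j])
                + sum(c * math.log(probs[j][m]) for m, c in enumerate(x))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: lightly smoothed weighted averages keep every value positive.
        for j in range(k):
            r_j = sum(r[j] for r in resp)
            weights[j] = (r_j + smooth) / (n + k * smooth)
            cell = [
                smooth + sum(resp[i][j] * X[i][m] for i in range(n))
                for m in range(d)
            ]
            cell_total = sum(cell)
            probs[j] = [c / cell_total for c in cell]
    return weights, probs, resp

# Hypothetical flows: counts of packets falling in three length bins.
flows = [[9, 1, 0], [8, 2, 0], [10, 0, 0], [0, 1, 9], [0, 2, 8], [1, 0, 9]]
weights, probs, resp = em_multinomial_mixture(flows, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)
```

Each flow is assigned to the component with the largest responsibility. The small `smooth` constant is a design choice of this sketch that keeps all probabilities strictly positive so the logarithms stay finite.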