Simulated Data Sets



Selected Abstracts


Using neutral landscapes to identify patterns of aggregation across resource points

ECOGRAPHY, Issue 3 2006
Jill Lancaster
Many organisms are aggregated within resource patches and aggregated spatially across landscapes with multiple resources. Such patchy distributions underpin models of population regulation and species coexistence, so ecologists require methods to analyse spatially-explicit data of resource distribution and use. I describe a method for analysing maps of resources and testing hypotheses about how resource distribution influences the distribution of organisms, where resource patches can be described as points in a landscape and the number of organisms on each resource point is known. Using a mark correlation function and the linearised form of Ripley's K-function, this version of marked point pattern analysis can characterise and test hypotheses about the spatial distribution of organisms (marks) on resource patches (points). The method extends a version of point pattern analysis that has wide ecological applicability; it can describe patterns over a range of scales and can detect mixed patterns. Statistically, Monte Carlo permutations are used to estimate the difference between the observed and expected values of the mark correlation function. Hypothesis testing employs a flexible neutral landscape approach in which spatial characteristics of point patterns are preserved to some extent, and marks are randomised across points. I describe the steps required to identify the appropriate neutral landscape and apply the analysis. Simulated data sets illustrate how the choice of neutral landscape can influence ecological interpretations, and how this spatially-explicit method and traditional dispersion indices can yield different interpretations. Interpretations may be general or context-sensitive, depending on information available about the underlying point pattern and the neutral landscape. An empirical example of caterpillars exploiting food plants illustrates how this technique might be used to test hypotheses about adult oviposition and larval dispersal. This approach can increase the value of survey data by making it possible to quantify the distribution of resource points in the landscape and the pattern of resource use by species. [source]
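
As a rough illustration of the neutral-landscape idea, the sketch below computes a simple mark-correlation statistic for organism counts attached to fixed resource points and compares it with a null distribution obtained by randomising the marks across points while the point pattern itself is held fixed. It is a minimal stand-in, not Lancaster's estimator: the coordinates, marks, distance r and the normalised statistic are all hypothetical, and the full method uses the mark correlation function together with the linearised Ripley's K-function.

```python
import numpy as np

rng = np.random.default_rng(0)

def mark_correlation(coords, marks, r):
    """Mean product of marks over point pairs closer than r,
    normalised by the squared mean mark."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    pairs = (d > 0) & (d <= r)
    prod = np.outer(marks, marks)
    return prod[pairs].mean() / marks.mean() ** 2

# Hypothetical landscape: 100 resource points, with more organisms on the left half
coords = rng.uniform(0, 10, size=(100, 2))
marks = rng.poisson(lam=1 + 4 * (coords[:, 0] < 5))

obs = mark_correlation(coords, marks, r=1.0)

# Neutral landscape: keep the point pattern fixed, randomise marks across points
null = np.array([mark_correlation(coords, rng.permutation(marks), r=1.0)
                 for _ in range(999)])
p = (1 + np.sum(null >= obs)) / (1 + len(null))
print(f"observed statistic = {obs:.3f}, permutation p-value = {p:.3f}")
```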


Improved inter-modality image registration using normalized mutual information with coarse-binned histograms

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING, Issue 6 2009
Haewon Nam
Abstract In this paper we extend the method of inter-modality image registration using the maximization of normalized mutual information (NMI) for the registration of [18F]-2-fluoro-deoxy-D-glucose (FDG)-positron emission tomography (PET) with T1-weighted magnetic resonance (MR) volumes. We investigate the impact on NMI maximization of using coarse-to-fine grained B-spline bases and of the number of bins used for the voxel intensity histograms of each volume. Our results demonstrate that the efficiency and accuracy of elastic, as well as rigid body, registration is improved both through the use of a reduced number of bins in the PET and MR histograms and through a limited coarse-to-fine grain interpolation of the volume data. To determine the appropriate number of bins prior to registration, we consider the NMI between the two volumes, i.e. their mutual information content, as a function of the binning of each volume. Simulated data sets are used for validation, and the resulting registration improves on that obtained with a standard approach based on the Statistical Parametric Mapping software. Copyright © 2008 John Wiley & Sons, Ltd. [source]
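
The core quantity is straightforward to compute from a joint intensity histogram. The sketch below evaluates the NMI of the form (H(A) + H(B)) / H(A, B) for different bin counts; the synthetic "PET" and "MR" volumes and the chosen bin numbers are illustrative assumptions only, not the registration pipeline evaluated in the paper.

```python
import numpy as np

def normalized_mutual_information(a, b, bins_a=32, bins_b=32):
    """NMI(A, B) = (H(A) + H(B)) / H(A, B) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=(bins_a, bins_b))
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    hxy = -np.sum(pxy[pxy > 0] * np.log(pxy[pxy > 0]))
    return (hx + hy) / hxy

# Hypothetical volumes: a "PET" volume that is a noisy, nonlinear function of an "MR" volume
rng = np.random.default_rng(1)
mr = rng.normal(size=(32, 32, 32))
pet = np.tanh(mr) + 0.3 * rng.normal(size=mr.shape)

for nbins in (256, 64, 16):
    print(nbins, round(normalized_mutual_information(pet, mr, nbins, nbins), 4))
```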


Speed Estimation from Single Loop Data Using an Unscented Particle Filter

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 7 2010
Zhirui Ye
The Kalman filters used in past speed estimation studies employ a Gaussian assumption that is rarely satisfied in practice. A hybrid method that combines a parametric filter (Unscented Kalman Filter) and a nonparametric filter (Particle Filter) is thus proposed to overcome the limitations of the existing methods. To illustrate the advantage of the proposed approach, two data sets collected from field detectors along with a simulated data set are utilized for performance evaluation and comparison with the Extended Kalman Filter and the Unscented Kalman Filter. It is found that the proposed method outperforms the evaluated Kalman filter methods. The proposed unscented particle filter (UPF) produces accurate speed estimates even under congested flow conditions, where many other methods have significant accuracy problems. [source]
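
For intuition only, the following sketch runs a plain bootstrap particle filter on a toy random-walk speed state observed with heavy-tailed (non-Gaussian) measurement noise. It is not the paper's UPF: there the proposal distribution is generated by an unscented Kalman filter and the observation model for single-loop data is more involved. Every constant below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical state-space model for mean speed v_t (km/h):
#   state:       v_t = v_{t-1} + w_t,  w_t ~ N(0, q)
#   observation: z_t = v_t + e_t,      e_t heavy-tailed (non-Gaussian)
T, n_particles, q = 120, 1000, 4.0
true_v = 90 + np.cumsum(rng.normal(0, 2, T))
true_v[60:] -= 40                                      # sudden congestion
obs = true_v + rng.standard_t(df=3, size=T) * 5.0      # non-Gaussian measurement noise

particles = rng.normal(90, 10, n_particles)
est = np.empty(T)
for t in range(T):
    particles += rng.normal(0, np.sqrt(q), n_particles)          # propagate
    w = (1 + ((obs[t] - particles) / 5.0) ** 2 / 3) ** (-2.0)     # Student-t(3) likelihood (unnormalised)
    w /= w.sum()
    est[t] = np.sum(w * particles)                                # posterior-mean speed estimate
    particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample

print("RMSE:", np.sqrt(np.mean((est - true_v) ** 2)))
```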


Testing the group polarization hypothesis by using logit models

EUROPEAN JOURNAL OF SOCIAL PSYCHOLOGY, Issue 1 2002
María F. Rodrigo
This paper focuses on methodological aspects of group polarization research and has two well-defined parts. The first part presents a methodological overview of group polarization research together with an examination of the inadequacy, under certain circumstances, of the traditional parametric approach usually used to test this phenomenon based on pre-test/post-test means comparison across groups. It is shown that this approach will mask effects when groups are heterogeneous with regard to the observed change from pre-test to post-test. The second part suggests an alternative methodological approach based on logit models for the analysis of contingency tables from a categorization of the variable 'kind of shift'. This approach is illustrated, and compared with the parametric approach, using a simulated data set. Copyright © 2002 John Wiley & Sons, Ltd. [source]
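
A minimal sketch of the second part's idea, under assumptions about the data layout: shifts are categorised (e.g. depolarizing / none / polarizing) within each condition, and a log-linear (Poisson) model tests whether the shift-category distribution differs between conditions. The counts, the category labels and the use of statsmodels are hypothetical; the paper's own logit parameterisation may differ.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Hypothetical contingency table: condition x 'kind of shift'
table = pd.DataFrame({
    "condition": ["control"] * 3 + ["discussion"] * 3,
    "shift":     ["depolarizing", "none", "polarizing"] * 2,
    "count":     [12, 30, 14, 8, 18, 32],
})

# Independence (null) model vs. model with a condition x shift interaction (saturated here)
null_fit = smf.glm("count ~ C(condition) + C(shift)", data=table,
                   family=sm.families.Poisson()).fit()
sat_fit = smf.glm("count ~ C(condition) * C(shift)", data=table,
                  family=sm.families.Poisson()).fit()

lr = null_fit.deviance - sat_fit.deviance        # likelihood-ratio (G^2) statistic
df = null_fit.df_resid - sat_fit.df_resid
print(f"G^2 = {lr:.2f} on {df} df, p = {chi2.sf(lr, df):.4f}")
```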


Genome-wide association analyses of quantitative traits: the GAW16 experience

GENETIC EPIDEMIOLOGY, Issue S1 2009
Saurabh Ghosh
Abstract The group that formed on the theme of genome-wide association analyses of quantitative traits (Group 2) in the Genetic Analysis Workshop 16 comprised eight sets of investigators. Three data sets were available: one on autoantibodies related to rheumatoid arthritis provided by the North American Rheumatoid Arthritis Consortium; the second on anthropometric, lipid, and biochemical measures provided by the Framingham Heart Study (FHS); and the third a simulated data set modeled after FHS. The different investigators in the group addressed a large set of statistical challenges and applied a wide spectrum of association methods in analyzing quantitative traits at the genome-wide level. While some previously reported genes were validated, some novel chromosomal regions provided significant evidence of association in multiple contributions in the group. In this report, we discuss the different strategies explored by the different investigators with the common goal of improving the power to detect association. Genet. Epidemiol. 33 (Suppl. 1):S13–S18, 2009. © 2009 Wiley-Liss, Inc. [source]


Tikhonov regularization in standardized and general form for multivariate calibration with application towards removing unwanted spectral artifacts

JOURNAL OF CHEMOMETRICS, Issue 1-2 2006
Forrest Stout
Abstract Tikhonov regularization (TR) is an approach to form a multivariate calibration model for y = Xb. It includes a regularization operator matrix L that is usually set to the identity matrix I; in this situation, TR is said to operate in standard form and is the same as ridge regression (RR). Alternatively, TR can function in general form with L ≠ I, where L is used to remove unwanted spectral artifacts. To simplify the computations for TR in general form, a standardization process can be used on X and y to transform the problem into TR in standard form, and an RR algorithm can then be used. The calculated regression vector in standardized space must be back-transformed to the general form, which can then be applied to spectra that have not been standardized. The calibration model building methods of principal component regression (PCR), partial least squares (PLS) and others can also be implemented with the standardized X and y. Regardless of the calibration method, armed with y, X and L, a regression vector is sought that can correct for irrelevant spectral variation in predicting y. In this study, L is set to various derivative operators to obtain smoothed TR, PCR and PLS regression vectors in order to generate models robust to noise and/or temperature effects. Results of this smoothing process are examined for spectral data without excessive noise or other artifacts, spectral data with additional noise added and spectral data exhibiting temperature-induced peak shifts. When the noise level is small, derivative operator smoothing was found to slightly degrade the root mean square error of validation (RMSEV) as well as the prediction variance indicator represented by the regression vector 2-norm, thereby deteriorating the model harmony (bias/variance tradeoff). The effective rank (ER) (parsimony) was found to decrease with smoothing and, in doing so, a harmony/parsimony tradeoff is formed. For the temperature-affected data and some of the noisy data, derivative operator smoothing decreases the RMSEV, but at a cost of greater values of the regression vector 2-norm. The ER was found to increase and hence the parsimony degraded. A simulated data set from a previous study that used TR in general form was reexamined. In the present study, the standardization process is used with L set to the spectral noise structure to eliminate undesirable spectral regions (wavelength selection) and TR, PCR and PLS are evaluated. There was a significant decrease in bias at a sacrifice to variance with wavelength selection, and the parsimony essentially remained the same. This paper includes discussion on the utility of using TR to remove other undesired spectral patterns resulting from chemical, environmental and/or instrumental influences. The discussion also incorporates using TR as a method for calibration transfer. Copyright © 2006 John Wiley & Sons, Ltd. [source]
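
A compact numerical sketch of the standard-form vs. general-form distinction: the regression vector solves min ||y − Xb||² + λ²||Lb||², with L = I giving ridge regression and L a second-difference (derivative-style) operator giving a smoothed regression vector. The simulated spectra, the λ value and the operator choice are assumptions for illustration; the paper's standardization route to a standard-form algorithm is not reproduced here, the augmented least-squares form is solved directly instead.

```python
import numpy as np

def tikhonov(X, y, lam, L=None):
    """General-form Tikhonov solution b = argmin ||y - Xb||^2 + lam^2 ||Lb||^2.
    With L = I this reduces to ridge regression (standard form)."""
    p = X.shape[1]
    if L is None:
        L = np.eye(p)
    A = np.vstack([X, lam * L])                     # augmented least-squares system
    rhs = np.concatenate([y, np.zeros(L.shape[0])])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def second_difference_operator(p):
    """(p-2) x p second-difference matrix, a common derivative-style choice for L."""
    L = np.zeros((p - 2, p))
    for i in range(p - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L

# Hypothetical spectra: 40 calibration samples x 200 wavelengths, smooth true regression vector
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 200))
b_true = np.sin(np.linspace(0, np.pi, 200))
y = X @ b_true + rng.normal(0, 0.5, size=40)

b_ridge = tikhonov(X, y, lam=1.0)                                      # standard form, L = I
b_smooth = tikhonov(X, y, lam=1.0, L=second_difference_operator(200))  # general form
print("regression vector 2-norms:", np.linalg.norm(b_ridge), np.linalg.norm(b_smooth))
```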


Bayesian conformational analysis of ring molecules through reversible jump MCMC

JOURNAL OF CHEMOMETRICS, Issue 8 2005
Kim Nolsøe
Abstract In this paper, we address the problem of classifying the conformations of m-membered rings using experimental observations obtained by crystal structure analysis. We formulate a model for the data generation mechanism that consists of a multidimensional mixture model. We perform inference for the proportions and the components in a Bayesian framework, implementing a Markov chain Monte Carlo (MCMC) reversible jump algorithm to obtain samples of the posterior distributions. The method is illustrated on a simulated data set and on real data corresponding to cyclo-octane structures. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Identifying the time of polynomial drift in the mean of autocorrelated processes

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 5 2010
Marcus B. Perry
Abstract Control charts are used to detect changes in a process. Once a change is detected, knowledge of the change point would simplify the search for and identification of the special cause. Consequently, having an estimate of the process change point following a control chart signal would be useful to process engineers. This paper addresses change point estimation for covariance-stationary autocorrelated processes where the mean drifts deterministically with time. For example, the mean of a chemical process might drift linearly over time as a result of a constant pressure leak. The goal of this paper is to derive and evaluate an MLE for the time of polynomial drift in the mean of autocorrelated processes. It is assumed that the behavior of the process mean over time is adequately modeled by the kth-order polynomial trend model. Further, it is assumed that the autocorrelation structure is adequately modeled by the general (stationary and invertible) mixed autoregressive-moving-average model. The estimator is intended to be applied to data obtained following a genuine control chart signal in efforts to help pinpoint the root cause of process change. Application of the estimator is demonstrated using a simulated data set. The performance of the estimator is evaluated through Monte Carlo simulation studies for the k=1 case and across several processes yielding various levels of positive autocorrelation. Results suggest that the proposed estimator provides process engineers with an accurate and useful estimate for the last sample obtained from the unchanged process. Copyright © 2009 John Wiley & Sons, Ltd. [source]
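
The following sketch illustrates the k = 1 (linear drift) case under simplifying assumptions: AR(1) errors with a known autoregressive parameter, so the change point can be estimated by whitening the drift model and minimising the residual sum of squares over candidate change points. The paper's MLE handles general ARMA structure with unknown parameters; the series length, φ and slope below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated process: in-control mean 0, linear drift starting at tau = 120, AR(1) noise
n, tau_true, slope, phi = 200, 120, 0.15, 0.5
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.normal()
t_idx = np.arange(n)
x = np.where(t_idx > tau_true, slope * (t_idx - tau_true), 0.0) + e

def profile_rss(x, tau, phi):
    """RSS of the AR(1)-whitened linear-drift model with change point tau."""
    t = np.arange(len(x))
    D = np.column_stack([np.ones(len(x)),
                         np.where(t > tau, t - tau, 0.0)])   # [intercept, drift]
    xw = x[1:] - phi * x[:-1]                                # whiten both sides
    Dw = D[1:] - phi * D[:-1]
    beta, *_ = np.linalg.lstsq(Dw, xw, rcond=None)
    r = xw - Dw @ beta
    return r @ r

candidates = np.arange(10, n - 10)
tau_hat = candidates[np.argmin([profile_rss(x, tau, phi) for tau in candidates])]
print("true change point:", tau_true, "estimate:", tau_hat)
```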


The promise of geometric morphometrics

AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY, Issue S35 2002
Joan T. Richtsmeier
Abstract Nontraditional or geometric morphometric methods have found wide application in the biological sciences, especially in anthropology, a field with a strong history of measurement of biological form. Controversy has arisen over which method is the "best" for quantifying the morphological difference between forms and for making proper statistical statements about the detected differences. This paper explains that many of these arguments are superfluous to the real issues that need to be understood by those wishing to apply morphometric methods to biological data. Validity, the ability of a method to find the correct answer, is rarely discussed and often ignored. We explain why demonstration of validity is a necessary step in the evaluation of methods used in morphometrics. Focusing specifically on landmark data, we discuss the concepts of size and shape, and reiterate that since no unique definition of size exists, shape can only be recognized with reference to a chosen surrogate for size. We explain why only a limited class of information related to the morphology of an object can be known when landmark data are used. This observation has genuine consequences, as certain morphometric methods are based on models that require specific assumptions, some of which exceed what can be known from landmark data. We show that orientation of an object with reference to other objects in a sample can never be known, because this information is not included in landmark data. Consequently, a descriptor of form difference that contains information on orientation is flawed because that information does not arise from evidence within the data, but instead is a product of a chosen orientation scheme. To illustrate these points, we apply superimposition, deformation, and linear distance-based morphometric methods to the analysis of a simulated data set for which the true differences are known. This analysis demonstrates the relative efficacy of various methods to reveal the true difference between forms. Our discussion is intended to be fair, but it will be obvious to the reader that we favor a particular approach. Our bias comes from the realization that morphometric methods should operate with a definition of form and form difference consistent with the limited class of information that can be known from landmark data. Answers based on information that can be known from the data are of more use to biological inquiry than those based on unjustifiable assumptions. Yrbk Phys Anthropol 45:63–91, 2002. © 2002 Wiley-Liss, Inc. [source]
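
As a concrete example of one of the method families compared (superimposition), the sketch below performs an ordinary Procrustes fit of one 2-D landmark configuration onto another: translation, scaling to unit centroid size, then the SVD-based optimal rotation. The landmark coordinates are simulated and this is not the authors' validation analysis; it only shows what a superimposition-based comparison computes.

```python
import numpy as np

def procrustes_superimpose(A, B):
    """Translate, scale and rotate configuration B onto A (ordinary Procrustes);
    returns the fitted B and the Procrustes sum of squares."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 /= np.linalg.norm(A0)          # unit centroid size
    B0 /= np.linalg.norm(B0)
    U, s, Vt = np.linalg.svd(B0.T @ A0)
    R = U @ Vt                        # optimal rotation
    B_fit = B0 @ R
    return B_fit, np.sum((A0 - B_fit) ** 2)

# Hypothetical 2-D landmark data: 8 landmarks, second form is a rotated, scaled, noisy copy
rng = np.random.default_rng(5)
A = rng.normal(size=(8, 2))
theta = 0.4
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
B = 1.7 * A @ rot + 0.05 * rng.normal(size=A.shape) + np.array([3.0, -1.0])

B_fit, oss = procrustes_superimpose(A, B)
print("Procrustes sum of squares:", round(oss, 4))
```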


Nonparametric Estimation for the Three-Stage Irreversible Illness–Death Model

BIOMETRICS, Issue 3 2000
Somnath Datta
Summary. In this paper, we present new nonparametric estimators of the stage-occupation probabilities in the three-stage irreversible illness-death model. These estimators use a fractional risk set and a reweighting approach and are valid under stage-dependent censoring. Using a simulated data set, we compare the behavior of our estimators with previously proposed estimators. We also apply our estimators to data on time to Pneumocystis pneumonia and death obtained from an AIDS cohort study. [source]


Multiple genetic tests for susceptibility to smoking do not outperform simple family history

ADDICTION, Issue 1 2009
Coral E. Gartner
ABSTRACT Aims To evaluate the utility of using predictive genetic screening of the population for susceptibility to smoking. Methods The results of meta-analyses of genetic association studies of smoking behaviour were used to create simulated data sets using Monte Carlo methods. The ability of the genetic tests to screen for smoking was assessed using receiver operating characteristic (ROC) curve analysis. The result was compared to prediction using simple family history information. To identify the circumstances in which predictive genetic testing would potentially justify screening, we simulated tests using larger numbers of alleles (10, 15 and 20) that varied in prevalence from 10 to 50% and in strength of association [relative risks (RRs) of 1.2–2.1]. Results A test based on the RRs and prevalence of five susceptibility alleles derived from meta-analyses of genetic association studies of smoking performed similarly to chance and no better than the prediction based on simple family history. Increasing the number of alleles from five to 20 improved the predictive ability of genetic screening only modestly when using genes with the effect sizes reported to date. Conclusions This panel of genetic tests would be unsuitable for population screening. This situation is unlikely to be improved upon by screening based on more genetic tests. Given the similarity with associations found for other polygenic conditions, our results also suggest that using multiple genes to screen the general population for genetic susceptibility to polygenic disorders will be of limited utility. [source]
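
A Monte Carlo sketch in the same spirit: simulate carrier status for a handful of susceptibility alleles with assumed frequencies and relative risks, form a weighted allele score, and measure its discrimination with the area under the ROC curve. The allele frequencies, RRs, baseline risk and dominant coding below are illustrative assumptions, not the meta-analysis values used in the study.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(6)

# Hypothetical panel: 5 susceptibility alleles with assumed frequencies and relative risks
freq = np.array([0.35, 0.20, 0.45, 0.10, 0.30])
rr = np.array([1.3, 1.2, 1.4, 1.6, 1.2])
n, base_risk = 100_000, 0.20

# Carrier status per allele (dominant coding for simplicity); risks act multiplicatively
carriers = rng.random((n, len(freq))) < (1 - (1 - freq) ** 2)
risk = np.clip(base_risk * np.prod(np.where(carriers, rr, 1.0), axis=1), 0, 1)
smoker = rng.random(n) < risk

score = carriers @ np.log(rr)        # weighted count of risk alleles carried

def auc(score, outcome):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    ranks = rankdata(score)          # midranks handle the many tied scores
    n1, n0 = outcome.sum(), (~outcome).sum()
    return (ranks[outcome].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

print("AUC of the 5-allele genetic score:", round(auc(score, smoker), 3))
```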


Variable smoothing in Bayesian intrinsic autoregressions

ENVIRONMETRICS, Issue 8 2007
Mark J. Brewer
Abstract We introduce an adapted form of the Markov random field (MRF) for Bayesian spatial smoothing with small-area data. This new scheme allows the amount of smoothing to vary in different parts of a map by employing area-specific smoothing parameters, related to the variance of the MRF. We take an empirical Bayes approach, using variance information from a standard MRF analysis to provide prior information for the smoothing parameters of the adapted MRF. The scheme is shown to produce proper posterior distributions for a broad class of models. We test our method on both simulated and real data sets, and for the simulated data sets, the new scheme is found to improve modelling of both slowly-varying levels of smoothness and discontinuities in the response surface. Copyright © 2007 John Wiley & Sons, Ltd. [source]


Maximum likelihood estimators of population parameters from doubly left-censored samples

ENVIRONMETRICS, Issue 8 2006
Abou El-Makarim A. Aboueissa
Abstract Left-censored data often arise in environmental contexts with one or more detection limits, DLs. Estimators of the parameters are derived for left-censored data having two detection limits, DL1 and DL2, assuming an underlying normal distribution. Two different approaches for calculating the maximum likelihood estimates (MLEs) are given and examined. These methods also apply to lognormally distributed environmental data with two distinct detection limits. The performance of the new estimators is compared utilizing many simulated data sets. Examples illustrating the use of these methods are given, utilizing a computer program provided in the Appendix. Copyright © 2006 John Wiley & Sons, Ltd. [source]
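
The likelihood behind such estimators is straightforward to write down: detected values contribute normal densities and non-detects contribute Φ((DL − μ)/σ) for their own detection limit. The sketch below maximises this doubly left-censored likelihood numerically; the true parameters, detection limits and censoring pattern are assumptions, and the paper's two specific estimation approaches are not reproduced here.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Hypothetical doubly left-censored sample: true N(mu=5, sigma=2), two detection limits
mu_true, sigma_true, dl1, dl2 = 5.0, 2.0, 3.0, 4.0
x = rng.normal(mu_true, sigma_true, 200)
dl = np.where(rng.random(200) < 0.5, dl1, dl2)   # each observation has one of the two DLs
censored = x < dl
obs = np.where(censored, np.nan, x)

def neg_log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll_obs = norm.logpdf(obs[~censored], mu, sigma).sum()
    ll_cen = norm.logcdf((dl[censored] - mu) / sigma).sum()   # P(X < DL) for non-detects
    return -(ll_obs + ll_cen)

start = np.array([np.nanmean(obs), np.log(np.nanstd(obs))])
fit = minimize(neg_log_lik, start, method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"MLE: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```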


Maximum-likelihood estimation of haplotype frequencies in nuclear families

GENETIC EPIDEMIOLOGY, Issue 1 2004
Tim Becker
Abstract The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over the last years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction techniques and many implementations of the expectation-maximization (EM) algorithm for haplotype frequency estimation exist. Recent research work has shown that incorporating available genotype information of related individuals largely increases the precision of haplotype frequency estimates. We, therefore, implemented a highly flexible program written in C, called FAMHAP, which calculates maximum likelihood estimates (MLEs) of haplotype frequencies from general nuclear families with an arbitrary number of children via the EM-algorithm for up to 20 SNPs. For more loci, we have implemented a locus-iterative mode of the EM-algorithm, which gives reliable approximations of the MLEs for up to 63 SNP loci, or less when multi-allelic markers are incorporated into the analysis. Missing genotypes can be handled as well. The program is able to distinguish cases (haplotypes transmitted to the first affected child of a family) from pseudo-controls (non-transmitted haplotypes with respect to the child). We tested the performance of FAMHAP and the accuracy of the obtained haplotype frequencies on a variety of simulated data sets. The implementation proved to work well when many markers were considered and no significant differences between the estimates obtained with the usual EM-algorithm and those obtained in its locus-iterative mode were observed. We conclude from the simulations that the accuracy of haplotype frequency estimation and reconstruction in nuclear families is very reliable in general and robust against missing genotypes. © 2004 Wiley-Liss, Inc. [source]
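
To show the flavour of EM-based haplotype frequency estimation, the sketch below runs the classical EM algorithm for two SNPs in unrelated individuals: each unphased genotype is expanded into its compatible haplotype pairs, weighted by the current frequency estimates, and expected haplotype counts are re-normalised. FAMHAP additionally exploits family information and scales to many loci; the simulated genotypes and frequencies here are assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)

haplos = list(product((0, 1), repeat=2))            # haplotypes 00, 01, 10, 11
true_freq = np.array([0.4, 0.1, 0.2, 0.3])

# Simulate unphased two-SNP genotypes (counts of allele "1" per locus) for 500 individuals
h_idx = rng.choice(4, size=(500, 2), p=true_freq)   # two haplotypes per person
genos = np.array([[haplos[i][l] + haplos[j][l] for l in range(2)] for i, j in h_idx])

def em_haplotype_freqs(genos, n_iter=100):
    freq = np.full(4, 0.25)
    pairs = [(i, j) for i in range(4) for j in range(i, 4)]
    for _ in range(n_iter):
        expected = np.zeros(4)
        for g in genos:
            # haplotype pairs consistent with this unphased genotype
            compat = [(i, j) for i, j in pairs
                      if all(haplos[i][l] + haplos[j][l] == g[l] for l in range(2))]
            w = np.array([(2 - (i == j)) * freq[i] * freq[j] for i, j in compat])
            w /= w.sum()
            for (i, j), wij in zip(compat, w):
                expected[i] += wij
                expected[j] += wij
        freq = expected / expected.sum()
    return freq

print("true:", true_freq)
print("EM:  ", np.round(em_haplotype_freqs(genos), 3))
```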


Fine mapping and detection of the causative mutation underlying Quantitative Trait Loci

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 5 2010
E. Uleberg
Summary The effect on power and precision of including the causative SNP amongst the investigated markers in Quantitative Trait Loci (QTL) mapping experiments was investigated. Three fine mapping methods were tested to see which was most efficient in finding the causative mutation: combined linkage and linkage disequilibrium mapping (LLD); association mapping (MARK); a combination of LLD and association mapping (LLDMARK). Two simulated data sets were analysed: in one set, the causative SNP was included amongst the markers, while in the other set the causative SNP was masked between markers. Including the causative SNP amongst the markers increased both precision and power in the analyses. For the LLD method the number of correctly positioned QTL increased from 17 for the analysis without the causative SNP to 77 for the analysis including the causative SNP. The likelihood of the data analysis increased from 3.4 to 13.3 likelihood units for the MARK method when the causative SNP was included. When the causative SNP was masked between the analysed markers, the LLD method was most efficient in detecting the correct QTL position, while the MARK method was most efficient when the causative SNP was included as a marker in the analysis. The LLDMARK method, combining association mapping and LLD, assumes a QTL as the null hypothesis (using the LLD method) and tests whether the 'putative causative SNP' explains significantly more variance than a QTL in the region. Thus, if the putative causative SNP gives not only an Identical-By-Descent (IBD) signal but also an Alike-In-State (AIS) signal, LLDMARK gives a positive likelihood ratio. LLDMARK detected less than half as many causative SNPs as the other methods, and also had a relatively high false discovery rate when the QTL effect was large. LLDMARK may however be more robust against spurious associations, because the regional IBD is largely corrected for by fitting a QTL effect in the null hypothesis model. [source]


Bayesian inference in a piecewise Weibull proportional hazards model with unknown change points

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 4 2007
J. Casellas
Summary The main difference between parametric and non-parametric survival analyses lies in model flexibility. Parametric models have been suggested as preferable because of their lower programming needs, although they generally suffer from a reduced flexibility to fit field data. In this sense, parametric survival functions can be redefined as piecewise survival functions whose slopes change at given points, which substantially increases the flexibility of the parametric survival model. Unfortunately, we lack accurate methods to establish the required number of change points and their position within the time space. In this study, a Weibull survival model with a piecewise baseline hazard function was developed, with change points included as unknown parameters in the model. Concretely, a Weibull log-normal animal frailty model was assumed, and it was solved with a Bayesian approach. The required fully conditional posterior distributions were derived. During the sampling process, all the parameters in the model were updated using a Metropolis–Hastings step, with the exception of the genetic variance, which was updated with a standard Gibbs sampler. This methodology was tested with simulated data sets, each one analysed through several models with different numbers of change points. The models were compared with the Deviance Information Criterion, with appealing results. Simulation results showed that the estimated marginal posterior distributions covered well and placed high density on the true parameter values used in the simulation data. Moreover, results showed that the piecewise baseline hazard function could appropriately fit survival data, as well as other smooth distributions, with a reduced number of change points. [source]


Simple preconditioners for the conjugate gradient method: experience with test day models

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 3 2002
I. STRANDÉN
The preconditioned conjugate gradient method can be used to solve large mixed model equations quickly. Convergence of the method depends on the quality of the preconditioner. Here, the effect of simple preconditioners on the number of iterations until convergence was studied by solving for breeding values in several test day models. The test day records were from a field data set and from several simulated data sets with low and high correlations among regression coefficients. The preconditioner matrices had diagonal or block diagonal parts. Transformation of the mixed model equations by diagonalization of the genetic covariance matrix was studied as well. A preconditioner containing the whole block of the fixed effects was found to be advantageous. A block diagonal preconditioner for the animal effects reduced the number of iterations more, the higher the correlations among animal effects, but increased the memory usage of the preconditioner. Diagonalization of the animal genetic covariance matrix often reduced the number of iterations considerably without increased memory usage. [source]
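
A generic illustration of why the preconditioner matters: the same conjugate gradient iteration is run with the identity "preconditioner" and with a diagonal (Jacobi) preconditioner on a symmetric positive-definite system standing in for mixed model equations. The test matrix is hypothetical, and the block preconditioners and genetic-covariance diagonalization studied in the paper are not implemented here.

```python
import numpy as np

def pcg(A, b, m_inv_diag, tol=1e-8, max_iter=5000):
    """Conjugate gradients with a diagonal preconditioner, given as 1/diag(M)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_inv_diag * r
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, it
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical SPD coefficient matrix standing in for mixed model equations
rng = np.random.default_rng(9)
Q = rng.normal(size=(300, 300))
A = Q @ Q.T / 300 + np.diag(np.logspace(0, 4, 300))    # widely varying diagonal
b = rng.normal(size=300)

_, it_identity = pcg(A, b, np.ones(300))               # no preconditioning (M = I)
_, it_jacobi = pcg(A, b, 1.0 / np.diag(A))             # Jacobi preconditioner
print("iterations: identity =", it_identity, ", Jacobi =", it_jacobi)
```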


Selecting significant factors by the noise addition method in principal component analysis

JOURNAL OF CHEMOMETRICS, Issue 7 2001
Brian K. Dable
Abstract The noise addition method (NAM) is presented as a tool for determining the number of significant factors in a data set. The NAM is compared to residual standard deviation (RSD), the factor indicator function (IND), chi-squared (χ2) and cross-validation (CV) for establishing the number of significant factors in three data sets. The comparison and validation of the NAM are performed through Monte Carlo simulations with noise distributions of varying standard deviation, HPLC/UV-vis chromatographs of a mixture of aromatic hydrocarbons, and FIA of methyl orange. The NAM succeeds in correctly identifying the proper number of significant factors 98% of the time with the simulated data, 99% in the HPLC data sets and 98% with the FIA data. RSD and χ2 fail to choose the proper number of factors in all three data sets. IND identifies the correct number of factors in the simulated data sets but fails with the HPLC and FIA data sets. Both CV methods fail in the HPLC and FIA data sets. CV also fails for the simulated data sets, while the modified CV correctly chooses the proper number of factors an average of 80% of the time. Copyright © 2001 John Wiley & Sons, Ltd. [source]
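
The sketch below is not the NAM itself (whose details are specific to the paper) but a parallel-analysis-style illustration of the same idea: when the noise level is known, the singular values of the data matrix can be compared against those expected from pure noise of the same size, and factors exceeding that reference are deemed significant. The data dimensions, component count and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical mixture data: 30 spectra over 100 channels, 3 underlying components + noise
n_samples, n_chan, n_comp, noise_sd = 30, 100, 3, 0.05
C = rng.random((n_samples, n_comp))                    # concentrations
S = np.abs(rng.normal(size=(n_comp, n_chan)))          # pure-component spectra
X = C @ S + rng.normal(0, noise_sd, size=(n_samples, n_chan))

sv_data = np.linalg.svd(X, compute_uv=False)

# Reference: largest singular value seen in pure-noise matrices at the known noise level
sv_noise = np.array([np.linalg.svd(rng.normal(0, noise_sd, size=X.shape), compute_uv=False)
                     for _ in range(200)])
threshold = sv_noise[:, 0].max()

print("estimated number of significant factors:", int(np.sum(sv_data > threshold)))
```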


Maximum likelihood estimation in space time bilinear models

JOURNAL OF TIME SERIES ANALYSIS, Issue 1 2003
YUQING DAI
The space time bilinear (STBL) model is a special form of a multiple bilinear time series that can be used to model time series which exhibit bilinear behaviour on a spatial neighbourhood structure. The STBL model and its identification have been proposed and discussed by Dai and Billard (1998). The present work considers the problem of parameter estimation for the STBL model. A conditional maximum likelihood estimation procedure is provided through the use of a Newton–Raphson numerical optimization algorithm. The gradient vector and Hessian matrix are derived together with recursive equations for computation implementation. The methodology is illustrated with two simulated data sets and one real-life data set. [source]


Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows

MOLECULAR ECOLOGY RESOURCES, Issue 3 2010
LAURENT EXCOFFIER
Abstract We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework. [source]


Can the cause of aggregation be inferred from species distributions?

OIKOS, Issue 1 2007
Astrid J.A. Van Teeffelen
Species distributions often show an aggregated pattern, which can be due to a number of endo- and exogenous factors. While autologistic models have been used for modelling such data with statistical rigour, little emphasis has been put on disentangling potential causes of aggregation. In this paper we ask whether it is possible to infer sources of aggregation in species distributions from a single set of occurrence data by comparing the performance of various autologistic models. We create simulated data sets, which show similar occupancy patterns, but differ in the process that causes the aggregation. We model the distribution of these data with various autologistic models, and show how the relative performance of the models is sensitive to the factor causing aggregation in the data. This information can be used when modelling real species data, where causes of aggregation are typically unknown. To illustrate, we use our approach to assess the potential causes of aggregation in data of seven bird species with contrasting statistical patterns. Our findings have important implications for conservation, as understanding the mechanisms that drive population fluctuations in space and time is critical for the development of effective management actions for long-term conservation. [source]