Presence-only Data


Selected Abstracts


Presence-Only Data and the EM Algorithm

BIOMETRICS, Issue 2 2009
Gill Ward
Summary: In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation-maximization algorithm to estimate the underlying presence-absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence-absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
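The expectation-maximization idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rules, so everything below is an assumption made for illustration: the function names (`fit_logistic`, `em_presence_only`), the plain Newton fit, the use of a user-supplied prevalence, and the case-control intercept offset log((n_p + pi * n_u) / (pi * n_u)). It should not be read as the authors' implementation.

```python
import numpy as np

def expit(t):
    # Logistic function, clipped to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton fit of a logistic regression; y may be fractional in [0, 1]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def em_presence_only(X_pres, X_back, prevalence, n_em=200):
    """Hypothetical EM sketch for presence-only data.

    X_pres: covariates at observed presences (label known to be 1).
    X_back: covariates at background sites (label unknown).
    prevalence: assumed population prevalence, supplied by the user.
    Returns coefficients (intercept first) on the population scale.
    """
    n_p, n_u = len(X_pres), len(X_back)
    X = np.vstack([X_pres, X_back])
    X = np.column_stack([np.ones(len(X)), X])
    # Assumed case-control offset: presences are over-represented in the
    # pooled sample relative to the landscape.
    offset = np.log((n_p + prevalence * n_u) / (prevalence * n_u))
    # Start with every background label imputed as the prevalence.
    y = np.concatenate([np.ones(n_p), np.full(n_u, float(prevalence))])
    for _ in range(n_em):
        beta = fit_logistic(X, y)              # M-step on imputed labels
        beta_pop = beta.copy()
        beta_pop[0] -= offset                  # back to the population scale
        y[n_p:] = expit(X[n_p:] @ beta_pop)    # E-step for background sites
    return beta_pop
```

Under these assumptions, a call such as `em_presence_only(X_pres, X_back, prevalence=0.3)` returns an intercept and slopes for the presence-absence model. Any other logistic fitting routine could replace `fit_logistic`, which is in the spirit of the abstract's remark that the algorithm works with off-the-shelf logistic models.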


Novel methods improve prediction of species' distributions from occurrence data

ECOGRAPHY, Issue 2 2006
Jane Elith
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.


Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines

DIVERSITY AND DISTRIBUTIONS, Issue 3 2007
Jane Elith
ABSTRACT: Current circumstances, namely that the majority of species distribution records exist as presence-only data (e.g. from museums and herbaria) and that there is an established need for predictions of species distributions, mean that scientists and conservation managers seek to develop robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transfer of analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence-only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence-absence data were recorded. Models developed with absences inferred from the total set of presence-only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo-absences were drawn randomly from the study area. The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.
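The two ways of representing absence that this abstract compares can be sketched as follows. This is only an illustration under stated assumptions: the function names, the integer cell and site identifiers, and the choice to exclude the target species' own presence cells from the random draw are all hypothetical and are not taken from the paper or its tutorial.

```python
import numpy as np

def random_pseudo_absences(n_cells, presence_cells, n_draw, seed=0):
    """Draw pseudo-absence cells uniformly at random from the study area.

    Assumption: cells where the target species was recorded are excluded.
    """
    rng = np.random.default_rng(seed)
    candidates = np.setdiff1d(np.arange(n_cells), presence_cells)
    size = min(n_draw, len(candidates))
    return rng.choice(candidates, size=size, replace=False)

def group_inferred_absences(records, target):
    """Infer absences from the presence-only sites of a biological group.

    records: dict mapping species name -> iterable of site ids.
    A site counts as an inferred absence for `target` if some other
    species in the group was recorded there but `target` was not.
    """
    target_sites = set(records[target])
    all_sites = set().union(*records.values())
    return sorted(all_sites - target_sites)
```

For example, with `records = {"a": [1, 2], "b": [2, 3], "c": [4]}`, calling `group_inferred_absences(records, "a")` yields sites 3 and 4. The intuition is that such sites were visited by collectors, so they share the sampling bias of the presence records, whereas randomly drawn cells do not.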


ModEco: an integrated software package for ecological niche modeling

ECOGRAPHY, Issue 4 2010
Qinghua Guo
ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user-friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence-only data, and abundance data; 2) it provides a range of models when dealing with presence-only data, such as presence-only models, pseudo-absence models, background vs. presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment.


The effect of sample size and species characteristics on performance of different species distribution modeling methods

ECOGRAPHY, Issue 5 2006
Pilar A. Hernandez
Species distribution models should provide conservation practitioners with estimates of the spatial distributions of species requiring attention. These species are often rare and have limited known occurrences, posing challenges for creating accurate species distribution models. We tested four modeling methods (Bioclim, Domain, GARP, and Maxent) across 18 species with different levels of ecological specialization using six different sample size treatments and three different evaluation measures. Our assessment revealed that Maxent was the most capable of the four modeling methods in producing useful results with sample sizes as small as 5, 10 and 25 occurrences. The other methods compensated reasonably well (Domain and GARP) to poorly (Bioclim) when presented with datasets of small sample sizes. We show that multiple evaluation measures are necessary to determine accuracy of models produced with presence-only data. Further, we found that accuracy of models is greater for species with small geographic ranges and limited environmental tolerance, ecological characteristics of many rare species. Our results indicate that reasonable models can be made for some rare species, a result that should encourage conservationists to add distribution modeling to their toolbox.


Spatially autocorrelated sampling falsely inflates measures of accuracy for presence-only niche models

JOURNAL OF BIOGEOGRAPHY, Issue 12 2009
Samuel D. Veloz
Abstract. Aim: Environmental niche models that utilize presence-only data have been increasingly employed to model species distributions and test ecological and evolutionary predictions. The ideal method for evaluating the accuracy of a niche model is to train a model with one dataset and then test model predictions against an independent dataset. However, a truly independent dataset is often not available, and instead random subsets of the total data are used for 'training' and 'testing' purposes. The goal of this study was to determine how spatially autocorrelated sampling affects measures of niche model accuracy when using subsets of a larger dataset for accuracy evaluation. Location: The distribution of Centaurea maculosa (spotted knapweed; Asteraceae) was modelled in six states in the western United States: California, Oregon, Washington, Idaho, Wyoming and Montana. Methods: Two types of niche modelling algorithms, the genetic algorithm for rule-set prediction (GARP) and maximum entropy modelling (as implemented with Maxent), were used to model the potential distribution of C. maculosa across the region. The effect of spatially autocorrelated sampling was examined by applying a spatial filter to the presence-only data (to reduce autocorrelation) and then comparing predictions made using the spatial filter with those using a random subset of the data, equal in sample size to the filtered data. Results: The accuracy of predictions from both algorithms was sensitive to the spatial autocorrelation of sampling effort in the occurrence data. Spatial filtering led to lower values of the area under the receiver operating characteristic curve plot but higher similarity statistic (I) values when compared with predictions from models built with random subsets of the total data, meaning that spatial autocorrelation of sampling effort between training and test data led to inflated measures of accuracy.
Main conclusions: The findings indicate that care should be taken when interpreting the results from presence-only niche models when training and test data have been randomly partitioned but occurrence data were non-randomly sampled (in a spatially autocorrelated manner). The higher accuracies obtained without the spatial filter are a result of spatial autocorrelation of sampling effort between training and test data inflating measures of prediction accuracy. If independently surveyed data for testing predictions are unavailable, then it may be necessary to explicitly account for the spatial autocorrelation of sampling effort between randomly partitioned training and test subsets when evaluating niche model predictions.
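The comparison described in the Methods (a spatially filtered set of occurrences versus a random subset of equal size) can be sketched as below. The abstract does not say how the spatial filter was built, so the greedy minimum-distance thinning here is a swapped-in, commonly used approach and not the authors' procedure. The function names, the `min_dist` parameter, and the assumption of planar coordinates in a projected system are hypothetical.

```python
import numpy as np

def spatial_filter(coords, min_dist, seed=0):
    """Greedy thinning: keep a point only if it lies at least `min_dist`
    away from every point already kept. Returns sorted row indices.

    Assumes planar (projected) coordinates, so Euclidean distance applies.
    """
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    kept = []
    # Visit points in random order so the result does not depend on file order.
    for i in rng.permutation(len(coords)):
        far_enough = all(
            np.hypot(*(coords[i] - coords[j])) >= min_dist for j in kept
        )
        if far_enough:
            kept.append(int(i))
    return np.sort(np.array(kept, dtype=int))

def random_subset(n_total, size, seed=0):
    """Random subset with the same sample size as the filtered data."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_total, size=size, replace=False))
```

A typical use would be `idx_f = spatial_filter(coords, min_dist=10.0)` followed by `idx_r = random_subset(len(coords), len(idx_f))`, and then fitting and evaluating the same niche model on each subset so that any difference in accuracy reflects spatial clustering and not sample size.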


Modelling the distribution of a threatened habitat: the California sage scrub

JOURNAL OF BIOGEOGRAPHY, Issue 11 2009
Erin C. Riordan
Abstract. Aim: Using predictive species distribution and ecological niche modelling our objectives are: (1) to identify important climatic drivers of distribution at regional scales of a locally complex and dynamic system, California sage scrub; (2) to map suitable sage scrub habitat in California; and (3) to distinguish between bioclimatic niches of floristic groups within sage scrub to assess the conservation significance of analysing such species groups. Location: Coastal mediterranean-type shrublands of southern and central California. Methods: Using point localities from georeferenced herbarium records, we modelled the potential distribution and bioclimatic envelopes of 14 characteristic sage scrub species and three floristic groups (south-coastal, coastal-interior disjunct and broadly distributed species) based upon current climate conditions. Maxent was used to map climatically suitable habitat, while principal components analysis followed by canonical discriminant analysis were used to distinguish between floristic groups and visualize species and group distributions in multivariate ecological space. Results: Geographical distribution patterns of individual species were mirrored in the habitat suitability maps of floristic groups, notably the disjunct distribution of the coastal-interior species. Overlap in the distributions of floristic groups was evident in both geographical and multivariate niche space; however, discriminant analysis confirmed the separability of floristic groups based on bioclimatic variables. Higher performance of floristic group models compared with sage scrub as a whole suggests that groups have differing climate requirements for habitat suitability at regional scales and that breaking sage scrub into floristic groups improves the discrimination between climatically suitable and unsuitable habitat.
Main conclusions: The finding that presence-only data and climatic variables can produce useful information on habitat suitability of California sage scrub species and floristic groups at a regional scale has important implications for ongoing efforts of habitat restoration for sage scrub. In addition, modelling at a group level provides important information about the differences in climatic niches within California sage scrub. Finally, the high performance of our floristic group models highlights the potential a community-level modelling approach holds for investigating plant distribution patterns.

