Sampling Process (sampling + process)


Selected Abstracts


Sensitivity to sampling in Bayesian word learning

DEVELOPMENTAL SCIENCE, Issue 3 2007
Fei Xu
We report a new study testing our proposal that word learning may be best explained as an approximate form of Bayesian inference (Xu & Tenenbaum, in press). Children are capable of learning word meanings across a wide range of communicative contexts. In different contexts, learners may encounter different sampling processes generating the examples of word–object pairings they observe. An ideal Bayesian word learner could take into account these differences in the sampling process and adjust his/her inferences about word meaning accordingly. We tested how children and adults learned words for novel object kinds in two sampling contexts, in which the objects to be labeled were sampled either by a knowledgeable teacher or by the learners themselves. Both adults and children generalized more conservatively in the former context; that is, they restricted the label to just those objects most similar to the labeled examples when the exemplars were chosen by a knowledgeable teacher, but not when chosen by the learners themselves. We discuss how this result follows naturally from a Bayesian analysis, but not from other statistical approaches such as associative word-learning models.
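The contrast between the two sampling contexts can be illustrated with a minimal sketch. It assumes three nested hypotheses with made-up extension sizes and a uniform prior; the function name `posterior` and all numbers are illustrative assumptions, not the model or stimuli of the study. Under teacher-driven ("strong") sampling, each consistent example is taken to have likelihood 1/|h|, so smaller hypotheses gain weight as examples accumulate. Under learner-driven ("weak") sampling, examples only rule hypotheses in or out.

```python
def posterior(sizes, n_examples, strong_sampling, prior=None):
    """Posterior over hypotheses that are all consistent with the examples.

    sizes: dict mapping hypothesis name -> number of objects it covers
           (hypothetical values, chosen only for illustration).
    """
    names = list(sizes)
    if prior is None:
        # Assumption: uniform prior over the candidate meanings.
        prior = {h: 1.0 / len(names) for h in names}
    scores = {}
    for h in names:
        if strong_sampling:
            # Examples drawn from the word's extension: likelihood (1/|h|)^n.
            likelihood = (1.0 / sizes[h]) ** n_examples
        else:
            # Examples chosen independently of the meaning: constant likelihood.
            likelihood = 1.0
        scores[h] = prior[h] * likelihood
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


sizes = {"subordinate": 3, "basic": 10, "superordinate": 30}
teacher = posterior(sizes, n_examples=3, strong_sampling=True)
learner = posterior(sizes, n_examples=3, strong_sampling=False)
print("teacher-chosen examples:", teacher)
print("learner-chosen examples:", learner)
```

With three examples, nearly all posterior mass in the teacher condition falls on the narrowest hypothesis, while the learner condition leaves the prior unchanged, which mirrors the more conservative generalization reported above.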


On the estimation of species richness based on the accumulation of previously unrecorded species

ECOGRAPHY, Issue 1 2002
Emmanuelle Cam
Estimation of species richness of local communities has become an important topic in community ecology and monitoring. Investigators can seldom enumerate all the species present in the area of interest during sampling sessions. If the location of interest is sampled repeatedly within a short time period, the number of new species recorded is typically largest in the initial sample and decreases as sampling proceeds, but new species may be detected if sampling sessions are added. The question is how to estimate the total number of species. The data collected by sampling the area of interest repeatedly can be used to build species accumulation curves: the cumulative number of species recorded as a function of the number of sampling sessions (which we refer to as "species accumulation data"). A classic approach used to compute total species richness is to fit curves to the data on species accumulation with sampling effort. This approach does not rest on direct estimation of the probability of detecting species during sampling sessions and has no underlying basis regarding the sampling process that gave rise to the data. Here we recommend a probabilistic, nonparametric estimator for species richness for use with species accumulation data. We use estimators of population size that were developed for capture-recapture data, but that can be used to estimate the size of species assemblages using species accumulation data. Models of detection probability account for the underlying sampling process. They permit variation in detection probability among species. We illustrate this approach using data from the North American Breeding Bird Survey (BBS). We describe other situations where species accumulation data are collected under different designs (e.g., over longer periods of time, or over spatial replicates) and that lend themselves to the use of capture-recapture models for estimating the size of the community of interest. We discuss the assumptions and interpretations corresponding to each situation.
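As a rough illustration of a nonparametric capture-recapture estimator applied to species data, the sketch below uses the first-order jackknife, a well-known estimator for models with heterogeneous detection probabilities. Whether this is the exact estimator used in the study is an assumption; the function name and the small detection matrix are hypothetical.

```python
def jackknife_richness(detections):
    """First-order jackknife estimate of species richness.

    detections: list of rows, one per species, each a list of 0/1 flags
                indicating whether the species was recorded in each of
                the k sampling sessions.
    """
    k = len(detections[0])                      # number of sampling sessions
    observed = [row for row in detections if any(row)]
    s_obs = len(observed)                       # species recorded at least once
    f1 = sum(1 for row in observed if sum(row) == 1)  # seen in exactly one session
    # Species seen only once suggest that others were missed entirely.
    return s_obs + f1 * (k - 1) / k


# Hypothetical data: 4 species, 3 sessions.
detections = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]
print(jackknife_richness(detections))
```

Unlike curve fitting, the correction term depends on how often each species was detected, so it is tied to the detection process rather than to the shape of the accumulation curve.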


Sampling from Dirichlet partitions: estimating the number of species

ENVIRONMETRICS, Issue 7 2009
Thierry Huillet
Abstract: The Dirichlet partition of an interval can be viewed as the generalization of several classical models in ecological statistics. We recall the unordered Ewens sampling formulae (ESF) from finite Dirichlet partitions. As this is a key variable for estimation purposes, focus is on the number of distinct visited species in the sampling process. These are illustrated in specific cases. We use these preliminary statistical results on frequency distributions to address the following sampling problem: what is the estimated number of species when sampling is from Dirichlet populations? The obtained results are in accordance with the ones found in sampling theory from random proportions with Poisson–Dirichlet (PD) distribution. To conclude, we apply the suggested estimators to two different sets of real data. Copyright © 2009 John Wiley & Sons, Ltd.
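For orientation, the number of distinct species has a simple closed-form expectation in the Poisson–Dirichlet case covered by the standard Ewens sampling formula: a sample of size n with parameter θ contains on average θ/θ + θ/(θ+1) + ... + θ/(θ+n−1) distinct species. The sketch below computes only this textbook quantity; it does not reproduce the finite-partition estimators of the article, and the function name is hypothetical.

```python
def expected_species(n, theta):
    """Expected number of distinct species in a sample of size n
    under the Ewens sampling formula with parameter theta > 0."""
    # The (i+1)-th sampled individual belongs to a new species
    # with probability theta / (theta + i).
    return sum(theta / (theta + i) for i in range(n))


for n in (1, 10, 100, 1000):
    print(n, round(expected_species(n, theta=1.0), 3))
```

The expectation grows roughly like θ·log(n), which is why the count of distinct species observed so far carries information about θ.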


Suggested Methods to Mitigate Bias from Nondissolved Petroleum in Ground Water Samples Collected from the Smear Zone

GROUND WATER MONITORING & REMEDIATION, Issue 3 2009
Dawn A. Zemo
This article provides actual site data that confirm that turbid ground water samples collected from within the smear zone at petroleum release sites can be significantly biased high by the inclusion of a nondissolved component that is an artifact of the sampling process. Side-by-side comparisons show that reducing sample turbidity can result in significant reductions of reported concentrations for the ground water samples and that the lower turbidity results are more representative of the petroleum actually dissolved in the ground water. Depending on site-specific factors, ground water sample turbidity can be reduced by four field-based and two laboratory-based methods. These methods should be used routinely at sites where turbid samples with a nondissolved component are being collected.


Bayesian inference in a piecewise Weibull proportional hazards model with unknown change points

JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 4 2007
J. Casellas
Summary: The main difference between parametric and non-parametric survival analyses lies in model flexibility. Parametric models have been suggested as preferable because of their lower programming needs, although they generally suffer from a reduced flexibility to fit field data. In this sense, parametric survival functions can be redefined as piecewise survival functions whose slopes change at given points. This substantially increases the flexibility of the parametric survival model. Unfortunately, we lack accurate methods to establish the required number of change points and their position within the time space. In this study, a Weibull survival model with a piecewise baseline hazard function was developed, with change points included as unknown parameters in the model. Concretely, a Weibull log-normal animal frailty model was assumed, and it was solved with a Bayesian approach. The required fully conditional posterior distributions were derived. During the sampling process, all the parameters in the model were updated using a Metropolis–Hastings step, with the exception of the genetic variance, which was updated with a standard Gibbs sampler. This methodology was tested with simulated data sets, each one analysed through several models with different numbers of change points. The models were compared with the Deviance Information Criterion, with appealing results. Simulation results showed that the estimated marginal posterior distributions covered the true parameter values used in the simulation well and placed high density on them. Moreover, results showed that the piecewise baseline hazard function could appropriately fit survival data, as well as other smooth distributions, with a reduced number of change points.
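The kind of update mentioned above can be sketched as a generic random-walk Metropolis–Hastings sampler for a single parameter. This is a minimal illustration, not the authors' implementation: the standard normal log-posterior is a stand-in for the Weibull frailty model's conditional posteriors, and the function names, step size, and seed are assumptions.

```python
import math
import random


def metropolis_hastings(log_post, start, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    log_post: function returning the log posterior density up to a constant.
    """
    samples = []
    current = start
    current_lp = log_post(current)
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current value.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def toy_log_post(x):
    # Stand-in target (assumption): standard normal log density up to a constant.
    return -0.5 * x * x


rng = random.Random(0)
draws = metropolis_hastings(toy_log_post, start=0.0, n_steps=20000,
                            step_size=1.0, rng=rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))
```

In a full sampler of the kind described, one such step would be applied to each parameter in turn, with a direct draw from the conditional distribution (a Gibbs step) wherever that distribution has a known form.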


Deriving target selection rules from endogenously selected samples

JOURNAL OF APPLIED ECONOMETRICS, Issue 5 2006
Bas Donkers
The selection of the most profitable customers in a customer database for targeted activities is often done based on observed behaviour in the past. Consequently, databases arising from the responses to, for example, direct mailings in the past are not random samples. When not all heterogeneity across customers is observed, target selection will be based on unobserved heterogeneity and hence it is endogenous. We develop a method to adjust the likelihood function of latent class models to correct for this endogenous sampling process. We apply this technique to the selection of mail targets for a Dutch charity. Based on a joint model for the response rate and the amount donated, we create a target selection rule that maximizes expected revenues. Copyright © 2006 John Wiley & Sons, Ltd.


An algorithm for the uniform sampling of iso-energy surfaces and for the calculation of microcanonical averages

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 4 2006
Arnaldo Rapallo. Article first published online: 17 JAN 200
Abstract: In this article an algorithm is proposed to efficiently perform the uniform sampling of an iso-energy surface corresponding to a fixed potential energy U of a molecular system, and for calculating averages of certain quantities over microstates having this energy (microcanonical averages). The developed sampling technique is based upon the combination of a recently proposed method for performing constant potential energy molecular dynamics simulations [Rapallo, A. J Chem Phys 2004, 121, 4033] with well-established thermostatting techniques used in the framework of standard molecular dynamics simulations, such as the Andersen thermostat and the Nosé–Hoover chain thermostat. The proposed strategy leads to very accurate and drift-free potential energy conservation during the whole sampling process and, very importantly, especially when dealing with high-dimensional or complicated potential functions, it does not require the calculation of the Hessian of the potential energy function. The technique proved to be very reliable for sampling both low- and high-dimensional surfaces. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 414–425, 2006


Discrepancies between the phenotypic and genotypic characterization of Lactococcus lactis cheese isolates

LETTERS IN APPLIED MICROBIOLOGY, Issue 6 2006
M. De La Plaza
Aims: To use randomly amplified polymorphic DNA (RAPD)-PCR fingerprinting and plasmid profiles to determine, at the strain level, the similarity of Lactococcus lactis isolates obtained during sampling of traditional cheeses and to verify its correspondence to the selected phenotypic characteristics. Methods and Results: A total of 45 L. lactis isolates were genotypically analysed by RAPD-PCR fingerprinting and plasmid patterns. Phenotypic traits used to compare strains were proteolytic, acidifying, aminotransferase (aromatic and branched chain aminotransferase) and α-ketoisovalerate decarboxylase (Kivd) activities. The results show that 23 isolates could be grouped in clusters that exhibited 100% identity in both their RAPD and plasmid patterns, indicating the probable isolation of dominant strains during the cheese sampling process. However, there were phenotypic differences between isolates within the same cluster that included the loss of relevant technological properties such as proteinase activity and acidifying capacity or high variation in their amino acid converting enzyme activities. Likewise, the analysis of a specific attribute, Kivd activity, indicated that 7 of 15 isolates showed no detectable activity despite the presence of the encoding (kivd) gene. Conclusion: Phenotypic differences found between genotypically similar strains of L. lactis could be linked to differences in enzymatic expression. Significance and Impact of the Study: Phenotypic analysis of L. lactis isolates should be considered when selecting strains with new cheese flavour forming capabilities.


'Making the molecular movie': first frames

ACTA CRYSTALLOGRAPHICA SECTION A, Issue 2 2010
R. J. Dwayne Miller
Recent advances in high-intensity electron and X-ray pulsed sources now make it possible to directly observe atomic motions as they occur in barrier-crossing processes. These rare events require the structural dynamics to be triggered by femtosecond excitation pulses that prepare the system above the barrier or access new potential energy surfaces that drive the structural changes. In general, the sampling process modifies the system such that the structural probes should ideally have sufficient intensity to fully resolve structures near the single-shot limit for a given time point. New developments in both source intensity and temporal characterization of the pulsed sampling mode have made it possible to make so-called 'molecular movies', i.e. measure relative atomic motions faster than collisions can blur information on correlations. Strongly driven phase transitions from thermally propagated melting to optically modified potential energy surfaces leading to ballistic phase transitions and bond stiffening are given as examples of the new insights that can be gained from an atomic level perspective of structural dynamics. The most important impact will likely be made in the fields of chemistry and biology where the central unifying concept of the transition state will come under direct observation and enable a reduction of high-dimensional complex reaction surfaces to the key reactive modes, as long mastered by Mother Nature.


Imputation and Variable Selection in Linear Regression Models with Missing Covariates

BIOMETRICS, Issue 2 2005
Xiaowei Yang
Summary: Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.


Factors influencing testicular volume in young men: results of a community-based survey

BJU INTERNATIONAL, Issue 4 2002
J.H. Ku
Objective: To investigate the factors influencing testicular volume in young men in the community. Subjects and methods: Between May and November 2001, 2700 men aged 20 years and dwelling in the community were randomly selected at a 10% sampling fraction after a sampling process by census district; 2080 men agreed to participate in the study. All volunteers underwent a standard evaluation, including a detailed medical history and physical examination. After excluding those with testicular diseases the study comprised 1792 men. Results: There were significant but weak correlations between testicular volumes and height, body weight and body mass index. In a multivariate model, high environmental temperature was associated with a decreased likelihood (odds ratio, OR, 0.42; 95% confidence interval, CI, 0.29–0.60; P < 0.001) of a paired testicular volume being below the 25th percentile of all participants. The likelihood of a low paired testicular volume varied by area, with a 1.6-fold greater risk in men dwelling in large rural areas than in those in major towns. Increasing height was associated with a decreased likelihood (OR 0.60; 95% CI 0.38–0.96; P = 0.032) and low body weight with an increased likelihood of a low paired testicular volume (OR 2.54; 95% CI 1.57–4.12; P < 0.001). Conclusion: These results establish that demographic and environmental factors have an effect on testicular size and suggest that body size may be important in determining testicular size in late adolescents.

