Sampling Scheme (sampling + scheme)
Kinds of Sampling Scheme: Selected Abstracts

An Adaptive Sampling Scheme for Out-of-Core Simplification
COMPUTER GRAPHICS FORUM, Issue 2 2002. Guangzheng Fei.
Current out-of-core simplification algorithms can efficiently simplify large models that are too complex to be loaded into main memory at one time. However, these algorithms do not preserve surface details well, since adaptive sampling, a typical strategy for detail preservation, remains an open issue for out-of-core simplification. In this paper, we present an adaptive sampling scheme, called balanced retriangulation (BR), for out-of-core simplification. A key idea behind BR is that we can use Garland's quadric error matrix to analyse the global distribution of surface details. Based on this analysis, a local retriangulation achieves adaptive sampling by restoring detailed areas with cell split operations while further simplifying smooth areas with edge collapse operations. For a given triangle budget, BR preserves surface details significantly better than uniform sampling algorithms such as uniform clustering. Like uniform clustering, our algorithm has linear running time and a small memory requirement. [source]

The behaviour of soil process models of ammonia volatilization at contrasting spatial scales
EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 6 2008. R. Corstanje.
Summary: Process models are commonly used in soil science to obtain predictions at a spatial scale that is different from the scale at which the model was developed, or the scale at which information on model inputs is available. When this happens, the model and its inputs require aggregation or disaggregation to the application scale, and this is a complex problem. Furthermore, the validity of the aggregated model predictions depends on whether the model describes the key processes that determine the process outcome at the target scale. Different models may therefore be required at different spatial scales. In this paper we develop a diagnostic framework that allows us to judge whether a model is appropriate for use at one or more spatial scales, both with respect to the prediction of variation at those scales and in the requirement for disaggregation of the inputs. We show that spatially nested analysis of the covariance of predictions with measured process outcomes is an efficient way to do this. This is applied to models of the processes that lead to ammonia volatilization from soil after the application of urea. We identify the component correlations at different scales of a nested scheme as the diagnostic with which to evaluate model behaviour. These correlations show how well the model emulates components of spatial variation of the target process at the scales of the sampling scheme. Aggregate correlations were identified as the most pertinent for evaluating models for prediction at particular scales, since they measure how well aggregated predictions at some scale correlate with aggregated values of the measured outcome. There are two circumstances under which models are used to make predictions. In the first case only the model is used to predict, and the most useful diagnostic is the concordance aggregate correlation. In the second case model predictions are assimilated with observations, which should correct bias in the prediction and errors in the variance; here the aggregate correlations are the most suitable diagnostic. [source]
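The "aggregate correlation" idea in the abstract above can be made concrete with a small numerical sketch: average both the model predictions and the measured outcomes over blocks of increasing size, then correlate the block means. The data, block sizes and function name below are illustrative, not taken from the paper.

```python
import numpy as np

def aggregate_correlation(pred, obs, block):
    """Correlate block-mean predictions with block-mean observations.

    Aggregating both series to a coarser scale before correlating mimics the
    aggregate-correlation diagnostic: how well do aggregated predictions
    track aggregated outcomes at that scale?
    """
    n = (len(pred) // block) * block               # trim to whole blocks
    p = pred[:n].reshape(-1, block).mean(axis=1)
    o = obs[:n].reshape(-1, block).mean(axis=1)
    return np.corrcoef(p, o)[0, 1]

rng = np.random.default_rng(0)
truth = np.cumsum(rng.normal(size=729))            # spatially structured outcome
obs = truth + rng.normal(scale=2.0, size=729)      # noisy measurements
pred = 0.8 * truth + rng.normal(scale=3.0, size=729)  # imperfect model predictions

for block in (1, 3, 9, 27):                        # nested scales, as in a balanced design
    print(block, round(aggregate_correlation(pred, obs, block), 3))
```

Typically the correlation rises with block size, which is exactly why a model can be adequate for coarse-scale prediction while emulating fine-scale variation poorly.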
Habitat associations of Atlantic herring in the Shetland area: influence of spatial scale and geographic segmentation
FISHERIES OCEANOGRAPHY, Issue 3 2001. Christos D. Maravelias.
This study considers the habitat associations of a pelagic species with a range of biotic and abiotic factors at three different spatial scales. Generalized additive models (GAMs) are used to analyse trends in the distributional abundance of Atlantic herring (Clupea harengus) in relation to thermocline and water depth, seabed roughness and hardness, sea surface salinity and temperature, zooplankton abundance and spatial location. Two geographical segments of the population, those east and west of the Shetland Islands (northern North Sea, ICES Div. IVa), are examined. The differences in the ecological preferences of the species in these two distinct geographical areas are elucidated, and the degree to which these environmental relationships might be modulated by the change of support of the data is also considered. Part of the observed variability of the pre-spawning distribution of herring was explained by different parameters in these two regions. Notwithstanding this, key determinants of the species' spatial aggregation in both areas were zooplankton abundance and the nature of the seabed substrate. The relative importance of the variables examined did not change significantly at different spatial scales of the observation window. The diverse significance of various environmental factors on herring distribution was attributed mainly to the interaction of species dynamics with the different characteristics of the ecosystem east and west of the Shetland Islands. Results suggest that the current 2.5 nautical miles as elementary sampling distance unit (ESDU) is a reasonable sampling scheme that combines the need to reduce the data volume while maintaining the spatial resolution needed to distinguish the species/environment relationships. [source]

Unified sampling approach for multipoint linkage disequilibrium mapping of qualitative and quantitative traits
GENETIC EPIDEMIOLOGY, Issue 4 2002. Fang-Chi Hsu.
Abstract: Rapid development in biotechnology has enhanced the opportunity to deal with multipoint gene mapping for complex diseases, and association studies using quantitative traits have recently generated much attention. Unlike the conventional hypothesis-testing approach to fine mapping, we propose a unified multipoint method to localize a gene controlling a quantitative trait. We first calculate the sample size needed to detect linkage and linkage disequilibrium (LD) for a quantitative trait, categorized by decile, under three different modes of inheritance. Our results show that sampling trios of offspring and their parents from either extremely low (EL) or extremely high (EH) probands provides greater statistical power than sampling in the intermediate range. We next propose a unified sampling approach for multipoint LD mapping, where the goal is to estimate the map position (τ) of a trait locus and to calculate a confidence interval along with its sampling uncertainty. Our method builds upon a model for an expected preferential transmission statistic at an arbitrary locus conditional on the sampling scheme, such as sampling from EL and EH probands. This approach is valid regardless of the underlying genetic model. The one major assumption of this model is that no more than one quantitative trait locus (QTL) is linked to the region being mapped. Finally, we illustrate the proposed method using family data on total serum IgE levels collected in multiplex asthmatic families from Barbados. An unobserved QTL appears to be located at τ̂ = 41.93 cM, with a 95% confidence interval of (40.84, 43.02), within the 20-cM region framed by markers D12S1052 and D12S1064 on chromosome 12. The test statistic shows strong evidence of linkage and LD (chi-square statistic = 18.39 with 2 df, P-value = 0.0001). Genet. Epidemiol. 22:298-312, 2002. © 2002 Wiley-Liss, Inc. [source]
Estimating the Variability of Active-Layer Thaw Depth in Two Physiographic Regions of Northern Alaska
GEOGRAPHICAL ANALYSIS, Issue 2 2001. Claire E. Gomersall.
The active layer is the zone above permafrost that experiences seasonal freeze and thaw. Active-layer thickness varies annually in response to air and surface temperature, and generally decreases poleward. Substantially less is known about thaw variability across small lateral distances in response to topography, parent material, vegetation, and subsurface hydrology. A graduated steel rod was used to measure the 1998 end-of-season thaw depth across several transects. A balanced hierarchical sampling design was used to estimate the contribution to total variance in active-layer depth at separating distances of 1, 3, 9, 27, and 100 meters. A second sampling scheme was used to examine variation at shorter distances of 0.3 and 0.1 meter. This seven-stage sample design was applied to two sites in the Arctic Foothills physiographic province and four sites in the Arctic Coastal Plain province in northern Alaska. The spatial variability for each site was determined using ANOVA and variogram methods to compare inter-site and inter-province variation. Spatial variation in thaw depth differed between the Foothills and Coastal Plain sites. A greater percentage of the total variance occurs at short lag distances (0-3 meters) at the Foothills sites, presumably reflecting the influence of frost boils and tussock vegetation on ground heat flow. In contrast, thaw variation at the Coastal Plain sites occurs at distances exceeding 10 meters, and is attributed to the influence of well-developed networks of ice-wedge polygons and the presence of drained thaw-lake basins. This information was used to determine an ongoing sampling scheme for each site and to assess the suitability of each method of analysis. [source]

Accuracy and precision of different sampling strategies and flux integration methods for runoff water: comparisons based on measurements of the electrical conductivity
HYDROLOGICAL PROCESSES, Issue 2 2006. Patrick Schleppi.
Abstract: Because of their fast response to hydrological events, small catchments show strong quantitative and qualitative variations in their water runoff. Fluxes of solutes or suspended material can be estimated from water samples only if an appropriate sampling scheme is used. We used continuous in-stream measurements of the electrical conductivity of the runoff in a small subalpine catchment (64 ha) in central Switzerland and in a very small (0.16 ha) subcatchment. Different sampling and flux integration methods were simulated for weekly water analyses. Fluxes calculated directly from grab samples are strongly biased towards the high conductivities observed at low discharges. Several regressions and weighted averages have been proposed to correct for this bias. Their accuracy and precision are better, but none of these integration methods gives both a consistently low bias and a low residual error. Different methods of peak sampling were also tested. Like regressions, they produce substantial residual errors, and their bias is variable. This variability (both between methods and between catchments) does not allow one to tell a priori which sampling scheme and integration method would be more accurate. Only discharge-proportional sampling methods were found to give essentially unbiased flux estimates. Programmed samplers with a fraction collector allow proportional pooling and are appropriate for short-term studies. For long-term monitoring or experiments, sampling at a frequency proportional to the discharge appears to be the best way to obtain accurate and precise flux estimates. Copyright © 2006 John Wiley & Sons, Ltd. [source]
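The grab-sample bias the authors describe is easy to reproduce numerically. Below is a minimal sketch with synthetic discharge and concentration series (all names and numbers illustrative): a weekly grab-sample flux estimate is contrasted with a discharge-proportional one, in which sampling times are drawn with probability proportional to discharge so that the plain mean of the sampled concentrations is flow-weighted.

```python
import numpy as np

rng = np.random.default_rng(1)
q = np.exp(rng.normal(size=5000))      # synthetic hourly discharge (arbitrary units)
c = 10.0 / (1.0 + q)                   # concentration falls as discharge rises

true_flux = np.mean(q * c)             # "continuous" reference flux per time step

# Weekly grab samples: average concentration times average discharge.
idx = np.arange(0, len(q), 168)
grab_flux = np.mean(c[idx]) * np.mean(q)

# Discharge-proportional sampling: sample times drawn with probability ~ q,
# so the unweighted mean of sampled concentrations is flow-weighted.
p = q / q.sum()
sample = rng.choice(len(q), size=len(idx), p=p)
prop_flux = np.mean(c[sample]) * np.mean(q)

print(f"true {true_flux:.3f}  grab {grab_flux:.3f}  proportional {prop_flux:.3f}")
```

Because concentration is highest at low flow, the grab estimate overshoots, while the discharge-proportional estimate is close to the reference, matching the abstract's conclusion.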
Statistical characterization of the spatial variability of soil moisture in a cutover peatland
HYDROLOGICAL PROCESSES, Issue 1 2004. Richard M. Petrone.
Abstract: Soil moisture is a significant variable in its importance to the validation of hydrological models, but it is also the one defining variable that ties together all components of the surface energy balance, and as such is of major importance to climate models and their surface schemes. Changing the scale of representation (e.g. from the observation to the modelling scale) can further complicate the description of the spatial variability in any hydrological system. We examine this issue using soil moisture and vegetation cover data collected at two contrasting spatial scales and at three different times in the snow-free season from a cutover peat bog in Cacouna, Québec. Soil moisture was measured using time domain reflectometry (TDR) over 90,000 m² and 1,200 m² grids, at intervals of 30 and 2 m respectively. Analyses of statistical structure, variance and spatial autocorrelation were conducted on the soil moisture data at different sampling resolutions and over different grid sizes to determine the optimal spatial scale and sampling density at which these data should be represented. Increasing the scale of interest without adequate resolution in the measurement can lead to significant inconsistency in the representation of these variables. Furthermore, a lack of understanding of the nature of the variability of soil moisture at different scales may produce spurious representation in a modelling context. The analysis suggests that in terms of the distribution of soil moisture, the extent of sampling within a grid is not as significant as the density, or spacing, of the measurements. Both the scale and resolution of the sampling scheme have an impact on the mean of the distribution. Only approximately 60% of the spatial pattern in soil moisture of both the large and small grids is persistent over time, suggesting that the pattern of moisture differs between wetting and drying cycles. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Selective sampling for approximate clustering of very large data sets
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 3 2008. Liang Wang.
A key challenge in pattern recognition is how to scale the computational efficiency of clustering algorithms on large data sets. The extension of non-Euclidean relational fuzzy c-means (NERF) clustering to very large (VL = unloadable) relational data is called the extended NERF (eNERF) clustering algorithm, which comprises four phases: (i) finding distinguished features that monitor progressive sampling; (ii) progressively sampling from an N × N relational matrix R_N to obtain an n × n sample matrix R_n; (iii) clustering R_n with literal NERF; and (iv) extending the clusters in R_n to the remainder of the relational data. Previously published examples on several fairly small data sets suggest that eNERF is feasible for truly large data sets. However, it seems that phases (i) and (ii), i.e., finding R_n, are not very practical, because the sample size n often turns out to be roughly 50% of N, and this over-sampling defeats the whole purpose of eNERF. In this paper, we examine the performance of the sampling scheme of eNERF with respect to different parameters. We propose a modified sampling scheme for use with eNERF that combines simple random sampling with (parts of) the sampling procedures used by eNERF and a related algorithm, sVAT (scalable visual assessment of clustering tendency). We demonstrate that our modified sampling scheme can eliminate the over-sampling of the original progressive sampling scheme, thus enabling the processing of truly VL data. Numerical experiments on a distance matrix of a set of 3,000,000 vectors drawn from a mixture of 5 bivariate normal distributions demonstrate the feasibility and effectiveness of the proposed sampling method. We also find that actually running eNERF on a data set of this size is very costly in terms of computation time. Thus, our results demonstrate that further modification of eNERF, especially the extension stage, will be needed before it is truly practical for VL data. © 2008 Wiley Periodicals, Inc. [source]
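As a rough illustration of the ingredients of such a modified scheme, the sketch below seeds a sample with a few maximin-selected "distinguished" objects (in the spirit of sVAT seeding) and fills the rest by simple random sampling from a distance matrix. It is a toy interpretation of the idea under stated assumptions, not the published eNERF code.

```python
import numpy as np

def hybrid_sample(D, n, k, rng):
    """Pick n of N objects from a relational (distance) matrix D.

    First choose k "distinguished" objects by maximin selection (each new
    pick is the object farthest from all picks so far), then fill the
    remaining n - k slots by simple random sampling.
    """
    N = D.shape[0]
    picks = [int(np.argmax(D.sum(axis=0)))]     # start from an outlying object
    dist = D[picks[0]].copy()
    while len(picks) < k:
        nxt = int(np.argmax(dist))              # farthest from current picks
        picks.append(nxt)
        dist = np.minimum(dist, D[nxt])
    rest = np.setdiff1d(np.arange(N), picks)
    fill = rng.choice(rest, size=n - k, replace=False)
    return np.concatenate([picks, fill])

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) + rng.choice([-4, 0, 4], size=(500, 1))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
idx = hybrid_sample(D, n=50, k=10, rng=rng)
print(len(idx), "objects sampled; sub-matrix shape", D[np.ix_(idx, idx)].shape)
```

Here the sample is a fixed 10% of the objects, chosen up front, which is the kind of budget control that progressive sampling alone failed to deliver.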
Study Design for Assessing Species-Environment Relationships and Developing Indicator Systems for Ecological Changes in Floodplains: The Approach of the RIVA Project
INTERNATIONAL REVIEW OF HYDROBIOLOGY, Issue 4 2006. Klaus Henle.
Abstract: In this article the study design and data sampling of the RIVA project ("Development and Testing of a Robust Indicator System for Ecological Changes in Floodplain Systems") are described. The project was set up to improve existing approaches to studying species-environment relationships as a basis for the development of indicator systems and predictive models. Periodically flooded grassland was used as a model system. It is used agriculturally at intermediate intensity and is the major habitat type along the Middle Elbe, Germany. We chose a main study area to analyse species-environment relationships and two reference sites for testing the transferability of the results. Using a stratified random sampling scheme, we distributed 36 study plots across the main study site and 12 plots within each of the reference sites. In each of the study plots, hydrological and soil variables were measured and plants, molluscs, and carabid beetles were sampled. Hoverflies were collected on a subset of the sampling plots. A brief summary of first results is then provided. (© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]

Estimating Long-term Trends in Tropospheric Ozone Levels
INTERNATIONAL STATISTICAL REVIEW, Issue 1 2002. Michael Smith.
Summary: This paper develops Bayesian methodology for estimating long-term trends in the daily maxima of tropospheric ozone. The methods are then applied to study long-term trends in ozone at six monitoring sites in the state of Texas. The methodology controls for the effects of meteorological variables, because it is known that variables such as temperature, wind speed and humidity substantially affect the formation of tropospheric ozone. A semiparametric regression model is estimated in which a nonparametric trivariate surface is used to model the relationship between ozone and these meteorological variables because, while the relationship is known to be a complex nonlinear one, its functional form is unknown. The model also allows for the effects of wind direction and seasonality. The errors are modeled as an autoregression, which is methodologically challenging because the observations are unequally spaced over time. Each function in the model is represented as a linear combination of basis functions located at all of the design points. We also estimate an appropriate data transformation simultaneously with the functions. The functions are estimated nonparametrically by a Bayesian hierarchical model that uses indicator variables to allow a non-zero probability that the coefficient of each basis term is zero. The entire model, including the nonparametric surfaces, data transformation and autoregression for the unequally spaced errors, is estimated using a Markov chain Monte Carlo sampling scheme with a computationally efficient transition kernel for generating the indicator variables. The empirical results indicate that key meteorological variables explain most of the variation in daily ozone maxima through a nonlinear interaction, and that their effects are consistent across the six sites. However, the estimated trends vary considerably from site to site, even within the same city. [source]
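The indicator-variable device mentioned here is essentially a spike-and-slab prior on the basis coefficients. A minimal Gibbs sampler for the conjugate linear case is sketched below, with fixed hyperparameters and without the paper's data transformation or autoregressive errors; all names and settings are assumptions for illustration.

```python
import numpy as np

def spike_slab_gibbs(X, y, sigma2=1.0, tau2=10.0, p=0.2, iters=2000, seed=3):
    """Gibbs sampler with a 0/1 indicator per coefficient (spike-and-slab).

    gamma_j = 0 pins beta_j at exactly zero; gamma_j = 1 gives it a N(0, tau2)
    slab. The noise variance sigma2 is held fixed to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    gamma = np.zeros(d, dtype=bool)
    keep = np.zeros(d)
    xtx = np.einsum("ij,ij->j", X, X)
    log_prior_odds = np.log(p / (1.0 - p))
    for it in range(iters):
        for j in range(d):
            r = y - X @ beta + X[:, j] * beta[j]        # residual excluding j
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)    # conditional posterior variance
            m = v * (X[:, j] @ r) / sigma2              # conditional posterior mean
            # log posterior odds of including coefficient j
            logit = log_prior_odds + 0.5 * np.log(v / tau2) + 0.5 * m * m / v
            gamma[j] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
        if it >= iters // 2:                            # discard burn-in
            keep += gamma
    return keep / (iters - iters // 2)                  # posterior inclusion probabilities

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(spike_slab_gibbs(X, y).round(2))
```

On this toy problem the inclusion probabilities for the two active columns go to one while the rest stay near zero, which is the mechanism that prunes unneeded basis terms in the ozone model.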
Epistatic kinship: a new measure of genetic diversity for short-term phylogenetic structures. Theoretical investigations
JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 3 2006. C. Flury.
Summary: The epistatic kinship describes the probability that chromosomal segments of length x (in Morgan) are identical by descent. It extends the single-locus kinship coefficient to chromosomal segments. The parameter reflects the number of meioses separating individuals or populations, and is therefore suggested as a measure to quantify the genetic distance of subpopulations that separated only a few generations ago. Algorithms for the epistatic kinship, and the extension of the rules to set up the rectangular relationship matrix, are presented. The properties of the epistatic kinship based on pedigree information were investigated theoretically. Pedigree data are often missing for small livestock populations, so an approach to estimate the epistatic kinship from molecular marker data is suggested. For the epistatic kinship based on marker information, haplotypes are relevant. An easy and fast method that derives haplotypes and their frequencies without pedigree information, based on sampled full-sib pairs, was developed. Different parameters of the sampling scheme were tested in a simulation study. The power of the method decreases with increasing segment length and with increasing number of segments genotyped. Further, it is shown that the efficiency of the approach is influenced by the number of animals genotyped and the polymorphism of the markers. It is argued that the suggested method has considerable potential to allow a phylogenetic differentiation between close populations, where small sample size can be balanced by the number, the length, and the degree of polymorphism of the chromosome segments considered. [source]
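The dependence on segment length can be anchored with a standard fact: under Haldane's no-interference model, crossovers along a gamete form a Poisson process with rate 1 per Morgan, so a segment of length x Morgan passes through g meioses unbroken with probability exp(-gx). The snippet below just evaluates this quantity as background intuition; it is not the authors' algorithm.

```python
import numpy as np

def intact_prob(x_morgan, meioses):
    """P(a chromosome segment of length x survives g meioses unbroken).

    One meiosis leaves a length-x segment without recombination with
    probability exp(-x) (Poisson crossovers, rate 1 per Morgan), so g
    independent meioses give exp(-g * x).
    """
    return np.exp(-meioses * np.asarray(x_morgan))

for g in (2, 4, 8, 16):                  # roughly, generations of separation
    print(g, intact_prob([0.01, 0.05, 0.20], g).round(3))
```

Longer segments decay much faster with the number of separating meioses, which is why segment-based kinship is informative about recent, short-term divergence.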
The terminology and use of species-area relationships: a response to Dengler (2009)
JOURNAL OF BIOGEOGRAPHY, Issue 10 2009. Samuel M. Scheiner.
Abstract: Dengler (Journal of Biogeography, 2009, 36, 728-744) addresses issues regarding species-area relationships (SARs) but fails to settle those issues. He states that only certain types of sampling scheme should be used to construct SARs, but is not consistent in the criteria that he uses to include some sampling schemes and not others. He argues that a sampling scheme of contiguous plots will be more accurate in extrapolating beyond the sampled area, but logic tells us that a dispersed sampling scheme is likely to be more accurate. Finally, he concludes that the 'true' SAR is a power function, but this conclusion is inconsistent with his results and with the results of others. Rather than defining a narrow framework for SARs, we need to recognize that the relationship between area and species richness is scale- and system-dependent. Different sampling schemes serve different purposes, and a variety of functional relationships are likely to hold. Further theoretical and empirical work is needed to resolve these issues fully. [source]
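For reference in this debate, fitting the power function S = cA^z is a one-line log-log regression; the sketch below uses made-up richness data purely to show the mechanics.

```python
import numpy as np

area = np.array([1, 4, 16, 64, 256, 1024], dtype=float)     # plot areas (made-up units)
species = np.array([12, 19, 31, 48, 80, 126], dtype=float)  # observed richness

# Power function S = c * A^z becomes linear after logs: log S = log c + z log A.
z, logc = np.polyfit(np.log(area), np.log(species), 1)
print(f"z = {z:.3f}, c = {np.exp(logc):.2f}")
```

Whether such a fit is the "true" SAR or merely one convenient functional form is exactly what is contested here: other curves (e.g. logarithmic or saturating ones) can fit comparably well over limited scale ranges.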
The inselberg flora of Atlantic Central Africa
JOURNAL OF BIOGEOGRAPHY, Issue 4 2005.
Abstract:
Aims: To identify the relative contributions of environmental determinism, dispersal limitation and historical factors to the spatial structure of the floristic data of inselbergs at the local and regional scales, and to test whether the extent of species spatial aggregation is related to dispersal abilities.
Location: Rain forest inselbergs of Equatorial Guinea, northern Gabon and southern Cameroon (western central Africa).
Methods: We use phytosociological relevés and herbarium collections obtained from 27 inselbergs using a stratified sampling scheme considering six plant formations. Data analysis focused on Rubiaceae, Orchidaceae, Melastomataceae, Poaceae, Commelinaceae, Acanthaceae, Begoniaceae and Pteridophytes. Data were investigated using ordination methods (detrended correspondence analysis, DCA; canonical correspondence analysis, CCA), Sørensen's coefficient of similarity and spatial autocorrelation statistics. Comparisons were made at the local and regional scales using ordinations of life-form spectra and ordinations of species data.
Results: At the local scale, the forest-inselberg ecotone is the main gradient structuring the floristic data. At the regional scale, this is still the main gradient in the ordination of life-form spectra, but other factors become predominant in analyses of species assemblages. CCA identified three environmental variables explaining a significant part of the variation in floristic data. Spatial autocorrelation analyses showed that both the flora and the environmental factors are spatially autocorrelated: the similarity of species compositions within plant formations decreases approximately linearly with the logarithm of the spatial distance. The extent of species distribution was correlated with their a priori dispersal abilities as assessed by their diaspore types.
Main conclusions: At a local scale, species composition is best explained by a continuous cline of edaphic conditions along the forest-inselberg ecotone, generating a wide array of ecological niches. At a regional scale, these ecological niches are occupied by different species depending on the available local species pool. These subregional species pools probably result from varying environmental conditions, dispersal limitation and the history of past vegetation changes due to climatic fluctuations. [source]

Structure of Anogeissus leiocarpa Guill. & Perr. natural stands in relation to anthropogenic pressure within Wari-Maro Forest Reserve in Benin
AFRICAN JOURNAL OF ECOLOGY, Issue 3 2010. Achille Ephrem Assogbadjo.
Abstract: The present study focused on the analysis of the structure of Anogeissus leiocarpa-dominated natural stands in the Wari-Maro forest reserve that are under high and minimal anthropogenic pressure. These stands were inventoried using a random sampling scheme of 40 sample units of 30 m × 50 m. In each pressure-level stand, the dbh and tree height of the identified tree species were measured in each plot. Data analyses were based on the computation of structural parameters, the establishment of diameter and height distributions, and the floristic composition of the two types of stands. Results showed higher values of overall basal area (9.78 m²/ha), mean height (22.37 m) and mean diameter (36.92 cm) for A. leiocarpa in low-pressure stands. In high-pressure stands, some species such as Afzelia africana had a lower importance value index, the frequency of A. leiocarpa trees in the successive diameter classes dropped rapidly, and the logarithmic slope of the height-diameter relationship was lower (9.77), indicating a lanky shape. The results suggest that effective conservation is needed for A. leiocarpa stands under high pressure, by limiting human interference and developing an appropriate strategy for restoration purposes.
Résumé: This study focused on analysing the structure of stands dominated by Anogeissus leiocarpa in the Wari-Maro forest reserve, which in some places are subject to strong anthropogenic pressure and in others to minimal pressure. These stands were inventoried using a random sample of 40 plots of 30 m × 50 m. For each pressure level, the diameter at 1.3 m and the total height of trees of the identified species were measured in each plot. Data analysis was based on the calculation of structural parameters, on the establishment of the diameter and height distributions, and on the floristic composition of the stands of the two types of formation. The results indicate the highest values of overall basal area (9.78 m²/ha), mean height (22.37 m) and mean diameter (36.92 cm) for A. leiocarpa in stands under low pressure. In stands under strong pressure, some species such as Afzelia africana had the lowest importance indices, the frequency of A. leiocarpa in the successive height classes decreased rapidly, and the logarithmic slope of the height/diameter relationship was lower (9.77), indicating a slender form. The results suggest that A. leiocarpa stands under strong anthropogenic pressure require effective conservation, by limiting human disturbance and developing an appropriate strategy for their restoration. [source]
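The structural parameters reported above follow directly from the plot tallies. As a worked illustration, basal area per hectare from a list of dbh values (a sketch assuming dbh in centimetres and the paper's 30 m × 50 m plots; the tally is made up):

```python
import math

def basal_area_per_ha(dbh_cm, plot_m2=30 * 50):
    """Stand basal area (m^2/ha) from one plot's dbh tally in centimetres."""
    g = sum(math.pi * (d / 200.0) ** 2 for d in dbh_cm)  # d/200 m = stem radius in metres
    return g * 10_000.0 / plot_m2                        # scale the plot to one hectare

print(round(basal_area_per_ha([36.9, 42.0, 51.3, 28.4, 61.0]), 2), "m^2/ha")
```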
Long-memory dynamic Tobit models
JOURNAL OF FORECASTING, Issue 5 2006. A. E. Brockwell.
Abstract: We introduce a long-memory dynamic Tobit model, defining it as a censored version of a fractionally integrated Gaussian ARMA model, which may include seasonal components and/or additional regression variables. Parameter estimation for such a model using standard techniques is typically infeasible, since the model is not Markovian, cannot be expressed in a finite-dimensional state-space form, and includes censored observations. Furthermore, the long-memory property renders a standard Gibbs sampling scheme impractical. Therefore we introduce a new Markov chain Monte Carlo sampling scheme, which is orders of magnitude more efficient than the standard Gibbs sampler. The method is inherently capable of handling missing observations. In case studies, the model is fit to two time series: one consisting of volumes of requests to a hard disk over time, and the other consisting of hourly rainfall measurements in Edinburgh over a 2-year period. The resulting posterior distributions for the fractional differencing parameter demonstrate, for these two time series, the importance of the long-memory structure in the models. Copyright © 2006 John Wiley & Sons, Ltd. [source]

1H spectroscopic imaging of human brain at 3 Tesla: comparison of fast three-dimensional magnetic resonance spectroscopic imaging techniques
JOURNAL OF MAGNETIC RESONANCE IMAGING, Issue 3 2009. Matthew L. Zierhut, PhD.
Abstract:
Purpose: To investigate the signal-to-noise ratio (SNR) and data quality of time-reduced three-dimensional (3D) proton magnetic resonance spectroscopic imaging (1H MRSI) techniques in the human brain at 3 Tesla.
Materials and Methods: Techniques investigated included ellipsoidal k-space sampling, parallel imaging, and echo-planar spectroscopic imaging (EPSI). The SNR values for N-acetyl aspartate, choline, creatine, and lactate or lipid peaks were compared after correcting for effective spatial resolution and acquisition time in a phantom and in the brains of human volunteers. Other factors considered were linewidths, metabolite ratios, partial-volume effects, and subcutaneous lipid contamination.
Results: In volunteers, the median normalized SNR for parallel imaging data decreased by 34-42%, but could be significantly improved using regularization. The normalized SNR loss in flyback EPSI data was 11-18%. The effective spatial resolutions of the traditional, ellipsoidal, sensitivity encoding (SENSE) and EPSI sampling schemes were 1.02, 2.43, 1.03, and 1.01 cm³, respectively. As expected, lipid contamination was variable between subjects but was highest for the SENSE data. Patient data obtained using the flyback EPSI method were of excellent quality.
Conclusion: Data from all 1H 3D-MRSI techniques were qualitatively acceptable, based upon SNR, linewidths, and metabolite ratios. The larger field of view obtained with the EPSI methods showed negligible lipid aliasing with acceptable SNR values in less than 9.5 min without compromising the point-spread function. J. Magn. Reson. Imaging 2009;30:473-480. © 2009 Wiley-Liss, Inc. [source]

Optimal acquisition orders of diffusion-weighted MRI measurements
JOURNAL OF MAGNETIC RESONANCE IMAGING, Issue 5 2007. Philip A. Cook, PhD.
Abstract:
Purpose: To propose a new method to optimize the ordering of gradient directions in diffusion-weighted MRI so that partial scans have the best spherical coverage.
Materials and Methods: Diffusion-weighted MRI often uses a spherical sampling scheme, which acquires images sequentially with diffusion-weighting gradients in unique directions distributed isotropically on the hemisphere. If not all of the measurements can be completed, the quality of diffusion tensors fitted to the partial scan is sensitive to the order of the gradient directions in the scanner protocol. If the directions are in a random order, a partial scan may cover some parts of the hemisphere densely but other parts sparsely, and thus provide poor spherical coverage. We compare the results of ordering with previously published methods for optimizing the acquisition in simulation.
Results: All methods produce similar results, and all improve the accuracy of the estimated diffusion tensors significantly over unordered acquisitions.
Conclusion: The new ordering method improves the spherical coverage of partial scans and has the advantage of maintaining the optimal coverage of the complete scan. J. Magn. Reson. Imaging 2007;25:1051-1058. © 2007 Wiley-Liss, Inc. [source]
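One simple way to realize the ordering idea is a greedy maximin pass over a fixed direction set, so that every prefix of the protocol is as spread out as possible. The sketch below treats antipodal directions as equivalent; it illustrates the principle under these assumptions and is not the authors' published algorithm.

```python
import numpy as np

def order_directions(dirs):
    """Greedy ordering of unit gradient directions for partial-scan coverage.

    Each step appends the direction whose closest angular neighbour in the
    already-ordered set is farthest away, so any prefix of the protocol
    covers the hemisphere as evenly as possible. Because this only reorders
    a fixed set, the complete scan keeps its original optimal coverage.
    """
    dirs = np.asarray(dirs, dtype=float)
    order = [0]
    left = list(range(1, len(dirs)))
    while left:
        # |dot| treats +g and -g as the same axis; smaller |dot| = farther apart
        score = [max(abs(dirs[i] @ dirs[j]) for j in order) for i in left]
        pick = left[int(np.argmin(score))]
        order.append(pick)
        left.remove(pick)
    return dirs[order]

rng = np.random.default_rng(5)
g = rng.normal(size=(30, 3))
g /= np.linalg.norm(g, axis=1, keepdims=True)   # 30 unit gradient directions
print(order_directions(g)[:5].round(2))
```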
Sampling bias and logistic models
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2008. Peter McCullagh.
Summary: In a regression model, the joint distribution for each finite sample of units is determined by a function p_x(y) depending only on the list of covariate values x = (x(u_1), ..., x(u_n)) on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution p(y, x) depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution p(y | x) for random samples agrees with p_x(y) as determined by the regression model for a fixed sample having a non-random configuration x. The paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration x, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans, such as sequential or simple random sampling, the conditional distribution p(y | x) is not the same as the regression distribution unless p_x(y) has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored. [source]

A latent Gaussian model for compositional data with zeros
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 5 2008. Adam Butler.
Summary: Compositional data record the relative proportions of different components within a mixture and arise frequently in many fields. Standard statistical techniques for the analysis of such data assume the absence of proportions that are genuinely zero. However, real data can contain a substantial number of zero values. We present a latent Gaussian model for the analysis of compositional data containing zero values, based on assuming that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex. We propose an iterative algorithm to simulate values from this model and apply the model to data on the proportions of fat, protein and carbohydrate in different groups of food products. Finally, since evaluation of the likelihood involves difficult integrals when the number of components exceeds three, we present a hybrid Gibbs rejection sampling scheme that can be used to draw inferences about the parameters of the model when the number of components is arbitrarily large. [source]
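The deterministic projection step can be written with the standard sort-based algorithm for Euclidean projection onto the unit simplex; components clipped to exactly zero are what generate the observed zeros. A minimal sketch of that standard algorithm (not the authors' full simulation procedure):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the unit simplex {x >= 0, sum(x) = 1}.

    Standard sort-based algorithm: shift all coordinates down by a common
    threshold theta and clip at zero, choosing theta so the result sums to 1.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                         # coordinates in decreasing order
    css = np.cumsum(u)
    rho = np.max(np.nonzero(u - (css - 1.0) / np.arange(1, len(v) + 1) > 0)[0])
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

print(project_to_simplex([0.6, 0.5, -0.4]))      # -> [0.55, 0.45, 0.0]
```

A latent Gaussian draw with a sufficiently negative coordinate projects to an exact zero in that component, giving the model positive probability of zeros without any ad hoc replacement rule.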
Power and sample size for nested analysis of molecular variance
MOLECULAR ECOLOGY, Issue 19 2009. Benjamin M. Fitzpatrick.
Abstract: Analysis of molecular variance (AMOVA) is a widely used tool for quantifying the contribution of various levels of population structure to patterns of genetic variation. Implementations of AMOVA use permutation tests to evaluate null hypotheses of no population structure within groups and between groups. With few populations per group, between-group structure might be impossible to detect, because only a few permutations of the sampled populations are possible. In fact, with fewer than six total populations, permutation tests will never result in P-values < 0.05 for higher-level population structure. I present minimum numbers of replicates calculated from multinomial coefficients and an R script that can be used to evaluate the minimum P-value for any sampling scheme. While it might seem counterintuitive that a large sample of individuals is uninformative about hierarchical structure, the power to detect between-group differences depends on the number of populations per group, and investigators should sample appropriately. [source]
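The counting argument can be reproduced in a few lines: the number of distinct assignments of the pooled populations to labelled groups of given sizes is a multinomial coefficient, and a permutation P-value can never fall below one over that count. The sketch below assumes labelled groups; the paper's R script may use a different counting convention.

```python
from math import factorial

def min_p_value(group_sizes):
    """Smallest attainable permutation P-value for populations among labelled groups."""
    n = sum(group_sizes)
    perms = factorial(n)                 # multinomial coefficient n! / (k1! k2! ...)
    for k in group_sizes:
        perms //= factorial(k)
    return 1.0 / perms

for sizes in [(2, 2), (2, 3), (3, 3), (5, 5)]:
    print(sizes, round(min_p_value(sizes), 4))
```

With five total populations (groups of 2 and 3) the floor is 0.1, consistent with the claim that fewer than six populations can never yield P < 0.05 for between-group structure.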
Urban-Rural Differences in Motivation to Control Prejudice Toward People With HIV/AIDS: The Impact of Perceived Identifiability in the Community
THE JOURNAL OF RURAL HEALTH, Issue 3 2008. Janice Yanushka Bunn, PhD.
Abstract:
Context: HIV/AIDS is occurring with increasing frequency in rural areas of the United States, and people living with HIV/AIDS in rural communities report higher levels of perceived stigma than their more urban counterparts. The extent to which stigmatized individuals perceive stigma could be influenced, in part, by prevailing community attitudes. Differences between rural and more metropolitan community members' attitudes toward people with HIV/AIDS, however, have rarely been examined.
Purpose: This study investigated motivation to control prejudice toward people with HIV/AIDS among non-infected residents of metropolitan, micropolitan, and rural areas of rural New England.
Methods: A total of 2,444 individuals were identified through a random-digit-dialing sampling scheme and completed a telephone interview to determine attitudes and concerns about a variety of health issues. Internal and external motivation to control prejudice were examined using a general linear mixed model approach, with independent variables including age, gender, community size, and perceived identifiability within one's community.
Findings: Community size, by itself, was not related to motivation to control prejudice. However, there was a significant interaction between community size and community residents' perceptions of the extent to which people in their communities know who they are.
Conclusion: Our results indicate that residents of rural areas, in general, may not show a higher level of bias toward people with HIV/AIDS. The interaction between community size and perceived identifiability, however, suggests that motivation to control prejudice, and potentially the subsequent expression of that prejudice, is more complex than originally thought. [source]

Adaptive thinning of atmospheric observations in data assimilation with vector quantization and filtering methods
THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 613 2005. T. Ochotta.
Abstract: In data assimilation for numerical weather prediction, measurements from various observation systems are combined with background data to define initial states for the forecasts. Current and future observation systems, in particular satellite instruments, produce large numbers of measurements with high spatial and temporal density. Such datasets significantly increase the computational costs of the assimilation and, moreover, can violate the assumption of spatially independent observation errors. To ameliorate these problems, we propose two greedy thinning algorithms, which reduce the number of assimilated observations while retaining the essential information content of the data. In the first method, the number of points in the output set is increased iteratively. We use a clustering method with a distance metric that combines spatial distance with difference in observation values. In the second scheme, we iteratively estimate the redundancy of the current observation set and remove the most redundant data points. We evaluate the proposed methods with respect to a geometric error measure and compare them with a uniform sampling scheme. We obtain good representations of the original data with thinnings retaining only a small portion of the observations. We also evaluate our thinnings of ATOVS satellite data using the assimilation system of the Deutscher Wetterdienst. The impact of the thinning on the analysed fields and on the subsequent forecasts is discussed. Copyright © 2005 Royal Meteorological Society [source]
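A compact sketch of the first method's flavour: grow the output set greedily under a metric mixing spatial distance with difference in observation values. The weight alpha and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def greedy_thin(xy, values, n_keep, alpha=1.0):
    """Greedily grow a thinned observation set under a mixed metric.

    The metric adds spatial separation to the (weighted) difference in
    observed values, so dense, redundant observations are passed over while
    isolated or atypical ones are retained.
    """
    keep = [0]
    d = np.linalg.norm(xy - xy[0], axis=1) + alpha * np.abs(values - values[0])
    while len(keep) < n_keep:
        nxt = int(np.argmax(d))                 # farthest point in the mixed metric
        keep.append(nxt)
        d_new = np.linalg.norm(xy - xy[nxt], axis=1) + alpha * np.abs(values - values[nxt])
        d = np.minimum(d, d_new)                # distance to the nearest kept point
    return np.array(keep)

rng = np.random.default_rng(6)
xy = rng.uniform(0, 100, size=(2000, 2))        # synthetic observation locations
vals = np.sin(xy[:, 0] / 15) + rng.normal(scale=0.1, size=2000)
print(greedy_thin(xy, vals, n_keep=100)[:10])
```

Thinning in a metric that includes the observation values, rather than position alone, is what distinguishes this approach from the uniform sampling baseline it is compared against.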
Evidence for bias in estimates of local genetic structure due to sampling scheme
ANIMAL CONSERVATION, Issue 3 2006. E. K. Latch.
Abstract: Traditional population genetic analyses typically seek to characterize the genetic substructure caused by the non-random distribution of individuals. However, the genetic structuring of adult populations often does not remain constant over time, and may vary relative to season or life-history stage. Estimates of genetic structure may be biased if samples are collected at a single point in time, and will reflect the social organization of the species at the time the samples were collected. The complex population structures exhibited by many migratory species, where temporal shifts in social organization correspond to a large-scale shift in geographic distribution, serve as examples of the importance that time of sampling can have on estimates of genetic structure. However, it is often fine-scale genetic structure that is crucial for defining practical units for conservation and management, and it is at this scale that distributional shifts of organisms relative to the timing of sampling may have a profound yet unrecognized impact on our ability to interpret genetic data. In this study, we used the wild turkey to investigate the effects of sampling regime on estimates of genetic structure at local scales. Using mitochondrial sequence data, nuclear microsatellite data and allozyme data, we found significant genetic structuring among localized winter flocks of wild turkeys. Conversely, we found no evidence of genetic structure among sampling locations during the spring, when wild turkeys exist in mixed assemblages of genetically differentiated winter flocks. If the lack of detectable genetic structure among individuals is due to an admixture of social units, as in the case of wild turkeys during the spring, then the F_IS value rather than the F_ST value may be the more informative statistic with regard to the levels of genetic structure among population subunits. [source]

Building anisotropic sampling schemes for the estimation of anisotropic dispersal
ANNALS OF APPLIED BIOLOGY, Issue 3 2009. S. Soubeyrand.
Abstract: Anisotropy, a structural property of dispersal, is observed in dispersal patterns occurring for a wide range of biological systems. While dispersal models more and more often incorporate anisotropy, the sampling schemes required to collect data for validation usually do not account for the anisotropy of dispersal data. Using a parametric model already published to describe the spatial spread of a plant disease, wheat yellow rust, we carry out a study aimed at recommending an appropriate sampling scheme for anisotropic data. In a first step, we show with a simulation study that prior knowledge of dispersal anisotropy can be used to improve the sampling scheme. One of the main guidelines proposed is the orientation of the sampling grid around the main dispersal directions. In a second step, we propose a sequential sampling procedure (SSP) used to automatically build anisotropic sampling schemes adapted to the actual anisotropy of dispersal. The SSP is applied to simulated and real data. The proposed methodology is expected to adapt easily to any kind of organism with wind-borne propagule dispersal, because it does not require the inclusion of biological features specific to the organism considered. [source]
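The first guideline can be pictured with a toy helper that lays out a rectangular grid in local coordinates (denser along the presumed main dispersal axis, sparser across it) and rotates it to follow that direction. Parameters and names are illustrative assumptions, not the paper's SSP.

```python
import numpy as np

def oriented_grid(center, main_axis_deg, nx=9, ny=5, dx=10.0, dy=20.0):
    """Regular sampling grid rotated to align with a main dispersal direction.

    Spacing dx along the main axis is finer than dy across it, and the whole
    grid is rotated by main_axis_deg, reflecting prior knowledge of anisotropy.
    """
    xs = (np.arange(nx) - (nx - 1) / 2) * dx
    ys = (np.arange(ny) - (ny - 1) / 2) * dy
    gx, gy = np.meshgrid(xs, ys)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    a = np.deg2rad(main_axis_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return pts @ rot.T + np.asarray(center)

grid = oriented_grid(center=(500.0, 500.0), main_axis_deg=30.0)  # prevailing wind at 30 deg
print(grid.shape, grid[:3].round(1))
```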
Falling and explosive, dormant, and rising markets via multiple-regime financial time series models
APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 1 2010. Cathy W. S. Chen.
Abstract: A multiple-regime threshold nonlinear financial time series model, with a fat-tailed error distribution, is discussed, and Bayesian estimation and inference are considered. Furthermore, approximate Bayesian posterior model comparison among competing models with different numbers of regimes is considered, which is effectively a test for the number of required regimes. An adaptive Markov chain Monte Carlo (MCMC) sampling scheme is designed, while importance sampling is employed to estimate Bayesian residuals for model diagnostic testing. Our modelling framework provides a parsimonious representation of well-known stylized features of financial time series and facilitates statistical inference in the presence of high or explosive persistence and dynamic conditional volatility. We focus on the three-regime case, where the main feature of the model is the capture of mean and volatility asymmetries in financial markets, while allowing an explosive volatility regime. A simulation study highlights the properties of our MCMC estimators and the accuracy and favourable performance, as a model selection tool, of the posterior model probability approximation method, compared with a deviance criterion. An empirical study of eight international oil and gas markets provides strong support for the three-regime model over its competitors in most markets, in terms of model posterior probability, and shows three distinct regime behaviours: falling/explosive, dormant and rising markets. Copyright © 2009 John Wiley & Sons, Ltd. [source]

Theory & Methods: Bayesian variable selection in logistic regression: predicting company earnings direction
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2002. Richard Gerlach.
This paper presents a Bayesian technique for the estimation of a logistic regression model, including variable selection. As in Ou & Penman (1989), the model is used to predict the direction of company earnings, one year ahead, from a large set of accounting variables from financial statements. To estimate the model, the paper presents a Markov chain Monte Carlo sampling scheme that combines the variable selection technique of Smith & Kohn (1996) with the non-Gaussian estimation method of Mira & Tierney (2001). The technique is applied to data for companies in the United States and Australia. The results obtained compare favourably, for both regions, with those from the technique used by Ou & Penman (1989). [source]

Cox Regression in Nested Case-Control Studies with Auxiliary Covariates
BIOMETRICS, Issue 2 2010. Mengling Liu.
Summary: The nested case-control (NCC) design is a popular sampling method in large epidemiological studies, prized for its cost-effectiveness in investigating the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort, in addition to the NCC data, and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling, and show that it can be well adapted to NCC designs, where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established, and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite-sample performance of our proposed estimators and to compare their efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor. [source]

Regression Calibration in Semiparametric Accelerated Failure Time Models
BIOMETRICS, Issue 2 2010. Menggang Yu.
Summary: In large cohort studies, it often happens that some covariates are expensive to measure and hence only measured on a validation set, while relatively cheap but error-prone measurements of the covariates are available for all subjects. The regression calibration (RC) estimation method (Prentice, 1982, Biometrika 69, 331-342) is a popular method for analyzing such data and has been applied to the Cox model by Wang et al. (1997, Biometrics 53, 131-145) under normal measurement error and rare disease assumptions. In this article, we consider the RC estimation method for the semiparametric accelerated failure time model with covariates subject to measurement error. Asymptotic properties of the proposed method are investigated under a two-phase sampling scheme for validation data that are selected via stratified random sampling, resulting in neither independent nor identically distributed observations. We show that the estimates converge to some well-defined parameters. In particular, unbiased estimation is feasible under additive normal measurement error models for normal covariates and under Berkson error models. The proposed method performs well in finite-sample simulation studies. We also apply the proposed method to a depression mortality study. [source]
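The generic RC recipe behind this abstract: fit a calibration model E[X | W] on the validation subset, impute X for every subject, then use the imputed covariate in the outcome model. The sketch below shows the idea on a plain linear outcome rather than the paper's accelerated failure time model; all data are simulated and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_val = 5000, 500
x = rng.normal(size=n)                    # true covariate, known only on the validation set
w = x + rng.normal(scale=0.8, size=n)     # cheap error-prone surrogate, known for everyone

# Calibration model E[X | W], fitted on the validation subset only.
val = rng.choice(n, size=n_val, replace=False)
b1, b0 = np.polyfit(w[val], x[val], 1)
x_hat = b0 + b1 * w                       # imputed covariate for all subjects

# Using x_hat instead of w removes the attenuation of the slope estimate.
y = 1.5 * x + rng.normal(size=n)
print("naive slope     ", round(np.polyfit(w, y, 1)[0], 2))   # biased toward zero
print("calibrated slope", round(np.polyfit(x_hat, y, 1)[0], 2))
```

The naive regression on the surrogate is attenuated by roughly var(X)/var(W); calibration undoes this, which is the same correction the RC estimator supplies inside the survival model.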
Design and Inference for Cancer Biomarker Study with an Outcome and Auxiliary-Dependent Subsampling
BIOMETRICS, Issue 2 2010. Xiaofei Wang.
Summary: In cancer research, it is important to evaluate the performance of a biomarker (e.g., molecular, genetic, or imaging) that correlates with patients' prognosis or predicts patients' response to treatment in a large prospective study. Due to overall budget constraints and the high cost associated with bioassays, investigators often have to select a subset of all registered patients for biomarker assessment. To detect a potentially moderate association between the biomarker and the outcome, investigators need to decide how to select a subset of a fixed size such that study efficiency can be enhanced. We show that, instead of drawing a simple random sample from the study cohort, greater efficiency can be achieved by allowing the selection probability to depend on the outcome and an auxiliary variable; we refer to such a sampling scheme as outcome- and auxiliary-dependent subsampling (OADS). This article is motivated by the need to analyze data from a lung cancer biomarker study that adopted the OADS design to assess epidermal growth factor receptor (EGFR) mutations as a predictive biomarker for whether a subject responds to a greater extent to EGFR inhibitor drugs. We propose an estimated maximum-likelihood method that accommodates the OADS design and utilizes all observed information, especially that contained in the likelihood score of EGFR mutations (an auxiliary variable for EGFR mutations), which is available for all patients. We derive the asymptotic properties of the proposed estimator and evaluate its finite-sample properties via simulation. We illustrate the proposed method with a data example. [source]
Joint Modeling and Analysis of Longitudinal Data with Informative Observation Times
BIOMETRICS, Issue 2 2009. Yu Liang.
Summary: In the analysis of longitudinal data, it is often assumed that observation times are predetermined and are the same across study subjects. Such an assumption, however, is often violated in practice, and as a result the observation times may be highly irregular. It is well known that if the sampling scheme is correlated with the outcome values, the usual statistical analysis may yield bias. In this article, we propose joint modeling and analysis of longitudinal data with possibly informative observation times via latent variables. A two-step estimation procedure is developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, and that the asymptotic variance can be consistently estimated using the bootstrap method. Simulation studies and a real data analysis demonstrate that our method performs well with realistic sample sizes and is appropriate for practical use. [source]