Test Data Set (test + data_set)

Selected Abstracts


Accuracy and reproducibility of manual and semiautomated quantification of MS lesions by MRI

JOURNAL OF MAGNETIC RESONANCE IMAGING, Issue 3 2003
Edward A. Ashton PhD
Abstract. Purpose: To evaluate the accuracy, reproducibility, and speed of two semiautomated methods for quantifying total white matter lesion burden in multiple sclerosis (MS) patients, relative to manual tracing and to other methods presented in recent literature. Materials and Methods: Two MRI-based methods for semiautomated quantification of total lesion burden in MS patients were examined. The first method, geometrically constrained region growth (GEORG), requires the user to specify lesion locations. The second technique, directed multispectral segmentation (DMSS), requires only the location of a single exemplar lesion. Test data sets included both clinical MS data and MS brain phantoms. Results: The mean processing times were 60 minutes for manual tracing, 10 minutes for region growth, and 3 minutes for directed segmentation. Intra- and interoperator coefficients of variation (CVs) were 5.1% and 16.5% for manual tracing, 1.4% and 2.3% for region growth, and 1.5% and 5.2% for directed segmentation. The average deviations from manual tracing were 9% for region growth and 5.7% for directed segmentation. Conclusion: Both semiautomated methods showed a significant advantage over manual tracing in speed and precision. The accuracy of both methods was acceptable, given the high variability of the manual results. J. Magn. Reson. Imaging 2003;17:300-308. © 2003 Wiley-Liss, Inc.
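The reproducibility and accuracy figures above are simple ratios. The minimal Python sketch below assumes that a CV is the sample standard deviation of repeated lesion-volume measurements divided by their mean (repeats by one operator for intraoperator CV, or across operators for interoperator CV), and that "deviation from manual tracing" is the absolute volume difference relative to the manual volume. The abstract does not spell out either definition, and the volumes are invented for illustration.

```python
import statistics


def coefficient_of_variation(volumes):
    """Percent CV: sample standard deviation of repeated measurements over their mean."""
    return 100.0 * statistics.stdev(volumes) / statistics.mean(volumes)


def percent_deviation(estimate, reference):
    """Absolute deviation of a semiautomated volume from the manual reference, in percent."""
    return 100.0 * abs(estimate - reference) / reference


# Hypothetical repeated total-lesion-volume measurements (mL) of one scan.
repeats = [10.0, 11.0, 9.0]
print(round(coefficient_of_variation(repeats), 1))
print(round(percent_deviation(9.1, 10.0), 1))
```

A lower CV indicates better precision. This is why the region-growth and directed-segmentation CVs of a few percent compare favourably with the 16.5% interoperator CV of manual tracing.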


Removing bias from solvent atoms in electron density maps

JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 4 2008
Eric N. Brown
Atomic structures of proteins determined via protein crystallography contain numerous solvent atoms. The experimental data for the determination of a water molecule's O-atom position are often a small, self-contained blob of unidentified electron density. Unfortunately, the nature of crystallographic refinement lets poorly placed solvent atoms bias the subsequently refined positions of all atoms in the crystal structure. This research article applies the omit-map technique to remove the bias introduced by poorly determined solvent atoms, enabling the identification of incorrectly placed water molecules in partially refined crystal structures. A total of 160 protein crystal structures with 45,912 distinct water molecules were processed using this technique. Most of the water molecules in the deposited structures were well justified. However, a few of the solvent atoms in this test data set changed appreciably in position, displacement parameter or electron density when fitted to the solvent omit map, raising questions about how much experimental support exists for these solvent atoms.
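The screening step can be sketched in Python. Each water's deposited parameters are compared with those obtained after fitting it to the solvent omit map. This is a hypothetical illustration. The `Water` record, the `flag_waters` helper and both thresholds are assumptions, not values from the study. The omit-map refitting itself, which is done with crystallographic refinement software, is not shown. The electron-density criterion mentioned in the abstract is also left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Water:
    name: str
    xyz: tuple        # Cartesian O-atom coordinates in Å
    b_factor: float   # isotropic displacement parameter in Å²


def flag_waters(deposited, refit, max_shift=0.5, max_rel_b_change=0.5):
    """Return names of waters whose omit-map refit moved or changed B appreciably.

    max_shift (Å) and max_rel_b_change (fraction) are illustrative placeholders.
    """
    refit_by_name = {w.name: w for w in refit}
    flagged = []
    for dep in deposited:
        new = refit_by_name.get(dep.name)
        if new is None:
            continue  # water was not refitted; nothing to compare
        shift = math.dist(dep.xyz, new.xyz)
        rel_b = abs(new.b_factor - dep.b_factor) / dep.b_factor
        if shift > max_shift or rel_b > max_rel_b_change:
            flagged.append(dep.name)
    return flagged


# Invented example: W2 moves by 1 Å, W3 doubles its B factor.
deposited = [Water("W1", (0.0, 0.0, 0.0), 20.0),
             Water("W2", (5.0, 5.0, 5.0), 30.0),
             Water("W3", (9.0, 0.0, 0.0), 20.0)]
refit = [Water("W1", (0.1, 0.0, 0.0), 21.0),
         Water("W2", (6.0, 5.0, 5.0), 30.0),
         Water("W3", (9.0, 0.0, 0.0), 40.0)]
print(flag_waters(deposited, refit))
```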


Environmental determinants of vascular plant species richness in the Austrian Alps

JOURNAL OF BIOGEOGRAPHY, Issue 7 2005
Dietmar Moser
Abstract. Aim: To test predictions of different large-scale biodiversity hypotheses by analysing species richness patterns of vascular plants in the Austrian Alps. Location: The Austrian part of the Alps (c. 53,500 km²). Methods: Within the floristic inventory of Central Europe, the Austrian part of the Alps was systematically mapped for vascular plants. Data collection was based on a rectangular grid of 5 × 3 arc minutes (34-35 km²). Emerging species richness patterns were correlated with several environmental factors using generalized linear models. Primary environmental variables such as temperature, precipitation and evapotranspiration were used to test climate-related hypotheses of species richness. Additionally, spatial and temporal variations in climatic conditions were considered. Bedrock geology (particularly the amount of calcareous substrate), the proximity to rivers and lakes, and secondary variables such as topographic, edaphic and land-use heterogeneity were used as additional predictors. Model results were evaluated by correlating modelled and observed species numbers. Results: Our final multiple regression model explains c. 50% of the variance in species richness patterns. Model evaluation yields a correlation coefficient of 0.64 between modelled and observed species numbers in an independent test data set. Climatic variables such as temperature and potential evapotranspiration (PET) proved to be by far the most important predictors. In general, variables indicating climatic favourableness, such as the maxima of temperature and PET, performed better than those indicating stress, such as the respective minima. Bedrock mineralogy, especially the amount of calcareous substrate, had some additional explanatory power but was less influential than suggested by comparable studies. The amount of precipitation has no regional effect on species richness. Among the descriptors of heterogeneity, edaphic and land-use heterogeneity are more closely correlated with species numbers than topographic heterogeneity. Main conclusions: The results support energy-driven processes as primary determinants of vascular plant species richness in temperate mountains. Stressful conditions obviously decrease species numbers, but the presence of favourable habitats has higher predictive power in the context of species richness modelling. The importance of precipitation for driving global species diversity patterns is not necessarily reflected regionally. Annual range of temperature, an indicator of short-term climatic stability, proved to be of minor importance for the determination of regional species richness patterns. In general, our study suggests that environmental heterogeneity has rather low predictive value for regional species richness patterns, although it may gain importance at more local scales.
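Two calculations in this abstract are easy to reproduce in a minimal Python sketch. The first is the area of one 5 × 3 arc-minute mapping cell. This assumes a spherical Earth (mean radius 6371.0088 km) and a cell whose southern edge lies at 47.5°N, a latitude chosen here only for illustration. The second is the correlation between modelled and observed species numbers in the test data set. It uses the ordinary Pearson coefficient, which is an assumption because the abstract does not name the correlation measure.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def grid_cell_area_km2(lat_south_deg, dlon_arcmin=5.0, dlat_arcmin=3.0):
    """Area of a latitude-longitude cell on a sphere: R^2 * dlon * (sin lat2 - sin lat1)."""
    lat1 = math.radians(lat_south_deg)
    lat2 = math.radians(lat_south_deg + dlat_arcmin / 60.0)
    dlon = math.radians(dlon_arcmin / 60.0)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


def pearson_r(modelled, observed):
    """Pearson correlation between modelled and observed species numbers."""
    n = len(modelled)
    mx = sum(modelled) / n
    my = sum(observed) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(modelled, observed))
    sx = math.sqrt(sum((x - mx) ** 2 for x in modelled))
    sy = math.sqrt(sum((y - my) ** 2 for y in observed))
    return cov / (sx * sy)


print(grid_cell_area_km2(47.5))
```

At 47.5°N the cell area comes out at roughly 35 km², consistent with the stated 34-35 km². Cells shrink slightly towards the north because meridians converge.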


DNA barcoding of stylommatophoran land snails: a test of existing sequences

MOLECULAR ECOLOGY RESOURCES, Issue 4 2009
ANGUS DAVISON
Abstract. DNA barcoding has attracted attention because it is a potentially simple and universal method for taxonomic assignment. One anticipated problem in applying the method to stylommatophoran land snails is that they frequently exhibit extreme divergence of mitochondrial DNA sequences, sometimes reaching 30% within species. We therefore trialled the utility of barcodes in identifying land snails by analysing the stylommatophoran cytochrome oxidase subunit I sequences from GenBank. Two alignments, of 381 and 228 base pairs, were used to determine potential error rates among test data sets of 97 and 127 species, respectively. Identification success rates using neighbour-joining phylogenies were 92% for the longer sequence and 82% for the shorter sequence, indicating that a high degree of mitochondrial variation may actually be an advantage when using phylogeny-based methods for barcoding. There was, however, a large overlap between intra- and interspecific variation, with assignment failure (per cent of samples not placed with the correct species) particularly associated with a low degree of mitochondrial variation (Kimura 2-parameter distance < 0.05) and a small GenBank sample size (< 25 per species). Thus, while the optimum intra/interspecific threshold value was 4%, this was associated with an overall error of 32% for the longer sequences and 44% for the shorter sequences. Because of this high error rate, barcoding is a potentially useful method to discriminate species of land snail only when a baseline has first been established using conventional taxonomy and sample DNA sequences. There is no evidence for a barcoding gap, which rules out species discovery based on a threshold value alone.
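The distances and threshold above rest on the Kimura 2-parameter (K2P) model: d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q). Here P and Q are the proportions of transition and transversion differences between two aligned sequences. The Python sketch below implements that formula and a simple threshold assignment using the 4% value from the abstract. Skipping gapped or ambiguous sites is an assumption of this sketch, and `same_species` is a hypothetical helper. It does not reproduce the authors' neighbour-joining analysis.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguity codes are skipped. Raises ValueError when the
    sequences are too divergent for the logarithms to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def same_species(distance, threshold=0.04):
    """Threshold assignment: pairs closer than the threshold are treated as conspecific."""
    return distance < threshold


d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")  # one transition in ten sites
print(round(d, 4), same_species(d))
```

The abstract notes that intra- and interspecific distances overlap heavily, with no barcoding gap. Any single threshold like this one will therefore misassign some pairs.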


Artificial neural networks as statistical tools in epidemiological studies: analysis of risk factors for early infant wheeze

PAEDIATRIC & PERINATAL EPIDEMIOLOGY, Issue 6 2004
Andrea Sherriff
Summary. Artificial neural networks (ANNs) are increasingly being used to predict clinical outcomes and classify disease phenotypes. A lack of understanding of the statistical principles underlying ANNs has led to widespread misuse of these tools in the biomedical arena. In this paper, the authors compare the performance of ANNs with that of conventional linear logistic regression models in an epidemiological study of infant wheeze. Data on the putative risk factors for infant wheeze were obtained from a sample of 7318 infants taking part in the Avon Longitudinal Study of Parents and Children (ALSPAC). The data were analysed using logistic regression models and ANNs, and their performance was compared using misclassification rates in a validation data set. Misclassification rates in the training data set decreased as the complexity of the ANN increased: h = 0: 17.9%; h = 2: 16.2%; h = 5: 14.9%; and h = 10: 9.2%. However, the more complex models did not generalise well to new data sets drawn from the same population; validation data set misclassification rates were h = 0: 17.9%; h = 2: 19.6%; h = 5: 20.2%; and h = 10: 22.9%. There is no evidence from this study that ANNs outperform conventional methods of analysing epidemiological data. Increasing the complexity of the models serves only to overfit the model to the data. It is important that a validation or test data set is used to assess the performance of highly complex ANNs to avoid overfitting.
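The overfitting argument is easiest to see as the gap between validation and training error for each model. The Python sketch below uses only the rates reported in the abstract. It assumes that h denotes the number of hidden units, which the abstract does not define. `misclassification_rate` shows how such a rate is computed from predictions and is a hypothetical helper.

```python
# Misclassification rates (%) reported in the abstract, keyed by h.
TRAINING = {0: 17.9, 2: 16.2, 5: 14.9, 10: 9.2}
VALIDATION = {0: 17.9, 2: 19.6, 5: 20.2, 10: 22.9}


def misclassification_rate(predicted, observed):
    """Fraction of cases whose predicted class differs from the observed class."""
    errors = sum(p != o for p, o in zip(predicted, observed))
    return errors / len(observed)


def generalisation_gap(training, validation):
    """Validation minus training error, in percentage points, for each h."""
    return {h: round(validation[h] - training[h], 1) for h in training}


for h, gap in generalisation_gap(TRAINING, VALIDATION).items():
    print(f"h = {h}: validation error exceeds training error by {gap} points")
```

The gap grows with every increase in h while training error keeps falling. This pattern is the signature of overfitting that a held-out validation or test set exposes.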


Quantifying instrument errors in macromolecular X-ray data sets

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 6 2010
Kay Diederichs
An indicator which is calculated after the data reduction of a test data set may be used to estimate the (systematic) instrument error at a macromolecular X-ray source. The numerical value of the indicator is the highest signal-to-noise [I/σ(I)] value that the experimental setup can produce, and its reciprocal is related to the lower limit of the merging R factor. In the context of this study, the stability of the experimental setup is influenced and characterized by the properties of the X-ray beam, shutter, goniometer, cryostream and detector, and also by the exposure time and spindle speed. Typical values of the indicator are given for data sets from the JCSG archive. Some sources of error are explored with the help of test calculations using SIM_MX [Diederichs (2009), Acta Cryst. D65, 535-542]. One conclusion is that the accuracy of data at low resolution is usually limited by the experimental setup rather than by the crystal. It is also shown that the influence of vibrations and fluctuations may be mitigated by a reduction in spindle speed accompanied by stronger attenuation.
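Why a setup has a "highest" I/σ(I) can be shown with an error model that inflates σ(I) in proportion to the intensity itself. The Python sketch below assumes the error model used by the XDS program, where this indicator is reported as ISa: σ'(I)² = a(σ(I)² + bI²). As I grows, I/σ'(I) approaches 1/√(ab) however strong the reflection is. The parameter values a and b below are hypothetical.

```python
import math


def corrected_sigma(intensity, sigma, a, b):
    """Error-model sigma: sigma'(I)^2 = a * (sigma(I)^2 + b * I^2)."""
    return math.sqrt(a * (sigma ** 2 + b * intensity ** 2))


def isa(a, b):
    """Limiting value of I / sigma'(I) for very strong reflections: 1 / sqrt(a * b)."""
    return 1.0 / math.sqrt(a * b)


# Hypothetical error-model parameters.
a, b = 1.0, 4e-4
for intensity in (1e2, 1e4, 1e6, 1e8):
    counting_sigma = math.sqrt(intensity)  # Poisson counting error only
    ratio = intensity / corrected_sigma(intensity, counting_sigma, a, b)
    print(f"I = {intensity:.0e}: I/sigma' = {ratio:.2f}")
print("limit =", isa(a, b))
```

Counting statistics alone would let I/σ grow without bound. The bI² term, which stands in for systematic instrument error, caps it at the indicator value.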


Support vector machines-based modelling of seismic liquefaction potential

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 10 2006
Mahesh Pal
Abstract. This paper investigates the potential of a support vector machine (SVM)-based classification approach for assessing liquefaction potential from actual standard penetration test (SPT) and cone penetration test (CPT) field data. SVMs are based on statistical learning theory and have been found to work well in comparison with neural networks in several other applications. Both CPT and SPT field data sets are used with SVMs to predict the occurrence and non-occurrence of liquefaction based on different combinations of input parameters. With the SPT and CPT test data sets, highest accuracies of 96% and 97%, respectively, were achieved with SVMs. This suggests that SVMs can effectively be used to model the complex relationship between different soil parameters and the liquefaction potential. Several other combinations of input variables were used to assess the influence of different input parameters on liquefaction potential. Results with the proposed approach suggest that the normalized cone resistance value is not required with CPT data, nor is the calculation of the standardized SPT value required with SPT data. Further, SVMs require few user-defined parameters and provide better performance than the neural network approach. Copyright © 2006 John Wiley & Sons, Ltd.
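The abstract does not state which SVM solver or kernel was used. The sketch below swaps in a linear SVM trained with Pegasos-style sub-gradient descent on the hinge loss, in plain Python, to show the binary occurrence/non-occurrence classification. Several choices are assumptions: labels are +1 for liquefaction and -1 for none, samples are visited in a fixed order rather than at random, and the bias is folded into the weight vector as a constant feature. The two-feature data are synthetic and are not SPT or CPT records.

```python
def train_linear_svm(samples, labels, lam=0.1, epochs=50):
    """Linear SVM via Pegasos-style sub-gradient steps on the hinge loss.

    A constant 1.0 is appended to each sample so the last weight acts as a
    (regularised) bias term.
    """
    w = [0.0] * (len(samples[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xa = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularisation
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def accuracy(w, samples, labels):
    hits = sum(predict(w, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


# Synthetic, standardised two-feature data (hypothetical soil parameters).
train_x = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]
test_x = [(2.5, 1.5), (-1.5, -2.5)]
test_y = [1, -1]

w = train_linear_svm(train_x, train_y)
print("test accuracy:", accuracy(w, test_x, test_y))
```

The regularisation strength `lam` is the kind of user-defined parameter the abstract refers to. A kernel SVM would add a kernel choice and its width.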


Predicting avian patch occupancy in a fragmented landscape: do we know more than we think?

JOURNAL OF APPLIED ECOLOGY, Issue 5 2009
Danielle F. Shanahan
Summary. 1. A recent and controversial topic in landscape ecology is whether populations of species respond to habitat fragmentation in a general fashion. Empirical research has provided mixed support, resulting in controversy about the use of general rules in landscape management. Rather than simply assessing post hoc whether individual species follow such rules, a priori testing could shed light on their accuracy and utility for predicting species response to landscape change. 2. We aim to create an a priori model that predicts the presence or absence of multiple species in habitat patches. Our goal is to balance general theory with relevant species life-history traits to obtain high prediction accuracy. To increase the utility of this work, we aim to use accessible methods that can be applied using readily available inexpensive resources. 3. The classification tree patch-occupancy model we create for birds is based on habitat suitability, minimum area requirements, dispersal potential of each species and overall landscape connectivity. 4. To test our model we apply it to the South East Queensland region, Australia, for 17 bird species with varying dispersal potential and habitat specialization. We test the accuracy of our predictions using presence-absence information for 55 vegetation patches. 5. Overall we achieve Cohen's kappa of 0·33, or 'fair' agreement between the model predictions and test data sets, and generally a very high level of absence prediction accuracy. Habitat specialization appeared to influence the accuracy of the model for different species. 6. We also compare the a priori model to the statistically derived model for each species. Although this 'optimal model' generally differed from our original predictive model, the process revealed ways in which it could be improved for future attempts. 7. Synthesis and applications. Our study demonstrates that ecological generalizations alongside basic resources (a vegetation map and some species-specific information) can provide conservative accuracy for predicting species occupancy in remnant vegetation patches. We show that the process of testing and developing models based on general rules could provide basic tools for conservation managers to understand the impact of current or planned landscape change on wildlife populations.
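Cohen's kappa corrects the raw agreement between predicted and observed occupancy for the agreement expected by chance. The widely used Landis and Koch benchmarks treat values of 0.21-0.40 as "fair", which is where the reported 0·33 falls. The Python sketch below computes kappa from a 2 × 2 table of predicted versus observed presence and absence. The counts in the usage line are invented, and `cohens_kappa` is a hypothetical helper, not the authors' code.

```python
def cohens_kappa(both_present, pred_only, obs_only, both_absent):
    """Cohen's kappa for a 2x2 table of predicted vs observed occupancy.

    pred_only: patches predicted occupied but observed unoccupied.
    obs_only:  patches predicted unoccupied but observed occupied.
    """
    n = both_present + pred_only + obs_only + both_absent
    observed_agreement = (both_present + both_absent) / n
    pred_present = both_present + pred_only
    obs_present = both_present + obs_only
    # Chance agreement from the marginal totals of each class.
    expected_agreement = (pred_present * obs_present
                          + (n - pred_present) * (n - obs_present)) / n ** 2
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)


# Invented counts for one species across 100 patch-species predictions.
print(round(cohens_kappa(20, 5, 10, 65), 3))
```

Kappa can be modest even when raw agreement is high. Correctly predicted absences inflate raw agreement, and much of that is expected by chance when most patches are unoccupied.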


The interdependence of wavelength, redundancy and dose in sulfur SAD experiments

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 12 2008
Michele Cianci
In the last decade, sulfur single-wavelength anomalous dispersion (SAD) experiments have rapidly gained popularity among synchrotron users as a quick and streamlined way of solving the phase problem in macromolecular crystallography. On beamline 10 at SRS (Daresbury Laboratory, UK), a versatile design has allowed test data sets to be collected at six wavelengths between 0.979 and 2.290 Å in order to evaluate the importance and the interdependence of experimental variables such as the Bijvoet ratio, wavelength, resolution limit, data redundancy and absorbed X-ray dose in the sample per data set. All the samples used in the experiments were high-quality hen egg-white lysozyme crystals. X-radiation damage was found to affect disulfide bridges after the crystals had been given a total dose of 0.20 × 10⁷ Gy. However, with such a total dose, it was still possible in all cases to find a strategy to collect data sets to determine the sulfur substructure and produce good-quality phases by choosing an optimum combination of wavelength, exposure time and redundancy. An |Δano|/σ(Δano) ratio greater than 1.5 for all resolution shells was a necessary requirement for successful sulfur SAD substructure location. Provided this is achieved, it seems possible to find an optimum compromise between wavelength, redundancy and dose to provide phasing information. The choice of wavelength should then follow the sample composition and the diffracting properties of the crystal. For strongly diffracting crystals, wavelengths equal to or shorter than 1.540 Å can be selected to capture the available data (provided the Bijvoet ratio is reasonable), while a longer wavelength, to gain as high a Bijvoet ratio as possible, must be used for more weakly diffracting crystals. These results suggest that an approach to a sulfur SAD experiment based on a complete description of the crystal system and the instrument for data collection is useful.
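The two practical criteria in this abstract can be combined into a quick strategy check. One is an anomalous signal-to-noise ratio, |Δano|/σ(Δano), above 1.5 in every resolution shell. The other is the dose at which disulfide damage appeared. The Python sketch below treats 0.20 × 10⁷ Gy as a hypothetical hard dose budget, which the paper does not do. It also approximates total dose as dose per image times number of images, and all input numbers are invented. The wavelength-to-energy conversion uses hc ≈ 12.398 keV·Å.

```python
HC_KEV_ANGSTROM = 12.398  # hc, so that E (keV) = 12.398 / wavelength (Å)
DOSE_BUDGET_GY = 0.20e7   # assumed budget: dose at which disulfide damage was seen


def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def strategy_ok(shell_signal, dose_per_image_gy, n_images,
                min_signal=1.5, dose_budget_gy=DOSE_BUDGET_GY):
    """Check a hypothetical collection strategy against both criteria.

    shell_signal: |Δano|/σ(Δano) for each resolution shell.
    Higher redundancy means more images and therefore more dose.
    """
    signal_ok = all(s > min_signal for s in shell_signal)
    dose_ok = dose_per_image_gy * n_images <= dose_budget_gy
    return signal_ok and dose_ok


for wl in (0.979, 1.540, 2.290):
    print(f"{wl} Å -> {wavelength_to_kev(wl):.2f} keV")
print(strategy_ok([2.1, 1.8, 1.6], dose_per_image_gy=5000, n_images=360))
```

Raising redundancy to lift the weakest shell above 1.5 costs dose. This is the trade-off between wavelength, redundancy and dose that the abstract describes.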