Same Data Set



Selected Abstracts


The Validity of Using Multiple Imputation for Missing Out-of-hospital Data in a State Trauma Registry

ACADEMIC EMERGENCY MEDICINE, Issue 3 2006
Craig D. Newgard MD
Objectives: To assess 1) the agreement of multiply imputed out-of-hospital values previously missing in a state trauma registry compared with known ambulance values and 2) the potential impact of using multiple imputation versus a commonly used method for handling missing data (i.e., complete case analysis) in a typical multivariable injury analysis. Methods: This was a retrospective cohort analysis. Multiply imputed out-of-hospital data from 1998 to 2003 for four variables (intubation attempt, Glasgow Coma Scale score, systolic blood pressure, and respiratory rate) were compared with known values from probabilistically linked ambulance records using measures of agreement (kappa, weighted kappa, and Bland-Altman plots). Ambulance values were assumed to represent the "true" values for all analyses. A hypothetical multivariable regression model was used to demonstrate the impact (i.e., bias and precision of model results) of handling missing out-of-hospital data with multiple imputation versus complete case analysis. Results: A total of 6,150 matched ambulance and trauma registry records were available for comparison. Multiply imputed values for the four out-of-hospital variables demonstrated fair to good agreement with known ambulance values. When included in typical multivariable analyses, multiple imputation increased precision and reduced bias compared with using complete case analysis for the same data set. Conclusions: Multiply imputed out-of-hospital values for intubation attempt, Glasgow Coma Scale score, systolic blood pressure, and respiratory rate have fair to good agreement with known ambulance values. Multiple imputation also increased precision and reduced bias compared with complete case analysis in a typical multivariable injury model, and it should be considered for studies using out-of-hospital data from a trauma registry, particularly when substantial portions of data are missing. [source]
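The contrast described above, complete case analysis versus multiple imputation in the same multivariable model, can be sketched as follows. This is a minimal illustration with simulated data and a simplified pooling step, not the registry analysis itself; the variable names, missingness rates, and number of imputations are assumptions.

```python
# Sketch: complete case analysis vs. multiple imputation for the same logistic model.
# Simulated data; pooling is simplified to the mean of the point estimates.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
sbp = rng.normal(120, 25, n)                      # systolic blood pressure
gcs = rng.integers(3, 16, n).astype(float)        # Glasgow Coma Scale score
logit = -3 + 0.08 * (15 - gcs) + 0.02 * (120 - sbp)
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

sbp_obs, gcs_obs = sbp.copy(), gcs.copy()
sbp_obs[rng.random(n) < 0.3] = np.nan             # ~30% missing out-of-hospital values
gcs_obs[rng.random(n) < 0.3] = np.nan
df = pd.DataFrame({"sbp": sbp_obs, "gcs": gcs_obs, "died": died})

# (a) Complete case analysis: drop any record with a missing predictor
cc = df.dropna()
cc_fit = sm.Logit(cc["died"], sm.add_constant(cc[["sbp", "gcs"]])).fit(disp=0)

# (b) Multiple imputation: impute m times, fit each completed data set, pool
coefs = []
for m in range(10):
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    filled = df.copy()
    filled[["sbp", "gcs"]] = imp.fit_transform(df[["sbp", "gcs"]])
    fit = sm.Logit(filled["died"], sm.add_constant(filled[["sbp", "gcs"]])).fit(disp=0)
    coefs.append(fit.params)
mi_coef = pd.concat(coefs, axis=1).mean(axis=1)

print("complete case:\n", cc_fit.params, "\nmultiple imputation (pooled):\n", mi_coef)
```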


Spatial and temporal analysis of fMRI data on word and sentence reading

EUROPEAN JOURNAL OF NEUROSCIENCE, Issue 7 2007
Sven Haller
Abstract Written language comprehension at the word and the sentence level was analysed by the combination of spatial and temporal analysis of functional magnetic resonance imaging (fMRI). Spatial analysis was performed via general linear modelling (GLM). Concerning the temporal analysis, local differences in neurovascular coupling may confound a direct comparison of blood oxygenation level-dependent (BOLD) response estimates between regions. To avoid this problem, we parametrically varied linguistic task demands and compared only task-induced within-region BOLD response differences across areas. We reasoned that, in a hierarchical processing system, increasing task demands at lower processing levels induce delayed onset of higher-level processes in corresponding areas. The flow of activation is thus reflected in the size of task-induced delay increases. We estimated BOLD response delay and duration for each voxel and each participant by fitting a model function to the event-related average BOLD response. The GLM showed increasing activations with increasing linguistic demands dominantly in the left inferior frontal gyrus (IFG) and the left superior temporal gyrus (STG). The combination of spatial and temporal analysis allowed a functional differentiation of IFG subregions involved in written language comprehension. Ventral IFG region (BA 47) and STG subserve earlier processing stages than two dorsal IFG regions (BA 44 and 45). This is in accordance with the assumed early lexical semantic and late syntactic processing of these regions and illustrates the complementary information provided by spatial and temporal fMRI data analysis of the same data set. [source]
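As a rough illustration of estimating per-voxel BOLD onset delay and duration by fitting a model function to the event-related average response, the sketch below fits a gamma-variate curve with an explicit onset-delay parameter. The functional form, parameter values, and the FWHM convention for duration are assumptions; the abstract does not specify the authors' model function.

```python
# Sketch: fit a delayed gamma-variate to an event-related average BOLD response
# and read off onset delay and FWHM duration.
import numpy as np
from scipy.optimize import curve_fit

def bold_model(t, amp, delay, tau, alpha):
    """Gamma-variate response beginning after an onset delay (seconds)."""
    s = np.clip(t - delay, 0, None)
    return amp * (s / (alpha * tau)) ** alpha * np.exp(alpha - s / tau)

t = np.arange(0, 20, 0.5)                                   # peristimulus time (s)
y = bold_model(t, 1.0, 2.5, 4.0, 2.0)                       # "true" voxel response
y = y + np.random.default_rng(1).normal(0, 0.05, t.size)    # add noise

popt, _ = curve_fit(bold_model, t, y, p0=(1.0, 1.0, 3.0, 2.0),
                    bounds=(1e-6, [10, 10, 20, 10]))
fit = bold_model(t, *popt)
above = t[fit >= fit.max() / 2]                             # full width at half maximum
print(f"onset delay ~ {popt[1]:.2f} s, FWHM duration ~ {above[-1] - above[0]:.2f} s")
```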


Assembly rules and community models for unicellular organisms: patterns in diatoms of boreal streams

FRESHWATER BIOLOGY, Issue 4 2005
JANI HEINO
Summary 1. Many studies have addressed either community models (e.g. Clementsian versus Gleasonian gradients) or assembly rules (e.g. nestedness, checkerboards) for higher plant and animal communities, but very few studies have examined different non-random distribution patterns simultaneously with the same data set. Even fewer studies have addressed generalities in the distribution patterns of unicellular organisms, such as diatoms. 2. We studied non-randomness in the spatial distribution and community composition of stream diatoms. Our data consisted of diatom surveys from 47 boreal headwater streams and small rivers in northern Finland. Our analytical approaches included ordinations, cluster analysis, null model analyses, and associated randomisation tests. 3. Stream diatom communities did not follow discrete Clementsian community types, where multiple species occur exclusively in a single community type. Instead, diatom species showed rather individualistic responses, leading to continuous Gleasonian variability in community composition. 4. Although continuous variability was the dominant pattern in the data, diatoms also showed significant nestedness and less overlap in species distributions than expected by chance. However, these patterns were probably only secondary signals arising from species' individualistic responses to the environment. 5. Although unicellular organisms, such as diatoms, differ from multicellular organisms in several biological characteristics, they nevertheless appear to show non-random distribution patterns largely similar to those previously found for higher plants and metazoans. [source]
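A toy version of the null-model side of such an analysis, computing a checkerboard (C-) score for a presence-absence matrix and comparing it with a marginal-preserving randomisation, might look like the sketch below. The matrix is synthetic, and the swap count and number of null matrices are arbitrary choices, not those of the study.

```python
# Sketch: checkerboard C-score for a species x sites presence-absence matrix,
# tested against a row- and column-sum preserving (swap) null model.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def c_score(m):
    totals = m.sum(axis=1)
    units = []
    for i, j in combinations(range(m.shape[0]), 2):
        shared = int((m[i] * m[j]).sum())
        units.append((totals[i] - shared) * (totals[j] - shared))
    return np.mean(units)

def swap_null(m, n_swaps=500):
    m = m.copy()
    done = 0
    while done < n_swaps:
        r = rng.choice(m.shape[0], 2, replace=False)
        c = rng.choice(m.shape[1], 2, replace=False)
        sub = m[np.ix_(r, c)]
        # Swap only 2x2 checkerboard submatrices; this preserves all marginals
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            m[np.ix_(r, c)] = sub[::-1]
            done += 1
    return m

obs = (rng.random((12, 47)) < 0.35).astype(int)     # toy matrix: 12 taxa x 47 streams
observed = c_score(obs)
null = [c_score(swap_null(obs)) for _ in range(99)]
p = (1 + sum(v >= observed for v in null)) / 100
print(f"C-score = {observed:.2f}, one-tailed P = {p:.2f}")
```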


Evaluation of automated brain MR image segmentation and volumetry methods

HUMAN BRAIN MAPPING, Issue 4 2009
Frederick Klauschen
Abstract We compare three widely used brain volumetry methods available in the software packages FSL, SPM5, and FreeSurfer and evaluate their performance using simulated and real MR brain data sets. We analyze the accuracy of gray and white matter volume measurements and their robustness against changes in image quality using the BrainWeb MRI database. These images are based on "gold-standard" reference brain templates. This allows us to assess between-segmenter (same data set, different method) and within-segmenter (same method, variation of image quality) comparability, for both of which we find pronounced variations in segmentation results for gray and white matter volumes. The calculated volumes deviate by up to >10% from the reference values for gray and white matter depending on method and image quality. Sensitivity was best for SPM5; volumetric accuracy for gray and white matter was similar in SPM5 and FSL and better than in FreeSurfer. FSL showed the highest stability for white matter (<5%) and FreeSurfer for gray matter (6.2%) on BrainWeb data of constant image quality. Between-segmenter comparisons show discrepancies of up to >20% for the simulated data and 24% on average for the real data sets, whereas within-method performance analysis uncovered volume differences of up to >15%. Since the discrepancies between results reach the same order of magnitude as volume changes observed in disease, these effects limit the usability of the segmentation methods for following volume changes in individual patients over time and should be taken into account during the planning and analysis of brain volume studies. Hum Brain Mapp, 2009. © 2008 Wiley-Liss, Inc. [source]
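The two comparisons distinguished above, accuracy against the BrainWeb reference and between-segmenter discrepancy on the same scan, amount to simple percentage calculations once the volumes are in hand. The sketch below uses invented numbers purely to show the bookkeeping; they are not the study's measurements.

```python
# Sketch: percentage deviation from the reference volume (accuracy) and the
# spread across methods on the same scan (between-segmenter discrepancy).
reference = {"gray": 900.0, "white": 680.0}            # reference volumes (cm^3), placeholders
estimates = {                                          # one scan, three methods (cm^3), placeholders
    "FSL":        {"gray": 870.0, "white": 700.5},
    "SPM5":       {"gray": 915.3, "white": 668.0},
    "FreeSurfer": {"gray": 960.1, "white": 640.7},
}

for tissue, ref in reference.items():
    deviation = {m: 100 * (v[tissue] - ref) / ref for m, v in estimates.items()}
    spread = 100 * (max(v[tissue] for v in estimates.values())
                    - min(v[tissue] for v in estimates.values())) / ref
    print(tissue, {m: f"{d:+.1f}%" for m, d in deviation.items()},
          f"between-method spread {spread:.1f}%")
```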


Streamflow estimation using optimal regional dependency function

HYDROLOGICAL PROCESSES, Issue 25 2009
Abdüsselam Altunkaynak
Abstract The determination of the spatial dependency of a regionalized variable (ReV) is important in engineering studies. A regional dependency function, which leads to the calculation of weighting coefficients, is required in order to make regional or point-wise estimations. After obtaining this dependency function, it is possible to complete missing records in a time series and to locate new measurement stations. Determination of the regional dependency function is also useful for understanding the regional variation of the ReV. The Point Cumulative Semi-Variogram (PCSV) is a methodology for characterizing the regional dependency of a ReV in terms of both magnitude and location; however, it is not suited to determining the weighting coefficients required for regional and point-wise estimations. In the Point Semi-Variogram (PSV) proposed here, the weighting coefficient depends on both magnitude and location. Whereas the regional dependency function has a fluctuating structure in the PSV approach, it increases gradually with distance in the PCSV. The study area, in the Mississippi river basin, comprises 38 streamflow stations used previously for a PCSV application, the aim being to compare the two geostatistical models on the same data set. The PSV method is able to determine the value of the variable along with the optimum number of neighbouring stations and the influence radius. The PSV and slope PSV approaches are compared with the PCSV. It was shown that the slope point semi-variogram (SPSV) approach had relative errors below 5%, while the PSV and PCSV methods gave relative errors below 10%. Copyright © 2009 John Wiley & Sons, Ltd. [source]
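As an illustration of the cumulative construction underlying the PCSV, the sketch below accumulates half squared differences between a reference station and its neighbours in order of increasing distance. Station coordinates and flows are synthetic, and the PSV/SPSV variants proposed in the paper are not reproduced here.

```python
# Sketch: point cumulative semi-variogram (PCSV) for one reference station.
import numpy as np

rng = np.random.default_rng(3)
n = 38                                                  # number of streamflow stations
xy = rng.uniform(0, 500, (n, 2))                        # station coordinates (km), synthetic
flow = 50 + 0.1 * xy[:, 0] + rng.normal(0, 5, n)        # mean annual flows, synthetic

ref = 0                                                 # reference station index
d = np.hypot(*(xy - xy[ref]).T)                         # distances to the reference station
order = np.argsort(d)[1:]                               # nearest first, station itself excluded
pcsv = np.cumsum(0.5 * (flow[ref] - flow[order]) ** 2)  # accumulate half squared differences

for dist, g in list(zip(d[order], pcsv))[:5]:
    print(f"distance {dist:6.1f} km   PCSV {g:8.2f}")
```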


Non-parametric statistical methods for multivariate calibration model selection and comparison

JOURNAL OF CHEMOMETRICS, Issue 12 2003
Edward V. Thomas
Abstract Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low-order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross-validation. In this paper a new approach for selecting the number of latent variables is proposed. In this new approach the prediction errors of individual observations (obtained from cross-validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non-parametric statistical methods are used to select the simplest model (least number of latent variables) that provides prediction quality that is indistinguishable from that provided by more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models that are applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near-infrared spectra. Published in 2004 by John Wiley & Sons, Ltd. [source]
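A sketch of this selection idea, comparing per-observation cross-validation errors of successively simpler models against the PRESS-optimal model and keeping the simplest model that is not distinguishably worse, is given below. The simulated spectra are stand-ins, and the specific test (Wilcoxon signed-rank) and significance level are assumptions rather than necessarily the paper's choices.

```python
# Sketch: choose the fewest latent variables whose per-sample CV errors are not
# distinguishably worse than those of the PRESS-optimal model.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 200))                                   # stand-in spectra
y = X[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.5, 60)       # stand-in reference values

abs_err = {}
for k in range(1, 11):
    pred = cross_val_predict(PLSRegression(n_components=k), X, y, cv=10).ravel()
    abs_err[k] = np.abs(y - pred)

best_k = min(abs_err, key=lambda k: (abs_err[k] ** 2).sum())     # PRESS-optimal model
chosen = best_k
for k in range(1, best_k):
    stat, p = wilcoxon(abs_err[k], abs_err[best_k])              # paired, non-parametric
    if p > 0.05:                                                 # not distinguishably worse
        chosen = k
        break
print(f"PRESS-optimal: {best_k} LVs, parsimonious choice: {chosen} LVs")
```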


Intertopic information mining for query-based summarization

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 5 2010
You Ouyang
In this article, the authors address the problem of sentence ranking in summarization. Whereas most existing summarization approaches rank sentences using only the information embodied in a particular topic (a set of documents and an associated query), the authors propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, they develop a probabilistic model that uses manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments was conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and that the resulting summarization system performs comparably to the best-performing DUC participating systems on the same data set. [source]


Removing undersampling artifacts in DCE-MRI studies using independent components analysis

MAGNETIC RESONANCE IN MEDICINE, Issue 4 2008
A.L. Martel
Abstract In breast MR mammography, both high temporal resolution and high spatial resolution have been shown to be important in improving specificity. Adaptive methods such as projection reconstruction time-resolved imaging of contrast kinetics (PR-TRICKS) allow images to be reconstructed at various temporal and spatial resolutions from the same data set. The main disadvantage is that the undersampling, which is necessary to produce high temporal resolution images, leads to the presence of streak artifacts in the images. We present a novel method of removing these artifacts using independent components analysis (ICA) and demonstrate that this results in a significant improvement in image quality both for simulation studies and for patient dynamic contrast-enhanced (DCE)-MRI images. We also investigate the effect of artifacts on two quantitative measures of contrast enhancement. Using simulation studies we demonstrate that streak artifacts lead to pronounced periodic oscillations in pixel concentration curves which, in turn, lead to increased errors and introduce bias into heuristic measurements. ICA filtering significantly reduces this bias and improves accuracy. Pharmacokinetic modeling was more robust and there was no evidence of bias due to the presence of streak artifacts. ICA filtering did not significantly reduce the errors in the estimated pharmacokinetic parameters; however, the chi-squared error was greatly reduced after ICA filtering. Magn Reson Med, 2008. © 2008 Wiley-Liss, Inc. [source]
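A schematic version of ICA-based artifact filtering on a dynamic series, decomposing the pixel-by-time matrix, suppressing components flagged as artifact-like, and reconstructing, is sketched below. The synthetic data and the simple high-frequency criterion used to flag streak-like components are illustrative assumptions only, not the authors' selection procedure.

```python
# Sketch: ICA filtering of a dynamic series -- decompose pixels x time, drop
# components with artifact-like (high-frequency) time courses, reconstruct.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
n_pix, n_t = 4096, 60
enhancement = np.outer(rng.random(n_pix), 1 - np.exp(-np.arange(n_t) / 10.0))
streaks = np.outer(rng.random(n_pix) < 0.1, np.sin(2 * np.pi * np.arange(n_t) / 4.0))
series = enhancement + 0.5 * streaks + rng.normal(0, 0.02, (n_pix, n_t))

ica = FastICA(n_components=8, random_state=0)
maps = ica.fit_transform(series)                 # spatial maps, n_pix x n_components
courses = ica.mixing_                            # time courses, n_t x n_components

spec = np.abs(np.fft.rfft(courses, axis=0)) ** 2
high_frac = spec[spec.shape[0] // 2:].sum(axis=0) / spec.sum(axis=0)
keep = high_frac < 0.5                           # flag rapidly oscillating components as streaks

filtered = maps[:, keep] @ courses[:, keep].T + ica.mean_
print(f"removed {np.count_nonzero(~keep)} of {keep.size} components")
```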


Perfusion MRI with radial acquisition for arterial input function assessment

MAGNETIC RESONANCE IN MEDICINE, Issue 5 2007
Eugene G. Kholmovski
Abstract Quantification of myocardial perfusion critically depends on accurate arterial input function (AIF) and tissue enhancement curves (TECs). Except at low doses, the AIF is inaccurate because of the long saturation recovery time (SRT) of the pulse sequence. The choice of dose and SRT involves a trade-off between the accuracy of the AIF and the signal-to-noise ratio (SNR) of the TEC. Recent methods to resolve this trade-off are based on the acquisition of two data sets: one to accurately estimate the AIF, and one to find the high-SNR TEC. With radial k-space sampling, a set of images with varied SRTs can be reconstructed from the same data set, allowing an accurate assessment of the AIF and TECs, and their conversion to contrast agent (CA) concentration. This study demonstrates the feasibility of using a radial acquisition for quantitative myocardial perfusion imaging. Magn Reson Med 57:821-827, 2007. © 2007 Wiley-Liss, Inc. [source]


Internal algorithm variability and among-algorithm discordance in statistical haplotype reconstruction

MOLECULAR ECOLOGY, Issue 8 2009
ZU-SHI HUANG
The potential effectiveness of statistical haplotype inference has made it an area of active exploration over the last decade. Statistical inference has several complications: the same algorithm can produce different solutions for the same data set, which reflects internal algorithm variability; different algorithms can give different solutions for the same data set, reflecting discordance among algorithms; and the algorithms themselves are unable to evaluate the reliability of the solutions even when they are unique, a general limitation of all inference methods. With the aim of increasing the confidence of statistical inference results, a consensus strategy appears to be an effective means of dealing with these problems. Several authors have explored this with different emphases. Here we discuss two recent studies examining internal algorithm variability and among-algorithm discordance, respectively, and evaluate the different outcomes of these analyses in light of Orzack's (2009) comment. Until other, better methods are developed, a combination of these two approaches should provide a practical way to increase the confidence of statistical haplotyping results. [source]


A Combinatorial Approach to the Variable Selection in Multiple Linear Regression: Analysis of Selwood et al. Data Set (A Case Study)

MOLECULAR INFORMATICS, Issue 6 2003
Abstract A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is based primarily on restricting the entry of correlated variables into the model development stage. It has been used for the analysis of the Selwood et al. data set [16], and the resulting models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set, CP-MLR identified three highly independent models (27, 28 and 31) with Q2 values in the range 0.632-0.518. These models are also divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all these studies, including the present one. A simulation is also carried out on the same data set to explain model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions for data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without excessive computer time. [source]
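A rough sketch of the combinatorial idea, enumerating descriptor subsets, rejecting subsets that contain inter-correlated descriptors, fitting MLR to the survivors, and ranking them by cross-validated Q2, is given below. The descriptor matrix, correlation cutoff, and subset size are placeholders rather than the Selwood et al. analysis.

```python
# Sketch: enumerate descriptor subsets, reject those with inter-correlated
# descriptors, fit MLR, and rank the survivors by leave-one-out Q2.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, LeaveOneOut

rng = np.random.default_rng(6)
X = rng.normal(size=(31, 20))                              # 31 compounds x 20 descriptors, invented
y = X[:, 0] - 0.8 * X[:, 3] + rng.normal(0, 0.3, 31)       # invented activity
r_cut, size = 0.35, 3                                      # correlation cutoff, model size

corr = np.abs(np.corrcoef(X, rowvar=False))
results = []
for subset in combinations(range(X.shape[1]), size):
    sub_corr = corr[np.ix_(subset, subset)]
    if np.any(sub_corr[np.triu_indices(size, k=1)] > r_cut):
        continue                                           # correlated descriptors: reject subset
    cols = list(subset)
    pred = cross_val_predict(LinearRegression(), X[:, cols], y, cv=LeaveOneOut())
    q2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    results.append((q2, subset))

for q2, subset in sorted(results, reverse=True)[:3]:
    print(f"descriptors {subset}: Q2 = {q2:.3f}")
```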


Counting individual galaxies from deep 24-μm Spitzer surveys

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, Issue 4 2006
G. Rodighiero
ABSTRACT We address the question of how to deal with confusion-limited surveys in the mid-infrared (MIR) domain by using information from shorter-wavelength observations over the same sky regions. Such information, once applied to apparently extended MIR sources, which are in fact 'blends' of two or more different sources, allows us to disentangle the single counterparts and to split the measured flux density into different components. We present the application of this method to the 24-μm Spitzer archival data in the Great Observatories Origins Deep Survey ELAIS-N1 (GOODS EN1) test field, where apparently extended, 'blended' sources constitute about 20 per cent of a reliable sample of 983 sources detected above the 5σ threshold down to 40 μJy. As a shorter-wavelength data set, we have considered the public Infrared Array Camera (IRAC) images and catalogues of the same field. We show that the 24-μm sample is almost unbiased down to ~40 μJy and the careful application of the deblending procedure does not require any statistical completeness correction (at least at the flux level considered). This is probed by direct comparison of our results with results in the literature that analysed the same data set through extensive Monte Carlo simulations. The extrapolation of the source counts down to fainter fluxes suggests that our 24-μm sample is able to resolve ~62 per cent of the cosmic background down to a flux level of 38 μJy. [source]
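The deblending step described above can be illustrated schematically: with counterpart positions fixed from the higher-resolution catalogue, only the amplitudes of the broad-beam PSF at those positions are fitted, here by linear least squares. The Gaussian PSF, pixel grid, and fluxes below are invented for the example and do not reproduce the authors' procedure in detail.

```python
# Sketch: split a blended 24-um flux between counterparts at positions fixed
# from a higher-resolution catalogue by fitting only the PSF amplitudes.
import numpy as np

def psf(shape, x0, y0, fwhm):
    """Unit-amplitude circular Gaussian PSF on a pixel grid."""
    y, x = np.indices(shape)
    sigma = fwhm / 2.355
    return np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

shape, fwhm = (32, 32), 6.0                                # broad 24-um beam, illustrative
priors = [(13.0, 15.0), (18.5, 16.5)]                      # counterpart positions from IRAC
true_fluxes = np.array([120.0, 60.0])                      # invented fluxes

blend = sum(f * psf(shape, x, y, fwhm) for f, (x, y) in zip(true_fluxes, priors))
blend = blend + np.random.default_rng(7).normal(0, 0.5, shape)

A = np.column_stack([psf(shape, x, y, fwhm).ravel() for x, y in priors])
fluxes, *_ = np.linalg.lstsq(A, blend.ravel(), rcond=None)
print("recovered component fluxes:", np.round(fluxes, 1))
```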


Identifying environmental signals from population abundance data using multivariate time-series analysis

OIKOS, Issue 11 2009
Masami Fujiwara
Individual organisms are affected by various natural and anthropogenic environmental factors throughout their life history. This is reflected in the way population abundance fluctuates. Consequently, observed population dynamics are often produced by the superimposition of multiple environmental signals, which complicates the analysis of population time series. Here, a multivariate time-series method called maximum autocorrelation factor analysis (MAFA) was used to extract underlying signals from multiple population time series. The extracted signals were compared with environmental variables that were suspected to affect the populations. Finally, a simple multiple regression analysis was applied to the same data set, and the results from the regression analysis were compared with those from MAFA. The signals extracted with MAFA were strongly associated with the environmental variables, suggesting that they represent environmental factors. On the other hand, with the multiple regression analysis one of the important signals was not identifiable, revealing a shortcoming of the conventional approach. MAFA summarizes data based on their lag-one autocorrelation, which allows the identification of underlying signals that have only a small effect on population abundance during the observation period. It also uses multiple time series collected in parallel, enabling short time series to be analyzed effectively. In this study, annual spawning adult counts of Chinook salmon at various locations within the Klamath Basin, California, were analyzed. [source]
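A compact sketch of the MAFA computation, finding the linear combination of standardized series whose lag-one autocorrelation is largest via a generalized eigenproblem of the covariance of first differences against the covariance of the series, is shown below with synthetic series rather than the Klamath Basin counts.

```python
# Sketch: first maximum autocorrelation factor of a set of standardized series,
# via the generalized eigenproblem of cov(first differences) against cov(series).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
t = np.arange(30)
signal = np.sin(2 * np.pi * t / 12)                              # shared smooth signal
Y = np.column_stack([signal + rng.normal(0, 0.6, t.size) for _ in range(6)])
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)                         # standardize each series

S = np.cov(Y, rowvar=False)
Sd = np.cov(np.diff(Y, axis=0), rowvar=False)
vals, vecs = eigh(Sd, S)                                         # ascending eigenvalues
maf1 = Y @ vecs[:, 0]                                            # smallest eigenvalue -> smoothest factor

r1 = np.corrcoef(maf1[:-1], maf1[1:])[0, 1]
print(f"lag-one autocorrelation of MAF1: {r1:.2f}")
```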


Can fluctuating asymmetry be used to detect inbreeding and loss of genetic diversity in endangered populations?

ANIMAL CONSERVATION, Issue 2 2000
Dean M. Gilligan
Fluctuating asymmetry (FA), a measure of developmental stability, has been proposed as a simple technique for identifying populations suffering from inbreeding and a loss of genetic diversity. However, there is controversy regarding the relationship between FA and both allozyme heterozygosity and pedigree inbreeding coefficients (F). FA of sternopleural bristle number in Drosophila melanogaster was measured in populations maintained at effective sizes of 25 (8 replicates), 50 (6), 100 (4), 250 (3) and 500 (2) for 50 generations (inbreeding coefficients of 0.05-0.71). FA was calculated from the same data set using three different indices (FA1, FA5 and FA6). There was no significant relationship of FA with pedigree inbreeding coefficients for any of the three indices. The relationship between FA and allozyme heterozygosity was non-significant for indices FA5 and FA6 (the more powerful indices) and only significant for FA1. A second comparison of highly inbred (F ≈ 1) populations with their outbred base population showed significantly greater FA in the inbred populations only when analysed with FA6. Analysis of the same data using FA1 and FA5 showed non-significant relationships in the opposite direction. If a relationship between FA and genetic diversity does exist, it is weak and inconsistent. Consequently, our results do not support the use of FA as a monitoring tool to detect inbreeding or loss of genetic diversity. [source]


Power and Sample Size Estimation for the Wilcoxon Rank Sum Test with Application to Comparisons of C Statistics from Alternative Prediction Models

BIOMETRICS, Issue 1 2009
B. Rosner
Summary The Wilcoxon Mann-Whitney (WMW) U test is commonly used in nonparametric two-group comparisons when the normality of the underlying distribution is questionable. There has been some previous work on estimating power based on this procedure (Lehmann, 1998, Nonparametrics). In this article, we present an approach for estimating type II error, which is applicable to any continuous distribution, and also extend the approach to handle grouped continuous data allowing for ties. We apply these results to obtaining standard errors of the area under the receiver operating characteristic curve (AUROC) for risk-prediction rules under H1 and for comparing AUROC between competing risk-prediction rules applied to the same data set. These results are based on SAS-callable functions to evaluate the bivariate normal integral and are thus easily implemented with standard software. [source]
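The paper's own standard-error derivation from Wilcoxon-Mann-Whitney theory is not reproduced here. As a simple point of reference, the sketch below computes the AUROC as the scaled WMW U statistic together with the classical Hanley-McNeil (1982) approximation to its standard error, using invented risk scores.

```python
# Sketch: AUROC as the scaled Wilcoxon-Mann-Whitney U statistic, with the
# Hanley-McNeil (1982) approximation to its standard error.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(9)
cases = rng.normal(1.0, 1.0, 80)            # risk scores for events, invented
controls = rng.normal(0.0, 1.0, 240)        # risk scores for non-events, invented

u = mannwhitneyu(cases, controls, alternative="greater").statistic
n1, n2 = cases.size, controls.size
auc = u / (n1 * n2)

q1 = auc / (2 - auc)
q2 = 2 * auc ** 2 / (1 + auc)
se = np.sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
              + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2))
print(f"AUROC = {auc:.3f} +/- {se:.3f}")
```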


Bond-based 3D-chiral linear indices: Theory and QSAR applications to central chirality codification

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 15 2008
Juan A. Castillo-Garit
Abstract The recently introduced non-stochastic and stochastic bond-based linear indices have been generalized to codify chemical structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. These improved, modified descriptors are applied to several well-known data sets to validate each of them. In particular, Cramer's steroid data set has become a benchmark for the assessment of novel quantitative structure-activity relationship methods. This data set has been used by several researchers using 3D-QSAR approaches such as Comparative Molecular Field Analysis, Molecular Quantum Similarity Measures, Comparative Molecular Moment Analysis, E-state, Mapping Property Distributions of Molecular Surfaces, and so on. For that reason, we selected it for the sake of comparability. In addition, to evaluate the effectiveness of this novel approach in drug design, we model the angiotensin-converting enzyme inhibitory activity of perindoprilate's ,-stereoisomers combinatorial library, and we codify information related to a pharmacological property highly dependent on molecular symmetry for a set of seven pairs of chiral N-alkylated 3-(3-hydroxyphenyl)-piperidines that bind σ-receptors. The validation of this method is achieved by comparison with earlier publications applied to the same data sets. The non-stochastic and stochastic bond-based 3D-chiral linear indices appear to provide a very interesting alternative to other more common 3D-QSAR descriptors. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2008 [source]


Root cadmium desorption methods and their evaluation with compartmental modeling

NEW PHYTOLOGIST, Issue 1 2010
Wayne T. Buckley
Summary ,Desorption of plant roots is often employed in studies of plant physiology and nutrition; however, there have been few studies on the validity of desorption procedures. ,Branched and in-line kinetic models with five compartments , cadmium (Cd)-chelate, Cd2+, root apoplast, root symplast and vacuole , were developed to evaluate the efficacy of diethylenetriaminepentaacetic acid (DTPA) and CaCl2 methods for the desorption of Cd from roots of durum wheat seedlings. Solution Cd2+ could exchange with apoplast and symplast Cd simultaneously in the branched model and sequentially in the in-line model. ,A 10-min desorption with 1 × 10,6 M DTPA at room temperature or cold (0°C) 5 × 10,3 M CaCl2 was required to achieve 99% recovery of apoplast-bound 109Cd when experimental results were interpreted with the branched model. However, when the same data sets were analysed with the in-line model, only partial desorption was achieved. Arguments are presented that suggest that the branched model is correct. ,It is suggested that compartmental modeling is a suitable tool for the study of plant root uptake and desorption kinetics, and that there are advantages over more commonly used calculation procedures. [source]