Cross-validation Procedure (cross-validation + procedure)

Selected Abstracts


A multivariate logistic regression equation to screen for dysglycaemia: development and validation

DIABETIC MEDICINE, Issue 5 2005
B. P. Tabaei
Abstract Aims To develop and validate an empirical equation to screen for dysglycaemia [impaired fasting glucose (IFG), impaired glucose tolerance (IGT) and undiagnosed diabetes]. Methods A predictive equation was developed using multiple logistic regression analysis and data collected from 1032 Egyptian subjects with no history of diabetes. The equation incorporated age, sex, body mass index (BMI), post-prandial time (self-reported number of hours since last food or drink other than water), systolic blood pressure, high-density lipoprotein (HDL) cholesterol and random capillary plasma glucose as independent covariates for prediction of dysglycaemia based on fasting plasma glucose (FPG) ≥ 6.1 mmol/l and/or plasma glucose 2 h after a 75-g oral glucose load (2-h PG) ≥ 7.8 mmol/l. The equation was validated using a cross-validation procedure. Its performance was also compared with static plasma glucose cut-points for dysglycaemia screening. Results The predictive equation was calculated with the following logistic regression parameters: P = 1/(1 + e^{-X}), where X = −8.3390 + 0.0214 (age in years) + 0.6764 (if female) + 0.0335 (BMI in kg/m²) + 0.0934 (post-prandial time in hours) + 0.0141 (systolic blood pressure in mmHg) − 0.0110 (HDL in mmol/l) + 0.0243 (random capillary plasma glucose in mmol/l). The cut-point for the prediction of dysglycaemia was defined as a probability ≥ 0.38. The equation's sensitivity was 55%, specificity 90% and positive predictive value (PPV) 65%. When applied to a new sample, the equation's sensitivity was 53%, specificity 89% and PPV 63%. Conclusions This multivariate logistic equation improves on currently recommended methods of screening for dysglycaemia and can be easily implemented in a clinical setting using readily available clinical and non-fasting laboratory data and an inexpensive hand-held programmable calculator.
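As an illustration only, the screening equation can be sketched in Python. The coefficients and the 0.38 cut-point are taken from the abstract as printed, with units as stated there; the dictionary keys, the function names and the `screening_metrics` helper are assumptions made for this sketch and do not come from the paper.

```python
import math

# Coefficients as printed in the abstract (units as stated there).
INTERCEPT = -8.3390
COEFFS = {
    "age_years": 0.0214,
    "female": 0.6764,            # 1 if female, 0 otherwise
    "bmi": 0.0335,
    "postprandial_hours": 0.0934,
    "systolic_bp_mmhg": 0.0141,
    "hdl": -0.0110,
    "random_glucose": 0.0243,
}
CUT_POINT = 0.38


def dysglycaemia_probability(subject):
    """Logistic transform P = 1 / (1 + e^-X) of the linear predictor X."""
    x = INTERCEPT + sum(COEFFS[name] * subject[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-x))


def screen_positive(subject):
    """A subject screens positive when P reaches the cut-point."""
    return dysglycaemia_probability(subject) >= CUT_POINT


def screening_metrics(predicted, actual):
    """Sensitivity, specificity and PPV from paired boolean lists.

    Hypothetical helper: assumes each denominator is non-zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }
```

Applying `screening_metrics` separately to the development sample and to a held-out sample would mirror the comparison the abstract reports (55%/90%/65% versus 53%/89%/63%).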


Predicting pasture root density from soil spectral reflectance: field measurement

EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2010
B. H. KUSUMO
This paper reports the development and evaluation of a field technique for in situ measurement of root density using a portable spectroradiometer. The technique was evaluated at two sites in permanent pasture on contrasting soils (an Allophanic and a Fluvial Recent soil) in the Manawatu region, New Zealand. Using a modified soil probe, reflectance spectra (350–2500 nm) were acquired from horizontal surfaces at three depths (15, 30 and 60 mm) of an 80-mm diameter soil core, totalling 108 samples for both soils. After scanning, 3-mm soil slices were taken at each depth for root density measurement and soil carbon (C) and nitrogen (N) analysis. The two soils exhibited a wide range of root densities from 1.53 to 37.03 mg dry root g⁻¹ soil. The average root density in the Fluvial soil (13.21 mg g⁻¹) was twice that in the Allophanic soil (6.88 mg g⁻¹). Calibration models, developed using partial least squares regression (PLSR) of the first derivative spectra and reference data, were able to predict root density on unknown samples using a leave-one-out cross-validation procedure. The root density predictions were more accurate when the samples from the two soil types were separated (rather than grouped) to give sub-populations (n = 54) of spectral data with more similar attributes. A better prediction of root density was achieved in the Allophanic soil (r² = 0.83, ratio of prediction to deviation (RPD) = 2.44, root mean square error of cross-validation (RMSECV) = 1.96 mg g⁻¹) than in the Fluvial soil (r² = 0.75, RPD = 1.98, RMSECV = 5.11 mg g⁻¹). It is concluded that pasture root density can be predicted from soil reflectance spectra acquired from field soil cores. Improved PLSR models for predicting field root density can be produced by selecting calibration data from field data sources with similar spectral attributes to the validation set. Root density and soil C content can be predicted independently, which could be particularly useful in studies examining potential rates of soil organic matter change.
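The leave-one-out statistics quoted above can be illustrated with a minimal Python sketch. It substitutes ordinary least squares on a single predictor for the PLSR model used in the paper, purely to keep the example dependency-free; the function names are hypothetical, and RPD is taken here as the standard deviation of the reference values divided by RMSECV, which is the usual definition and is assumed rather than stated in the abstract.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for PLSR)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def leave_one_out_predictions(xs, ys):
    """Predict each sample from a model fitted to all the other samples."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def rmsecv(ys, preds):
    """Root mean square error of cross-validation."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def rpd(ys, preds):
    """Ratio of prediction to deviation: SD of reference values / RMSECV."""
    return statistics.stdev(ys) / rmsecv(ys, preds)
```

Running these functions separately on each soil's samples, rather than on the pooled set, corresponds to the sub-population comparison described in the abstract.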


The construct validity of the client questionnaire of the Wisconsin Quality of Life Index – a cross-validation study

INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, Issue 3 2003
Jean Caron
Abstract The Wisconsin Quality of Life Index (W-QLI, Becker, Diamond and Sainfort, 1993) consists of eight scales: satisfaction with life domains, occupational activities, symptoms, physical health, social relations/support, finances, psychological wellbeing, and activities of daily living. The W-QLI has been modified to fit the characteristics of the Canadian population, the universal Canadian health system, and community and social services in Canada, and the modified form was named CaW-QLI (Diaz, Mercier, Hachey, Caron, and Boyer, 1999). This study will verify the empirical basis of these theoretical dimensions by applying a cross-validation procedure on two samples, most of whose subjects have a serious mental illness. Confirmatory factor analyses and exploratory factor analyses using the principal component extraction technique with varimax rotation were applied. With the exception of the occupational activities domain, the scales were correctly identified by the factor analyses on each sample. The occupational activities scale, which is too brief, should be extended with additional items, and two other items should be revised in order to improve the quality of the instrument. Copyright © 2003 Whurr Publishers Ltd.


Quantitative structure/property relationship analysis of Caco-2 permeability using a genetic algorithm-based partial least squares method

JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 10 2002
Fumiyoshi Yamashita
Abstract Caco-2 cell monolayers are widely used systems for predicting human intestinal absorption. This study was carried out to develop a quantitative structure–property relationship (QSPR) model of Caco-2 permeability using a novel genetic algorithm-based partial least squares (GA-PLS) method. The Caco-2 permeability data for 73 compounds were taken from the literature. Molconn-Z descriptors of these compounds were calculated as molecular descriptors, and the optimal subset of the descriptors was explored by GA-PLS analysis. A fitness function considering both goodness-of-fit to the training data and predictability of the testing data was adopted throughout the genetic algorithm-driven optimization procedure. The final PLS model consisting of 24 descriptors gave a correlation coefficient (r) of 0.886 for the entire dataset and a predictive correlation coefficient (rpred) of 0.825 that was evaluated by a leave-some-out cross-validation procedure. Thus, the GA-PLS analysis proved to be a reasonable QSPR modeling approach for predicting Caco-2 permeability. © 2002 Wiley-Liss Inc. and the American Pharmaceutical Association J Pharm Sci 91:2230–2239, 2002
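A leave-some-out predictive correlation of the kind reported above might be computed along the following lines. This is a sketch under stated assumptions: a one-descriptor least-squares fit stands in for the 24-descriptor PLS model, the interleaved grouping and the default of five groups are arbitrary choices, and none of the names come from the paper.

```python
import math
import statistics


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def leave_some_out_r(xs, ys, n_groups=5):
    """Hold out one group at a time, refit on the rest, pool the predictions."""
    n = len(xs)
    preds = [0.0] * n
    for g in range(n_groups):
        held_out = set(range(g, n, n_groups))       # interleaved group (assumed)
        train = [i for i in range(n) if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        mx, my = statistics.fmean(tx), statistics.fmean(ty)
        slope = (sum((x - mx) * (y - my) for x, y in zip(tx, ty))
                 / sum((x - mx) ** 2 for x in tx))
        for i in held_out:
            preds[i] = my + slope * (xs[i] - mx)    # prediction for unseen sample
    return pearson_r(ys, preds)
```

Because every prediction comes from a model that never saw the corresponding compound, the resulting coefficient is typically lower than the fit to the entire dataset, as with the 0.825 versus 0.886 in the abstract.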


Ping-pong cross-validation in real space: a method for increasing the phasing power of a partial model without risk of model bias

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 2 2003
John F. Hunt
Experimental phases could only be obtained to 4.4 Å resolution for crystals of the SecA translocation ATPase. Density modification of these phases exploiting the 65% solvent content of the crystal produced a map from which an approximate backbone model could be built for 80% of the structure. Combining the phases inferred from this partial model with the MIR phases and repeating the density modification produced an improved map from which a more complete backbone model could be built. However, this procedure converged before yielding a map that allowed unambiguous sequence assignment for the majority of the protein molecule. In order to avoid the likely model bias associated with a speculative attempt at sequence assignment, a real-space cross-validation procedure was employed to facilitate completion of the crystal structure based on partial model phasing. The protein was partitioned into two disjoint sets of residues. Models in which the side chains were built for residues in one of the two sets were used for phase combination and density modification in order to produce improved electron density for interpretation of residues in the other set that had not been included in the model. Residues in the two sets were therefore omitted from the model in alternation except at sites where the side chain could be identified definitively based on phasing with the other set. This ping-pong cross-validation procedure allowed partial model phasing to be used to complete the crystal structure of SecA without being impeded by model bias. These results show that the structure of a large protein molecule can be solved with exclusively low-resolution experimental phase information based on intensive use of partial model phasing and density modification. Real-space cross-validation can be applied to reduce the risk of model bias associated with partial model phasing, streamlining this approach and expanding its range of applicability.


DATA ANALYSIS OF PENETROMETRIC FORCE/DISPLACEMENT CURVES FOR THE CHARACTERIZATION OF WHOLE APPLE FRUITS

JOURNAL OF TEXTURE STUDIES, Issue 4 2005
C. CAMPS
ABSTRACT The objective of the present study was to compare two chemometric approaches for characterizing the rheological properties of fruits from puncture test force/displacement curves. The first approach (parameter approach) computed six texture parameters from the curves, which were supposed to be representative of skin hardness, fruit deformation before skin rupture, flesh firmness and mechanical work needed to penetrate the fruit. The second approach (whole curve approach) used the whole digitized curve (300 data points) in further data processing. Two experimental studies were compared: the first examined the variability of the rheological parameters of five apple cultivars; the second characterized the rheological variability as a function of storage conditions. For both approaches, factorial discriminant analysis was applied to discriminate the fruits based on the measured rheological properties. The qualitative groups in factorial discriminant analysis were either the apple cultivar or the storage conditions (days and temperatures of storage). The tests were carried out using cross-validation procedures, making it possible to compute the number of fruits correctly identified. The percentage of correct identification was 92% and 87% using the parameter and the whole curve approaches, respectively. The discrimination of storage duration was less accurate for both approaches, giving about 50% correct identifications. Comparison of the percentage of correct classifications based on the whole curve and the parameter approaches showed that the six computed parameters gave a good summary of the information present in the curve. The whole curve approach showed that some additional information, not present in the six parameters, may be appropriate for a complete description of the fruit rheology.
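The cross-validated percentage of correct identification can be sketched as follows. A nearest-centroid classifier is swapped in for factorial discriminant analysis to keep the example short, and leave-one-out is assumed as the cross-validation scheme because the abstract does not say which was used; all names are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_label(sample, training):
    """Assign the label whose class centroid is closest to the sample.

    training is a list of (features, label) pairs.
    """
    by_label = {}
    for feats, label in training:
        by_label.setdefault(label, []).append(feats)
    return min(by_label, key=lambda lab: math.dist(sample, centroid(by_label[lab])))


def loo_percent_correct(data):
    """Leave-one-out percentage of fruits assigned to their true group."""
    correct = 0
    for i, (feats, label) in enumerate(data):
        training = data[:i] + data[i + 1:]      # every fruit except the i-th
        if nearest_centroid_label(feats, training) == label:
            correct += 1
    return 100.0 * correct / len(data)
```

Each feature vector could hold either the six texture parameters or the 300 digitized curve points, so the same function would serve both approaches compared in the study.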


Developing a modern pollen–climate calibration data set for Norway

BOREAS, Issue 4 2010
ANNE E. BJUNE
Bjune, A. E., Birks, H. J. B., Peglar, S. M. & Odland, A. 2010: Developing a modern pollen–climate calibration data set for Norway. Boreas, Vol. 39, pp. 674–688. 10.1111/j.1502-3885.2010.00158.x. ISSN 0300-9483. Modern pollen–climate data sets consisting of modern pollen assemblages and modern climate data (mean July temperature and mean annual precipitation) have been developed for Norway based on 191 lakes and 321 lakes. The original 191-lake data set was designed to optimize the distribution of the lakes sampled along the mean July temperature gradient, thereby fulfilling one of the most critical assumptions of weighted-averaging regression and calibration and its relative, weighted-averaging partial least-squares regression. A further 130 surface samples of comparable taphonomy, taxonomic detail and analyst became available as a result of other projects. These 130 samples, all from new lakes, were added to the 191-lake data set to create the 321-lake data set. The collection and construction of these data sets are outlined. Numerical analyses involving generalized linear modelling, constrained ordination techniques, weighted-averaging partial least-squares regression, and two different cross-validation procedures are used to assess the effects of increasing the size of the calibration data set from 191 to 321 lakes. The two data sets are used to reconstruct mean July temperature and mean annual precipitation for a Holocene site in northwest Norway and a Lateglacial site in west-central Norway. Overall, little is to be gained by increasing the modern data set beyond about 200 lakes in terms of modern model performance statistics, but the down-core reconstructions show less between-sample variability and are thus potentially more plausible and realistic when based on the 321-lake data set.
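For readers unfamiliar with weighted-averaging regression and calibration, a bare-bones Python sketch follows. It implements only the two weighted means (taxon optima from the modern lakes, then a reconstruction from a sample) and a leave-one-out prediction error; the deshrinking step and the partial least-squares extension are omitted, leave-one-out is assumed as one plausible cross-validation procedure, and the function names are hypothetical.

```python
import math


def wa_optima(assemblages, climate):
    """Regression step: each taxon's optimum is the abundance-weighted
    mean of the climate variable over the calibration lakes."""
    optima = []
    for k in range(len(assemblages[0])):
        weight = sum(site[k] for site in assemblages)
        if weight == 0:
            optima.append(None)     # taxon absent from the calibration set
        else:
            optima.append(
                sum(site[k] * x for site, x in zip(assemblages, climate)) / weight
            )
    return optima


def wa_reconstruct(sample, optima):
    """Calibration step: abundance-weighted mean of the taxon optima.

    Assumes the sample contains at least one taxon with a known optimum.
    """
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    return sum(y * u for y, u in pairs) / sum(y for y, _ in pairs)


def loo_rmsep(assemblages, climate):
    """Leave-one-out root mean square error of prediction."""
    squared = []
    for i in range(len(assemblages)):
        optima = wa_optima(assemblages[:i] + assemblages[i + 1:],
                           climate[:i] + climate[i + 1:])
        estimate = wa_reconstruct(assemblages[i], optima)
        squared.append((estimate - climate[i]) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Comparing `loo_rmsep` for a smaller and a larger calibration set is, in spirit, the kind of model performance comparison the abstract describes for the 191-lake and 321-lake data sets.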