Leave-one-out Cross-validation
Selected Abstracts

Hierarchical spatial models for predicting pygmy rabbit distribution and relative abundance
JOURNAL OF APPLIED ECOLOGY, Issue 2 2010
Tammy L. Wilson

Summary
1. Conservationists routinely use species distribution models to plan conservation, restoration and development actions, while ecologists use them to infer process from pattern. These models tend to work well for common or easily observable species, but are of limited utility for rare and cryptic species. This may be because known observation bias and spatial autocorrelation are rarely accounted for honestly, which limits statistical inference from the resulting distribution maps.
2. We specified and implemented a spatially explicit Bayesian hierarchical model for a cryptic mammal species (pygmy rabbit Brachylagus idahoensis). Our approach used two levels of indirect sign that are naturally hierarchical (burrows and faecal pellets) to build a model that allows for inference on regression coefficients as well as spatially explicit model parameters. We also produced maps of rabbit distribution (occupied burrows) and relative abundance (number of burrows expected to be occupied by pygmy rabbits). The model demonstrated statistically rigorous spatial prediction by including spatial autocorrelation and measurement uncertainty.
3. We demonstrated the flexibility of our modelling framework by depicting probabilistic distribution predictions under different assumptions about pygmy rabbit habitat requirements.
4. Spatial representations of the variance of posterior predictive distributions were obtained to evaluate heterogeneity in model fit across the spatial domain. Leave-one-out cross-validation was conducted to evaluate the overall model fit.
5. Synthesis and applications. Our method draws on the strengths of previous work, thereby bridging and extending two active areas of ecological research: species distribution models and multi-state occupancy modelling.
Our framework can be extended to encompass both larger extents and other species for which direct estimation of abundance is difficult.

Computational modeling of tetrahydroimidazo-[4,5,1-jk][1,4]-benzodiazepinone derivatives: An atomistic drug design approach using Kier-Hall electrotopological state (E-state) indices
JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 11 2008
Nitin S. Sapre

Abstract
Quantitative structure-activity relationships (QSAR) based on E-state indices have been developed for a series of tetrahydroimidazo-[4,5,1-jk]-benzodiazepinone derivatives acting against HIV-1 reverse transcriptase (HIV-1 RT). Statistical modeling using multiple linear regression to predict the anti-HIV activity yielded a good correlation for the training set (R2 = 0.913, R = 0.897, Q2 = 0.849, MSE = 0.190, F-ratio = 59.97, PRESS = 18.05, SSE = 0.926, and p value = 0.00). Leave-one-out cross-validation also reaffirmed the predictions (R2 = 0.850, R = 0.824, Q2 = 0.849, MSE = 0.328, and PRESS = 18.05). The predictive ability of the training-set model was also cross-validated on a test set (R2 = 0.812, R = 0.799, Q2 = 0.765, MSE = 0.347, F-ratio = 64.69, PRESS = 7.37, SSE = 0.975, and p value = 0.00), which confirmed a satisfactory quality of fit. The results reflect the substitution pattern and suggest that a bulky, electropositive group in the five-member ring and electron-withdrawing groups in the seven-member ring will have a positive impact on the antiviral activity of the derivatives. Bulky groups in the six-member ring do not enhance activity. Outlier analysis also confirms our findings. The results show that E-state descriptors are important for quantifying the electronic characteristics of a molecule and can therefore be used in the chemical interpretation of the electronic and steric factors affecting the biological activity of compounds. © 2008 Wiley Periodicals, Inc.
J Comput Chem, 2008

Multivariate calibration of hyperspectral γ-ray energy spectra for proximal soil sensing
EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2007
R. A. Viscarra Rossel

Summary
The development of proximal soil sensors to collect fine-scale soil information for environmental monitoring, modelling and precision agriculture is vital. Conventional soil sampling and laboratory analyses are time-consuming and expensive. In this paper we examine whether hyperspectral γ-ray energy spectra can be calibrated to predict various surface and subsurface soil properties. The spectra were collected with a proximal, on-the-go γ-ray spectrometer. We surveyed two geographically and physiographically different fields in New South Wales, Australia, and collected hyperspectral γ-ray data consisting of 256 energy bands at more than 20 000 sites in each field. Bootstrap aggregation with partial least squares regression (bagging-PLSR) was used to calibrate the γ-ray spectra of each field for predictions of selected soil properties. However, substantial pre-processing was necessary to expose the correlations between the γ-ray spectra and the soil data. We first filtered the spectra spatially using local kriging, then further de-noised, normalized and detrended them. The resulting bagging-PLSR models of each field were tested using leave-one-out cross-validation. Bagging-PLSR provided robust predictions of clay, coarse sand and Fe contents in the 0–15 cm soil layer and of pH and coarse sand contents in the 15–50 cm soil layer. Furthermore, bagging-PLSR provided a measure of the uncertainty of predictions. This study is apparently the first to use a multivariate calibration technique with on-the-go proximal γ-ray spectrometry. Proximally sensed γ-ray spectrometry proved to be a useful tool for predicting soil properties in different soil landscapes.
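The bagging idea in the abstract above can be sketched in plain Python: fit one regression per bootstrap resample, average the predictions, and read their spread as an uncertainty measure. This is a minimal sketch under stated assumptions. A one-predictor least-squares line stands in for PLSR. The names `fit_line`, `bagged_predict` and `loo_rmse` are hypothetical, as is the toy data. The leave-one-out loop mirrors the kind of evaluation the abstract describes and is not the authors' code.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in for PLSR)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bagged_predict(xs, ys, x_new, n_boot=200, seed=0):
    """Bootstrap-aggregated prediction at x_new plus its spread across resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # all x identical: slope undefined, skip resample
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    if len(preds) < 2:
        raise ValueError("too few usable bootstrap resamples")
    # Mean = bagged prediction; standard deviation = rough uncertainty measure.
    return mean(preds), stdev(preds)


def loo_rmse(xs, ys, n_boot=50):
    """Leave-one-out RMSE: refit the bagged model without each point, predict it."""
    sq_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred, _ = bagged_predict(train_x, train_y, xs[i], n_boot=n_boot, seed=i)
        sq_errors.append((pred - ys[i]) ** 2)
    return mean(sq_errors) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data (not from the study): one spectral feature vs. a soil property.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [3.1, 4.8, 7.2, 9.1, 10.7, 13.2, 15.0, 16.9]
    pred, spread = bagged_predict(xs, ys, 4.5)
    print(f"bagged prediction {pred:.2f} +/- {spread:.2f}")
    print(f"LOO RMSE {loo_rmse(xs, ys):.3f}")
```

With a fixed `seed`, the resamples and therefore the reported spread are reproducible. Real spectra would need a multivariate model in place of `fit_line`.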

Discrete dynamic Bayesian network analysis of fMRI data
HUMAN BRAIN MAPPING, Issue 1 2009
John Burge

Abstract
We examine the efficacy of using discrete dynamic Bayesian networks (dDBNs), a data-driven modeling technique employed in machine learning, to identify functional correlations among neuroanatomical regions of interest. Unlike many neuroimaging analysis techniques, this method is not limited by linear and/or Gaussian noise assumptions. It achieves this by modeling the time series of neuroanatomical regions as discrete, rather than continuous, random variables with multinomial distributions. We demonstrate this method using an fMRI dataset collected from healthy and demented elderly subjects (Buckner, et al., 2000: J Cogn Neurosci 12:24-34) and identify correlates based on a diagnosis of dementia. The results are validated in three ways. First, leave-one-out cross-validation shows that the elicited correlates are robust, and a Fourier bootstrapping method shows that they are unlikely to be due to random chance. Second, the dDBNs identified correlates that would be expected given the experimental paradigm. Third, the dDBN's ability to predict dementia is competitive with two commonly employed machine-learning classifiers: the support vector machine and the Gaussian naïve Bayesian network. We also verify that the dDBN selects correlates based on non-linear criteria. Finally, we provide a brief analysis of the correlates elicited from Buckner et al.'s data. It suggests that, compared with healthy elderly subjects, demented elderly subjects show reduced involvement of the entorhinal and occipital cortex and greater involvement of the parietal lobe and amygdala in brain activity (as measured via functional correlations among BOLD measurements). Limitations and extensions of the dDBN method are discussed. Hum Brain Mapp, 2009. © 2007 Wiley-Liss, Inc.
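The core modeling move described above is to treat each region's time series as a discrete variable with a multinomial distribution. It can be illustrated for a single parent→child edge between two regions. This is a hypothetical sketch, not the authors' procedure. The rank-based (equal-frequency) discretization, the three-state alphabet, the one-step time lag and the pseudo-count `alpha` are all assumptions, and the network structure search over many regions is omitted.

```python
from collections import Counter


def quantile_discretize(series, n_bins=3):
    """Assign each value a state 0..n_bins-1 by rank (equal-frequency bins)."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    states = [0] * len(series)
    for rank, i in enumerate(order):
        states[i] = rank * n_bins // len(series)
    return states


def transition_cpt(parent, child, n_bins=3, alpha=1.0):
    """Estimate P(child[t] | parent[t-1]) as one multinomial per parent state.

    Counts are taken over consecutive time points; alpha is an additive
    (Laplace-style) pseudo-count so unseen transitions keep non-zero mass.
    """
    counts = Counter(zip(parent[:-1], child[1:]))
    cpt = {}
    for p in range(n_bins):
        row = [counts[(p, c)] + alpha for c in range(n_bins)]
        total = sum(row)
        cpt[p] = [v / total for v in row]
    return cpt


if __name__ == "__main__":
    # Hypothetical BOLD-like signals for two regions (not real data).
    region_a = [0.1, 0.5, 0.9, 0.2, 0.6, 1.0, 0.3, 0.7]
    region_b = [0.4, 0.2, 0.6, 0.9, 0.1, 0.5, 0.8, 0.3]
    a_states = quantile_discretize(region_a)
    b_states = quantile_discretize(region_b)
    for p, probs in transition_cpt(a_states, b_states).items():
        print(p, [round(x, 2) for x in probs])
```

Each table row is a full multinomial over the child's states. That is what lets a discrete model capture non-linear dependencies a linear-Gaussian model would miss.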

New computational algorithm for the prediction of protein folding types
INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2001
Nikola Štambuk

Abstract
We present a new computational algorithm for the prediction of protein secondary structure. The method enables the evaluation of α- and β-protein folding types from nucleotide sequences. The procedure is based on the reflected Gray code algorithm of nucleotide–amino acid relationships and extends Swanson's procedure in Ref. 4. It is shown that a six-digit binary notation of each codon enables the prediction of α- and β-protein folds by means of the error-correcting linear block triple-check code. We tested the validity of the method on a test set of 140 proteins (70 α- and 70 β-folds). The test set consisted of standard α- and β-protein classes from the Jpred and SCOP databases, with nucleotide sequences available in the GenBank database. Logistic regression analysis classified α- and β-protein folds with 100% accuracy based on 39 dipeptide addresses derived by the error-correcting coding procedure (p < 0.00000001). A classification tree and the machine learning sequential minimal optimization (SMO) classifier confirmed the results with 97.1% and 90% accurate classification, respectively. Protein fold prediction quality tested by leave-one-out cross-validation was a satisfactory 82.1% for logistic regression and 81.4% for the SMO classifier. The presented procedure can help determine the folding type of newly sequenced exon regions. The method enables quick, simple, and accurate prediction of α- and β-protein folds from the nucleotide sequence on a personal computer. © 2001 John Wiley & Sons, Inc.
Int J Quant Chem 84: 13–22, 2001

Moving window as a variable selection method in potentiometric titration multivariate calibration and its application to the simultaneous determination of ions in Raschig synthesis mixtures
JOURNAL OF CHEMOMETRICS, Issue 3 2009
Sheng Fang

Abstract
A novel method based on a moving window (MW) strategy has been proposed to simultaneously choose the optimal pH region and the number of latent variables (LVs) for partial least squares (PLS) regression in potentiometric titration multivariate calibration. In this method, leave-one-out cross-validation with a varying number of LVs is run on each candidate window, and the window that yields the optimal result is selected. The method is applied to the simultaneous determination of H+, NH3OH+ and NH4+ in Raschig synthesis mixtures, which is of industrial importance. The modeling power of PLS is compared between the unprocessed data set and the data set processed by the MW method. Copyright © 2008 John Wiley & Sons, Ltd.

A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part II.
JOURNAL OF CHEMOMETRICS, Issue 7 2002

Abstract
Leave-multiple-out cross-validation (LMO-CV) is compared with leave-one-out cross-validation (LOO-CV) as the objective function in variable selection for four real data sets. Two data sets stem from NIR spectroscopy and two from quantitative structure–activity relationships. In all four cases, LMO-CV outperforms LOO-CV with respect to prediction quality, model complexity (number of latent variables) and model size (number of variables). The number of objects left out in LMO-CV has an important effect on the final results: it controls both the number of latent variables in the final model and the prediction quality. The results of variable selection need to be validated carefully with a validation step that is independent of the variable selection.
This step is needed because the internal figures of merit (i.e. anything derived from the objective function value) do not correlate well with the external predictivity of the selected models. This is most obvious for LOO-CV: without further constraints, LOO-CV always shows the best internal figures of merit and the worst prediction quality. Copyright © 2002 John Wiley & Sons, Ltd.

A novel semi-empirical topological descriptor Nt and the application to study on QSPR/QSAR
JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 15 2007
Congyi Zhou

Abstract
A novel semi-empirical topological descriptor, Nt, was proposed by revising the traditional distance matrix based on the equilibrium electronegativity and the relative bond length. Nt not only efficiently distinguishes structures of organic compounds containing multiple bonds and/or heteroatoms, but also performs well in QSPR/QSAR (quantitative structure-property/activity relationship) applications to a large, diverse set of compounds (alkanes, alkenes, alkynes, aldehydes, ketones, thiols, and alkoxy silicon chlorides), with correlation coefficients above 0.99 for all models. The leave-one-out cross-validation (LOO CV) method was used to test the stability and predictive ability of the models. The validation results, based on the cross-validation parameters RCV, SEPCV and SCV, verify the good stability and predictive ability of the models and demonstrate the wide potential of the Nt descriptor for QSPR/QSAR applications. © 2007 Wiley Periodicals, Inc.
J Comput Chem, 2007

Proliferative alloresponse of T-cytotoxic cells identifies rejection-prone children with steroid-free liver transplantation
LIVER TRANSPLANTATION, Issue 8 2009
Chethan Ashokkumar

Donor-induced and third-party–induced proliferation of T-helper and T-cytotoxic (Tc) cells and their naïve and memory subsets was evaluated simultaneously in single blood samples from 77 children who received steroid-free liver transplantation (LTx) after induction with rabbit anti-human thymocyte globulin. Proliferation was measured by dilution of the intravital dye carboxyfluorescein succinimidyl ester (CFSE) in a 3- to 4-day mixed lymphocyte response coculture. The ratio of donor-induced to third-party–induced proliferated (CFSElow) T-cells was reported as the immunoreactivity index (IR) for each subset. Rejectors were defined as those who experienced biopsy-proven acute cellular rejection within 60 days of the assay. IR > 1 signified increased risk of rejection, and IR < 1 implied decreased risk. Demographics for 32 rejectors and 45 nonrejectors were similar. Proliferated CFSElow T-cells and subsets were significantly higher among rejectors than among nonrejectors. In a randomly selected subset of 33 of the 77 children, logistic regression, leave-one-out cross-validation, and receiver operating characteristic analyses showed that the IR of Tc cells was best associated with biopsy-proven rejection (sensitivity > 75%, specificity > 88%). Sensitivity and specificity were replicated in the remaining 44 children, who composed the validation cohort. IR of CFSElow Tc cells correlated significantly with IR of proinflammatory, allospecific CD154+ Tc cells (r = 0.664, P = 0.0005) and inversely with IR of allospecific, anti-inflammatory, cytotoxic T lymphocyte antigen 4–positive Tc cells (r = −0.630, P = 0.007). In conclusion, proliferative alloresponses of Tc cells can identify rejection-prone children receiving LTx. Liver Transpl 15:978–985, 2009. © 2009 AASLD.
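The immunoreactivity index, its IR > 1 risk rule, and the sensitivity and specificity used to summarize classification can be written out directly. A minimal sketch: the function names and the toy proliferation values are hypothetical. The study itself combined IR with logistic regression, leave-one-out cross-validation and ROC analysis, which this sketch does not reproduce.

```python
def immunoreactivity_index(donor_proliferated, third_party_proliferated):
    """IR = donor-induced / third-party-induced proliferated (CFSElow) T cells."""
    return donor_proliferated / third_party_proliferated


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical (donor, third-party) proliferation percentages and outcomes.
    samples = [(12.0, 6.0), (3.0, 9.0), (8.0, 5.0), (4.0, 4.5), (2.0, 2.5), (7.0, 7.5)]
    rejected = [True, False, True, False, False, True]
    # IR > 1 is read as increased rejection risk, following the abstract.
    flagged = [immunoreactivity_index(d, t) > 1 for d, t in samples]
    sens, spec = sensitivity_specificity(flagged, rejected)
    print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```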

Estimation of Compartmental Half-lives of Organic Compounds – Structural Similarity versus EPI-Suite
MOLECULAR INFORMATICS, Issue 4 2007
Ralph Kühne

Abstract
A k-nearest-neighbors (KNN) approach is developed to extrapolate from existing semiquantitative compartmental half-lives of organic compounds to corresponding data for untested substances. It is based on the evaluation of structural similarity through atom-centered fragments (ACFs). For model development and leave-one-out cross-validation, a set of 293 compounds with reference half-lives for the four compartments air, water, soil, and sediment was taken from the literature. A comparison of model performance with results based on EPI-Suite predictions of degradation rates due to indirect photolysis, biodegradation, and hydrolysis shows that the new approach is superior for predicting compartmental half-lives. These half-lives are needed as input information for modeling the multimedia fate and lifetime of organic compounds. The discussion includes an analysis of the problems associated with converting process-specific loss rates into compartmental half-lives.

An Electrophilicity Based Analysis of Toxicity of Aromatic Compounds Towards Tetrahymena Pyriformis
MOLECULAR INFORMATICS, Issue 2 2006
R. Roy

Abstract
The electrophilicity index is an important quantum chemical descriptor of the toxicity or biological activity of diverse classes of chemicals toward biological systems in the development of Quantitative Structure Activity Relationships (QSAR). In this study, 174 selected aromatic compounds, including phenols, nitrobenzenes and benzonitriles, are chosen as the training set to examine their toxic potency toward Tetrahymena pyriformis in the light of electrophilicity.
A systematic analysis has been made to determine the electron donor/acceptor nature of these model compounds by comparing their electronegativity values with those of the NA bases/DNA base pairs. The training set is divided into two groups: the electron donor group, comprising 97 phenol derivatives, and the electron acceptor group, comprising 77 nitrobenzenes and benzonitriles. Regression analysis in terms of correlation coefficient (), variance adjusted to degrees of freedom () and variance of leave-one-out cross-validation () has been performed for both the electron donor and acceptor aromatic groups to predict the toxicity of these model compounds to Tetrahymena pyriformis. It is heartening to note that the global and local electrophilicity indices, along with the total Hartree-Fock energy, can explain more than 80% of the cross-validation variance in the data for these aromatic molecules.

Predicting Anti-HIV-1 Activities of HEPT-analog Compounds by Using Support Vector Classification
MOLECULAR INFORMATICS, Issue 9 2005
Wencong Lu

Abstract
Support vector classification (SVC), a novel approach, was employed to discriminate within a class of non-nucleoside reverse transcriptase inhibitors. 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT) derivatives with high anti-HIV-1 activity and those with low anti-HIV-1 activity were compared on the basis of the following molecular descriptors: net atomic charge on atom 4, molecular volume, partition coefficient, molecular refractivity, molecular polarisability and molecular weight. Using SVC, a mathematical model was constructed that can predict the anti-HIV-1 activities of the HEPT-analogue compounds with an accuracy of 100%, as calculated by the leave-one-out cross-validation (LOOCV) test.
The results indicate that the SVC model outperforms the stepwise discriminant analysis (SDA) model, for which a prediction accuracy of 94% was reported.

Allospecific CD154+ T Cells Associate with Rejection Risk After Pediatric Liver Transplantation
AMERICAN JOURNAL OF TRANSPLANTATION, Issue 1 2009
C. Ashokkumar

Antigen-specific T cells, which express CD154 rapidly but remain untested in alloimmunity, were measured with flow cytometry in 16-h MLR of 58 identically immunosuppressed children with liver transplantation (LTx), to identify Rejectors (who had experienced biopsy-proven rejection within 60 days posttransplantation). Thirty-one children were sampled once, cross-sectionally. Twenty-seven children were sampled longitudinally: pre-LTx and at 1–60 and 61–200 days after LTx. Results were correlated with proliferative alloresponses measured by CFSE-dye dilution (n = 23) and with CTLA4, a negative T-cell costimulator that antagonizes CD154-mediated effects (n = 31). In cross-sectional observations, logistic regression and leave-one-out cross-validation identified donor-specific CD154+ T-cytotoxic (Tc)-memory cells as best associated with rejection outcomes. In the longitudinal cohort, (1) the association between CD154+ Tc-memory cells and rejection outcomes was replicated with a sensitivity/specificity of 92.3%/84.6% for observations at 1–60 days, and (2) elevated pre-LTx CD154+ Tc-memory cell responses were associated with significantly increased incidence (p = 0.02) and hazard (HR = 7.355) of rejection in survival/proportional hazard analysis. CD154 expression correlated with proliferative alloresponses (r = 0.835, p = 7.1e-07) and inversely with CTLA4 expression of allospecific CD154+ Tc-memory cells (r = −0.706, p = 3.0e-05). Allospecific CD154+ T-helper-memory cells, but not CD154+ Tc-memory cells, were inhibited by increasing Tacrolimus concentrations (p = 0.026).
Collectively, allospecific CD154+ T cells provide an estimate of rejection risk in children with LTx.

Plasma concentrations of VCAM-1 and PAI-1: A predictive biomarker for post-operative recurrence in colorectal cancer
CANCER SCIENCE, Issue 8 2010
Yasuhide Yamada

This prospective study used antibody suspension bead arrays to identify biomarkers capable of predicting post-operative recurrence with distal metastasis in patients with colorectal cancer. One hundred colorectal cancer patients who underwent surgery were enrolled in this study. The median follow-up period was 3.9 years. The pre-operative plasma concentrations of 24 angiogenesis-related molecules were analyzed with regard to TNM stage and the development of post-operative recurrence. The concentrations of more than half of the examined molecules (13/24) increased significantly with TNM stage (P < 0.05). Meanwhile, a multivariate logistic regression analysis revealed that the concentrations of vascular cell adhesion molecule 1 (VCAM-1) and plasminogen activator inhibitor-1 (PAI-1) were significantly higher in the post-operative recurrence group. The VCAM-1 and PAI-1 model discriminated post-operative recurrence with an area under the curve of 0.82, a sensitivity of 0.75, and a specificity of 0.73. Leave-one-out cross-validation was applied to the model to assess its prediction performance, and the cross-validated error rate was 12.5% (12/96). In conclusion, our results demonstrate that antibody suspension bead arrays are a powerful tool for screening biomarkers in the clinical setting, and that plasma levels of VCAM-1 and PAI-1 together may be a promising biomarker for predicting post-operative recurrence in patients with colorectal cancer. (Cancer Sci 2010)
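A leave-one-out error rate is the number of held-out cases that are misclassified divided by the number of cases evaluated, so 12/96 = 0.125 in the abstract above. A minimal sketch of that loop follows. To stay dependency-free, it swaps in a nearest-centroid classifier for the paper's multivariate logistic regression. The function names and the two-feature toy data standing in for VCAM-1 and PAI-1 concentrations are hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose training centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        c = centroid(rows)
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error_rate(X, y):
    """Hold out each case once, train on the rest, and count misclassifications."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors, errors / len(X)


if __name__ == "__main__":
    # Hypothetical (marker 1, marker 2) concentrations; label 1 = recurrence.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [1.1, 1.0]]
    y = [0, 0, 0, 1, 1, 1, 1]
    errors, rate = loo_error_rate(X, y)
    print(f"{errors}/{len(X)} misclassified, LOO error rate {rate:.3f}")
```

Because the classifier is refit for every held-out case, the held-out case never influences its own prediction. That is what makes the rate an estimate of performance on new patients rather than a measure of fit.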