Prediction Quality (prediction + quality)

Selected Abstracts


New computational algorithm for the prediction of protein folding types

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2001
Nikola Štambuk
Abstract We present a new computational algorithm for the prediction of protein secondary structure. The method enables the evaluation of α- and β-protein folding types from nucleotide sequences. The procedure is based on the reflected Gray code algorithm of nucleotide-amino acid relationships, and represents an extension of Swanson's procedure in Ref. 4. It is shown that a six-digit binary notation of each codon enables the prediction of α- and β-protein folds by means of an error-correcting linear block triple-check code. We tested the validity of the method on a test set of 140 proteins (70 α- and 70 β-folds). The test set consisted of standard α- and β-protein classes from the Jpred and SCOP databases, with nucleotide sequences available in the GenBank database. 100% accurate classification of α- and β-protein folds, based on 39 dipeptide addresses derived by the error-correcting coding procedure, was obtained by means of logistic regression analysis (p < 0.00000001). A classification tree and a machine learning sequential minimal optimization (SMO) classifier confirmed the results with 97.1% and 90% accurate classification, respectively. Protein fold prediction quality tested by means of leave-one-out cross-validation was a satisfactory 82.1% for the logistic regression and 81.4% for the SMO classifier. The presented procedure of computational analysis can be helpful in detecting the type of protein folding from newly sequenced exon regions. The method enables quick, simple, and accurate prediction of α- and β-protein folds from the nucleotide sequence on a personal computer. © 2001 John Wiley & Sons, Inc. Int J Quant Chem 84: 13-22, 2001 [source]
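For orientation, here is a minimal sketch of the codon-addressing step described above: each base maps to a 2-bit reflected Gray code and the three bit-pairs concatenate into the six-digit binary codon address. The concrete base-to-bit-pair assignment below is an assumption for illustration; the published method follows Swanson's scheme, which may order the bases differently.

```python
# Illustrative 2-bit reflected Gray codes for the four bases; this particular
# base ordering is an assumption, not necessarily Swanson's original scheme.
GRAY2 = {"U": (0, 0), "C": (0, 1), "G": (1, 1), "A": (1, 0)}

def codon_address(codon: str) -> tuple:
    """Concatenate the three 2-bit Gray codes into a six-digit binary address."""
    codon = codon.upper().replace("T", "U")
    return sum((GRAY2[base] for base in codon), ())

print(codon_address("AUG"))  # -> (1, 0, 0, 0, 1, 1) under the assumed mapping
```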


Application of SIC (simple interval calculation) for object status classification and outlier detection – comparison with regression approach

JOURNAL OF CHEMOMETRICS, Issue 9 2004
Oxana Ye. Rodionova
Abstract We introduce a novel approach termed simple interval calculation (SIC) for classification of object status in linear multivariate calibration (MVC) and other data analytical contexts. SIC is a method that directly constructs an interval estimator for the predicted response. SIC is based on the single assumption that all errors involved in MVC are limited. We present the theory of the SIC method and explain its realization by linear programming techniques. The primary SIC consequence is a radically new object classification that can be interpreted using a two-dimensional object status plot (OSP), 'SIC residual vs. SIC leverage'. These two new measures of prediction quality are introduced in the traditional chemometric MVC context. Simple straight demarcations divide the OSP into areas which quantitatively discriminate all objects involved in modeling and prediction into four different types: boundary samples, which are the significant objects (for generating the entire data structure) within the training subset; insiders, which are samples that comply with the model; outsiders, which are samples that have large prediction errors; and finally outliers, which are those samples that cannot be predicted at all with respect to a given model. We also present detailed comparisons of the new SIC approach with traditional chemometric methods applied for MVC, classification and outlier detection. These comparisons employ four real-world data sets, selected for their particular complexities, which serve as showcases of SIC application on intricate training and test set data structures. Copyright © 2005 John Wiley & Sons, Ltd. [source]
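A minimal sketch of the SIC construction under its single bounded-error assumption: every coefficient vector consistent with the error bound is feasible, and the prediction interval for a new object follows from minimizing and maximizing its predicted response over that feasible set by linear programming. The bound beta and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def sic_interval(X, y, x_new, beta):
    """Min/max of x_new . b over {b : |y - X b| <= beta} via linear programming."""
    A_ub = np.vstack([X, -X])                    # X b <= y + beta and -X b <= beta - y
    b_ub = np.concatenate([y + beta, beta - y])
    bounds = [(None, None)] * X.shape[1]         # coefficients are unconstrained
    lo = linprog(c=x_new, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    hi = linprog(c=-x_new, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return lo.fun, -hi.fun

rng = np.random.default_rng(0)                   # toy bounded-error data (assumed)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.5, -0.7]) + rng.uniform(-0.1, 0.1, size=20)
print(sic_interval(X, y, np.array([1.0, 1.0]), beta=0.1))
```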


Non-parametric statistical methods for multivariate calibration model selection and comparison

JOURNAL OF CHEMOMETRICS, Issue 12 2003
Edward V. Thomas
Abstract Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low-order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross-validation. In this paper a new approach for selecting the number of latent variables is proposed. In this new approach the prediction errors of individual observations (obtained from cross-validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non-parametric statistical methods are used to select the simplest model (smallest number of latent variables) that provides prediction quality that is indistinguishable from that provided by more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models that are applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near-infrared spectra. Published in 2004 by John Wiley & Sons, Ltd. [source]
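One plausible instantiation of the selection rule, sketched below: compare each candidate model's per-observation cross-validated absolute errors against those of the best-performing model with a paired non-parametric test, and keep the simplest model that is statistically indistinguishable from the best. The Wilcoxon signed-rank test and the 0.05 level are assumptions; the paper's exact test may differ.

```python
import numpy as np
from scipy.stats import wilcoxon

def select_n_latent(cv_abs_errors, alpha=0.05):
    """cv_abs_errors: {n_latent_variables: per-observation |CV error| array}."""
    best = min(cv_abs_errors, key=lambda k: np.median(cv_abs_errors[k]))
    for k in sorted(cv_abs_errors):              # try the simplest model first
        if k == best:
            return k
        _, p = wilcoxon(cv_abs_errors[k], cv_abs_errors[best])
        if p > alpha:                            # indistinguishable from the best
            return k
    return best

rng = np.random.default_rng(1)                   # toy per-observation errors (assumed)
errs = {k: np.abs(rng.normal(0, 1.0 / k, 40)) for k in (1, 2, 3, 4)}
print(select_n_latent(errs))
```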


A systematic evaluation of the benefits and hazards of variable selection in latent variable regression.

JOURNAL OF CHEMOMETRICS, Issue 7 2002
Part I. Search algorithm, simulations, theory
Abstract Variable selection is an extensively studied problem in chemometrics and in the area of quantitative structure-activity relationships (QSARs). Many search algorithms have been compared so far. Less well studied is the influence of different objective functions on the prediction quality of the selected models. This paper investigates the performance of different cross-validation techniques as objective function for variable selection in latent variable regression. The results are compared in terms of predictive ability, model size (number of variables) and model complexity (number of latent variables). It will be shown that leave-multiple-out cross-validation with a large percentage of data left out performs best. Since leave-multiple-out cross-validation is computationally expensive, a very efficient tabu search algorithm is introduced to lower the computational burden. The tabu search algorithm needs no user-defined operational parameters and optimizes the variable subset and the number of latent variables simultaneously. Copyright © 2002 John Wiley & Sons, Ltd. [source]
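A minimal sketch of leave-multiple-out cross-validation as the objective function, here with half of the objects left out per random split and a PLS model; the split count, the 50% fraction, and the scikit-learn stack are illustrative assumptions, not the authors' implementation (which pairs the objective with their tabu search).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

def lmo_cv_rmsep(X, y, n_latent, leave_out=0.5, n_splits=50, seed=0):
    """Mean RMSEP over random splits that each leave out `leave_out` of the data."""
    cv = ShuffleSplit(n_splits=n_splits, test_size=leave_out, random_state=seed)
    scores = cross_val_score(PLSRegression(n_components=n_latent), X, y,
                             cv=cv, scoring="neg_root_mean_squared_error")
    return -scores.mean()

rng = np.random.default_rng(0)                   # toy data (assumed)
X = rng.normal(size=(60, 12))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 60)
print(lmo_cv_rmsep(X, y, n_latent=3))
```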


A systematic evaluation of the benefits and hazards of variable selection in latent variable regression.

JOURNAL OF CHEMOMETRICS, Issue 7 2002
Part II.
Abstract Leave-multiple-out cross-validation (LMO-CV) is compared to leave-one-out cross-validation (LOO-CV) as objective function in variable selection for four real data sets. Two data sets stem from NIR spectroscopy and two from quantitative structure-activity relationships. In all four cases, LMO-CV outperforms LOO-CV with respect to prediction quality, model complexity (number of latent variables) and model size (number of variables). The number of objects left out in LMO-CV has an important effect on the final results. It controls both the number of latent variables in the final model and the prediction quality. The results of variable selection need to be validated carefully with a validation step that is independent of the variable selection. This step needs to be done because the internal figures of merit (i.e. anything that is derived from the objective function value) do not correlate well with the external predictivity of the selected models. This is most obvious for LOO-CV. LOO-CV without further constraints always shows the best internal figures of merit and the worst prediction quality. Copyright © 2002 John Wiley & Sons, Ltd. [source]
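A short sketch of the independent validation step the authors call for: the validation set is held out before any variable selection runs, so the final figure of merit is untouched by the selection loop. The split ratio and toy data are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)                   # toy stand-in data (assumed)
X, y = rng.normal(size=(100, 30)), rng.normal(size=100)

# Hold the validation set out *before* any variable selection is run.
X_sel, X_val, y_sel, y_val = train_test_split(X, y, test_size=0.25, random_state=1)
# ... variable selection and LMO-CV may touch only (X_sel, y_sel) ...
# The selected model is then scored once on (X_val, y_val) to obtain an
# external figure of merit untouched by the selection loop.
```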


Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2008
Xuan Xiao
Abstract Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information, so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article we introduce the grey modeling approach, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introduction of the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others. © 2008 Wiley Periodicals, Inc. J Comput Chem 2008. [source]
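A minimal sketch of the standard GM(1,1) grey model fit that underlies this kind of coefficient extraction: the series is accumulated, background values are formed, and the development coefficient a and grey input b are obtained by least squares. Mapping residues to a (shifted) Kyte-Doolittle hydrophobicity series is an assumption for illustration; the paper derives four grey coefficients per sequence as PseAA components.

```python
import numpy as np

def gm11_coefficients(x0):
    """Development coefficient a and grey input b of GM(1,1), by least squares."""
    x1 = np.cumsum(x0)                           # accumulated generating operation
    z = 0.5 * (x1[1:] + x1[:-1])                 # background values
    B = np.column_stack([-z, np.ones_like(z)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    return a, b

# Toy sequence-to-series mapping: Kyte-Doolittle hydrophobicity, shifted to be
# positive since GM(1,1) expects a non-negative series (mapping is an assumption).
kd = {"A": 1.8, "M": 1.9, "K": -3.9, "L": 3.8, "S": -0.8}
series = np.array([kd[r] + 5.0 for r in "MALKS"])
print(gm11_coefficients(series))
```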


Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 4 2006
Xuan Xiao
Abstract The structural class is an important feature widely used to characterize the overall folding type of a protein. How to improve the prediction quality for protein structural classification by effectively incorporating the sequence-order effects is an important and challenging problem. Based on the concept of the pseudo amino acid composition [Chou, K. C. Proteins Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60], a novel approach for measuring the complexity of a protein sequence was introduced. The advantage of incorporating the complexity measure factor into the pseudo amino acid composition as one of its components is that it can catch the essence of the overall sequence pattern of a protein and hence more effectively reflect its sequence-order effects. It was demonstrated through the jackknife cross-validation test that the overall success rate of the new approach was significantly higher than those of the others. It has not escaped our notice that the introduction of the complexity measure factor can also be used to improve the prediction quality for, among many other protein attributes, subcellular localization, enzyme family class, membrane protein type, and G-protein-coupled receptor type. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 478-482, 2006 [source]
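For illustration, below is one widely used sequence complexity measure, the Lempel-Ziv (1976) phrase count, of the kind that can be appended to the amino acid composition as an extra PseAA component; treating it as the authors' exact complexity factor is an assumption.

```python
def lempel_ziv_complexity(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive parsing of s."""
    i, c = 0, 0
    while i < len(s):
        j = i + 1
        # grow the phrase while it still occurs in the text preceding its end
        while j <= len(s) and s[i:j] in s[:j - 1]:
            j += 1
        c += 1                                   # phrase s[i:j] is new: count it
        i = j
    return c

print(lempel_ziv_complexity("MALKSMALKS"))       # -> 6 (M|A|L|K|S|MALKS)
```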


Machine learning approaches for prediction of linear B-cell epitopes on proteins

JOURNAL OF MOLECULAR RECOGNITION, Issue 3 2006
Johannes Söllner
Abstract Identification and characterization of antigenic determinants on proteins has received considerable attention utilizing both experimental and computational methods. Computational routines have mostly utilized structural and physicochemical parameters for predicting the antigenic propensity of protein sites. However, the performance of computational routines has been low when compared to experimental alternatives. Here we describe the construction of machine learning based classifiers to enhance the prediction quality for identifying linear B-cell epitopes on proteins. Our approach combines several parameters previously associated with antigenicity, and includes novel parameters based on frequencies of amino acids and amino acid neighborhood propensities. We utilized machine learning algorithms for deriving antigenicity classification functions assigning antigenic propensities to each amino acid of a given protein sequence. We compared the prediction quality of the novel classifiers with established routines for epitope scoring, and tested prediction accuracy on experimental data available for HIV proteins. The major finding is that machine learning classifiers clearly outperform the reference classification systems on the HIV epitope validation set. Copyright © 2006 John Wiley & Sons, Ltd. [source]
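A minimal sketch of the per-residue classification idea: each amino acid position is represented by amino acid frequencies in a sliding window and a classifier emits an antigenic propensity per residue. The window size, the frequency-only feature set, the random-forest learner, and the placeholder labels are assumptions standing in for the paper's richer parameter set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AA = "ACDEFGHIKLMNPQRSTVWY"

def window_features(seq, center, half=7):
    """Amino acid frequencies in a window around one residue."""
    win = seq[max(0, center - half): center + half + 1]
    return [win.count(a) / len(win) for a in AA]

rng = np.random.default_rng(0)                   # toy sequences and labels (assumed)
seqs = ["".join(rng.choice(list(AA), 60)) for _ in range(5)]
X = [window_features(s, i) for s in seqs for i in range(len(s))]
y = rng.integers(0, 2, size=len(X))              # placeholder epitope labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = clf.predict_proba([window_features(seqs[0], i) for i in range(60)])[:, 1]
print(scores[:5])                                # per-residue antigenic propensity
```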


The Noise Prediction Model SATIN

PROCEEDINGS IN APPLIED MATHEMATICS & MECHANICS, Issue 1 2003
J. Ostertag Dipl.-Ing.
This paper presents the noise prediction model SATIN (Statistical Approach to Turbulence Induced Noise), which is based on Lighthill's acoustic analogy. It allows the prediction of both far-field noise radiation and near-field wall-pressure fluctuations. Far-field noise radiation may result from the scattering of wall-pressure fluctuations at geometrical discontinuities and is therefore important for many practical problems. Within this paper, we focus on the calculation of far-field noise radiation. The required input values of SATIN are local properties of turbulence, namely the turbulent kinetic energy and the integral length scale, which can be obtained from steady solutions of the Reynolds-averaged Navier-Stokes equations with a two-equation turbulence model. It is assumed that the turbulence is axisymmetric and homogeneous, which is taken into account by introducing two anisotropy parameters. SATIN is validated for trailing-edge noise originating from a thin flat plate, using phased-array measurements. As expected, the anisotropic formulation of SATIN improves the prediction quality considerably compared to the assumption of isotropic turbulence. [source]
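A small sketch of how the two SATIN inputs could be assembled from a two-equation (k-epsilon) RANS solution; the integral length scale is estimated by the standard relation L ~ k^(3/2)/epsilon up to a model constant, and the constant c_l = 1.0 is an assumption, not a value from the paper.

```python
import numpy as np

def satin_inputs(k, eps, c_l=1.0):
    """Turbulent kinetic energy and an estimated integral length scale."""
    k, eps = np.asarray(k, float), np.asarray(eps, float)
    return k, c_l * k**1.5 / eps                 # L ~ k^(3/2)/eps, c_l assumed

print(satin_inputs([0.5, 1.2], [10.0, 25.0]))    # toy k and epsilon fields
```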


Assessing predictions of protein-protein interaction: The CAPRI experiment

PROTEIN SCIENCE, Issue 2 2005
Joël Janin
Abstract The Critical Assessment of PRedicted Interactions (CAPRI) experiment was designed in 2000 to test protein docking algorithms in blind predictions of the structure of protein-protein complexes. In four years, 17 complexes, offered by crystallographers as targets prior to publication, have been subjected to structure prediction by docking their two components. Models of these complexes were submitted by predictor groups and assessed by comparing their geometry to the X-ray structure and by evaluating the quality of the prediction of the regions of interaction and of the pairwise residue contacts. Prediction was successful on 12 of the 17 targets, most of the failures being due to large conformation changes that the algorithms could not cope with. Progress in prediction quality observed over the four years indicates that the experiment is a powerful incentive to develop new procedures that allow for flexibility during docking and incorporate nonstructural information. We therefore call upon structural biologists who study protein-protein complexes to provide targets for further rounds of CAPRI predictions. [source]
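For concreteness, here is a sketch of one common CAPRI-style assessment measure, the fraction of native contacts recovered by a model; the 5 Å cutoff follows common CAPRI practice, and reducing each residue to one representative point is a simplifying assumption.

```python
import numpy as np

def contacts(rec_xyz, lig_xyz, cutoff=5.0):
    """Index pairs of receptor/ligand points closer than the cutoff."""
    d = np.linalg.norm(rec_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    return set(zip(*np.where(d < cutoff)))

def f_nat(native_rec, native_lig, model_rec, model_lig):
    """Fraction of native interface contacts reproduced by the model."""
    native = contacts(native_rec, native_lig)
    model = contacts(model_rec, model_lig)
    return len(native & model) / len(native) if native else 0.0

rng = np.random.default_rng(0)                   # toy coordinates (assumed)
rec, lig = rng.uniform(0, 20, (30, 3)), rng.uniform(0, 20, (25, 3))
print(f_nat(rec, lig, rec + 0.5, lig))           # slightly perturbed "model"
```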