Test Set (test + set)

Kinds of Test Set

  • external test set
  • independent test set
  • separate test set


  • Selected Abstracts


    Peripheral blood MDS score: A new flow cytometric tool for the diagnosis of myelodysplastic syndromes

    CYTOMETRY, Issue 1 2005
    Sindhu Cherian
    Abstract Background: Myelodysplastic syndromes (MDS) are a heterogeneous group of hematopoietic disorders diagnosed using morphologic and clinical findings supported by cytogenetics. Because abnormalities may be subtle, diagnosis using these approaches can be challenging. Flow cytometric (FCM) approaches have been described; however, the value of bone marrow immunophenotyping in MDS remains unclear due to the variability in detected abnormalities. We sought to refine the FCM approach by using peripheral blood (PB) to create a clinically useful tool for the diagnosis of MDS. Methods: PB from 15 patients with MDS was analyzed by multiparametric flow cytometry using an extensive panel of monoclonal antibodies. Patterns of neutrophil antigen expression were compared with those of normal controls (n = 16) to establish light scatter and/or immunophenotypic abnormalities that correlated with MDS. A scoring algorithm was developed and validated prospectively on a blinded patient set. Results: PB neutrophils from patients with MDS had lower side scatter and higher expression of CD66 and CD11a than did controls. Some MDS PB neutrophils demonstrated abnormal CD116 and CD10 expression. Because none of these abnormalities proved consistently diagnostic, we sought to increase the power of the assay by devising a scoring system that combines multiple abnormalities and accounts for phenotypic variation. The PB MDS score differentiated patients with MDS from controls (P < 0.0001) in the test set. In a prospective validation, the PB MDS score successfully identified patients with MDS (sensitivity 73%, specificity 90%). Conclusions: FCM analysis of side scatter and only four additional immunophenotypic parameters of PB neutrophils using the PB MDS score proved more sensitive than standard laboratory approaches and may provide an additional, more reliable diagnostic tool in the identification of MDS. © 2005 Wiley-Liss, Inc.
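
The sensitivity and specificity quoted above are standard confusion-matrix ratios. A minimal Python sketch of how they are computed from binary predictions follows; the `abnormality_score` helper is a hypothetical stand-in for a cumulative score (one point per flagged abnormality) and is not the published PB MDS scoring algorithm. All labels and flags are made up.

```python
def abnormality_score(flags):
    """Hypothetical cumulative score: one point per flagged abnormality.

    This is NOT the published PB MDS score; it only illustrates how several
    individually weak abnormalities can be combined into a single number.
    """
    return sum(1 for flag in flags if flag)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = MDS, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (not data from the study): call a sample positive
# when at least 2 of its abnormality flags are set.
flag_sets = [
    [True, True, False],
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [True, False, False],
]
truth = [1, 1, 1, 0, 0]
predicted = [1 if abnormality_score(f) >= 2 else 0 for f in flag_sets]
print(sensitivity_specificity(truth, predicted))
```

Raising or lowering the threshold trades sensitivity against specificity, which is why a score is usually tuned on one set and then checked on a separate one.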


    An automated, sheathless capillary electrophoresis-mass spectrometry platform for discovery of biomarkers in human serum

    ELECTROPHORESIS, Issue 7-8 2005
    Alexander P. Sassi
    Abstract A capillary electrophoresis-mass spectrometry (CE-MS) method has been developed to perform routine, automated analysis of low-molecular-weight peptides in human serum. The method incorporates transient isotachophoresis for in-line preconcentration and a sheathless electrospray interface. To evaluate the performance of the method and demonstrate the utility of the approach, an experiment was designed in which peptides were added to sera from individuals at each of two different concentrations, artificially creating two groups of samples. The CE-MS data from the serum samples were divided into separate training and test sets. A pattern-recognition/feature-selection algorithm based on support vector machines was used to select the mass-to-charge (m/z) values from the training set data that distinguished the two groups of samples from each other. The added peptides were correctly identified as the distinguishing features, and pattern recognition based on these peptides was used to assign each sample in the independent test set to its respective group. A twofold difference in peptide concentration could be detected with statistical significance (p-value < 0.0001). The accuracy of the assignment was 95%, demonstrating the utility of this technique for the discovery of patterns of biomarkers in serum.
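
Dividing data into separate training and test sets, as described above, can be sketched as follows. This is a generic stratified random split in Python, chosen here as an assumption for illustration; the abstract does not say how the authors partitioned their samples, and the sample IDs are hypothetical.

```python
import random


def stratified_split(samples, labels, test_fraction=0.3, seed=0):
    """Split (sample, label) pairs into disjoint training and test sets.

    Each class is shuffled and split separately, so both sets keep roughly
    the same class proportions. A fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    train, test = [], []
    for label, items in by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend((s, label) for s in items[:n_test])
        train.extend((s, label) for s in items[n_test:])
    return train, test


# Hypothetical serum sample IDs in two spiked-concentration groups.
ids = [f"serum_{i:02d}" for i in range(20)]
groups = ["low"] * 10 + ["high"] * 10
train_set, test_set = stratified_split(ids, groups)
print(len(train_set), len(test_set))
```

Feature selection and model fitting would then use only `train_set`; `test_set` is used once, for the final assessment, so that it stays independent.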


    Prediction of biodegradation from the atom-type electrotopological state indices

    ENVIRONMENTAL TOXICOLOGY & CHEMISTRY, Issue 10 2001
    Jarmo Huuskonen
    Abstract A group contribution method based on atom-type electrotopological state indices for predicting the biodegradation of a diverse set of 241 organic chemicals is presented. Multiple linear regression and artificial neural networks were used to build the models using a training set of 172 compounds, for which the approximate time for ultimate biodegradation was estimated from the results of a survey of an expert panel. The derived models were validated by a leave-25%-out method and against two test sets of 12 and 57 chemicals not included in the training set. The squared correlation coefficient (r²) for a linear model with 15 structural parameters was 0.76 for the training set and 0.68 for the test set of 12 molecules. The model correctly predicted the biodegradation of 48 chemicals in the test set of 57 molecules, for which biodegradability was presented as rapid or slow. Artificial neural networks gave better predictions for both test sets when the same set of parameters was used as inputs in neural network simulations. Rapidly biodegradable chemicals were predicted more accurately than slowly biodegradable chemicals by both the regression and the neural network models.
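
The squared correlation coefficient r² reported for the training and test sets is the square of Pearson's correlation between observed and predicted values. A minimal Python sketch, assuming plain lists of numbers (the values below are illustrative only):

```python
def squared_correlation(observed, predicted):
    """Square of Pearson's correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_op * s_op / (s_oo * s_pp)


print(squared_correlation([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```

For the second test set, where biodegradability is only rapid or slow, the natural figure is instead the fraction classified correctly (48 of 57, about 84%).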


    Training evaluation of a course in diabetic retinopathy screening

    EUROPEAN DIABETES NURSING, Issue 2 2005
    R Pauli PhD Senior Lecturer
    Abstract The success and effectiveness of diabetic screening programmes depend on the availability of appropriately trained image graders. This study was designed to evaluate graders enrolled on a locally devised, formal training course by means of a performance-based measure. The course consisted of four days of classroom-based tuition followed by three months of practice-based learning in the workplace. The aims were to establish, first, whether trainees showed an improvement in their ability to grade images and, second, whether test sets of images are useful in measuring training outcome. Thirteen trainees were required to grade a test set of 24 single images both before and after training. A significant improvement in sensitivity (from 35% before training to 45% after training) was observed as a result of training, but at the cost of a decline in specificity. Trainees' confidence ratings, measured on a five-point scale, increased from an average of 2.4 to 4.1 (p < 0.01). We concluded that the course needs to focus more on trainees' ability to discriminate between normal and abnormal images, as well as on improving grading accuracy in line with increased grading confidence. Test-based course evaluation can be a valuable instrument in establishing a quality standard for stated learning outcomes. In this research, it clearly indicated weaknesses of the training programme in its current form. Copyright © 2005 FEND.


    Stepwise geographical traceability of virgin olive oils by chemical profiles using artificial neural network models

    EUROPEAN JOURNAL OF LIPID SCIENCE AND TECHNOLOGY, Issue 10 2009
    Diego L. García-González
    Abstract The geographical traceability of virgin olive oils requires analytical methods that allow the identification of the origin of the oil and the authentication of the information stated on the labels. In this work, the geographical identification of virgin olive oils has been addressed by complete chemical characterisation of samples (64 compounds analysed by GC and HPLC) and the design of artificial neural network (ANN) models for each of the levels of a proposed classification scheme. A large number of samples (687) from Spain, Italy and Portugal served as training and test sets for the ANN models. The highest classification level, focused on the grouping of samples by country, was achieved through analysis of fatty acids, with 99.9% of samples classified. Other levels (region, province, Protected Designation of Origin or PDO) were focused on Spanish oils and required additional series of compounds (sterols, alcohols, hydrocarbons) as well as the fatty acids to obtain classification rates higher than 90%. The classification of oils into different PDOs, the last and most difficult level of classification, showed the highest root mean square errors. The classification percentages, however, were still higher than 90% in the test set, which demonstrates the applicability of the traceability methodology for the chemical verification of PDO claims.


    A rapid screening test to distinguish between Candida albicans and Candida dubliniensis using NMR spectroscopy

    FEMS MICROBIOLOGY LETTERS, Issue 2 2005
    Uwe Himmelreich
    Abstract Nuclear magnetic resonance (NMR) spectroscopy combined with a statistical classification strategy (SCS) successfully distinguished between Candida albicans and Candida dubliniensis. Ninety-six percent of the isolates from an independent test set were identified correctly. This shows that this rapid approach is a valuable method for the identification and chemotaxonomic characterisation of closely related taxa. The most discriminatory spectral regions were correlated with metabolite profiles, indicating biochemical differences between the two species.


    Upper digestive bleeding in cirrhosis. Post-therapeutic outcome, prognostic indicators

    HEPATOLOGY, Issue 3 2003
    Several treatments have been proven to be effective for variceal bleeding in patients with cirrhosis. The aim of this multicenter, prospective cohort study was to assess how these treatments are used in clinical practice and what the post-therapeutic prognosis and prognostic indicators of upper digestive bleeding in patients with cirrhosis are. A training set of 291 and a test set of 174 bleeding cirrhotic patients were included. Treatment was according to the preferences of each center, and the follow-up period was 6 weeks. Predictive rules for 5-day failure (uncontrolled bleeding, rebleeding, or death) and 6-week mortality were developed with logistic regression models in the training set and validated in the test set. Initial treatment, which consisted of vasoactive drugs in 27% of patients, endoscopic therapy in 10%, combined endoscopic and vasoactive therapy in 45%, balloon tamponade alone in 1%, and no treatment in 17%, controlled bleeding in 90% of patients. The 5-day failure rate was 13%, 6-week rebleeding was 17%, and mortality was 20%. Corresponding figures for variceal versus nonvariceal bleeding were 15% versus 7% (P = .034), 19% versus 10% (P = .019), and 20% versus 15% (P = .22). Active bleeding on endoscopy, hematocrit levels, aminotransferase levels, Child-Pugh class, and portal vein thrombosis were significant predictors of 5-day failure; alcohol-induced etiology, bilirubin, albumin, encephalopathy, and hepatocarcinoma were predictors of 6-week mortality. Prognostic reassessment including blood transfusions improved the predictive accuracy. All the developed prognostic models were superior to the Child-Pugh score. In conclusion, the prognosis of digestive bleeding in cirrhosis has improved considerably over the past 2 decades. Initial treatment stops bleeding in 90% of patients. Accurate predictive rules are provided for early recognition of high-risk patients.


    Frequent inactivation of SPARC by promoter hypermethylation in colon cancers

    INTERNATIONAL JOURNAL OF CANCER, Issue 3 2007
    Eungi Yang
    Abstract Epigenetic modification of gene expression plays an important role in the development of human cancers. The inactivation of SPARC through CpG island methylation was studied in colon cancers using oligonucleotide microarray analysis and methylation-specific PCR (MSP). Gene expression in 7 colon cancer cell lines was evaluated by oligonucleotide microarray analysis before and after treatment with the demethylating agent 5-aza-2′-deoxycytidine (5Aza-dC). Expression of SPARC was further examined in colon cancer cell lines and primary colorectal cancers, and the methylation status of the SPARC promoter was determined by MSP. SPARC expression was undetectable in 5 of 7 (71%) colorectal cancer cell lines. Induction of SPARC was demonstrated after treatment with 5Aza-dC in 5 of the 7 cell lines. We examined the methylation status of the CpG island of SPARC in 7 colon cancer cell lines and in a test set of 20 colon cancer tissues. MSP demonstrated hypermethylation of the CpG island of SPARC in 6 of 7 cell lines and in all 20 primary colon cancers, compared with only 3 of 20 normal colon mucosa samples. Immunohistochemical analysis showed that SPARC expression was downregulated or absent in 17 of 20 colon cancers. A survival analysis of a validation set of 292 colorectal carcinoma patients revealed a poorer prognosis for patients lacking SPARC expression than for patients with normal SPARC expression (5-year survival rate 56.79% vs. 75.83%, p = 0.0014). The results indicate that epigenetic silencing of SPARC is frequent in colon cancers and that inactivation of SPARC is related to rapid progression of colon cancers. © 2007 Wiley-Liss, Inc.


    Urinary biomarker profiling in transitional cell carcinoma

    INTERNATIONAL JOURNAL OF CANCER, Issue 11 2006
    Nicholas P. Munro
    Abstract Urinary biomarkers or profiles that allow noninvasive detection of recurrent transitional cell carcinoma (TCC) of the bladder are urgently needed. We obtained duplicate proteomic (SELDI) profiles from 227 subjects (118 TCC, 77 healthy controls and 32 controls with benign urological conditions) and used linear mixed effects models to identify peaks that are differentially expressed between TCC and controls and within TCC subgroups. A Random Forest classifier was trained on 130 profiles to develop an algorithm to predict the presence of TCC in a randomly selected initial test set (n = 54) and in an independent validation set (n = 43) analyzed several months later. Twenty-two peaks were differentially expressed between all TCC and controls (p < 10⁻⁷). However, potential confounding effects of age, sex and analytical run were identified. In an age-matched subset, 23 peaks were differentially expressed between TCC and combined benign and healthy controls at the 0.005 significance level. Using the Random Forest classifier, TCC was predicted with 71.7% sensitivity and 62.5% specificity in the initial set and with 78.3% sensitivity and 65.0% specificity in the validation set after 6 months, compared with controls. Several important peaks were also identified in the linear mixed effects model. We conclude that SELDI profiling of urine samples can identify patients with TCC with sensitivities and specificities comparable to those of current tumor marker tests. This is the first time that reproducibility has been demonstrated on an independent test set analyzed several months later. Identification of the relevant peaks may facilitate multiplex marker assay development for detection of recurrent disease. © 2006 Wiley-Liss, Inc.


    Optimization of strong and weak coordinates

    INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 12 2006
    Marcel Swart
    Abstract We present a new scheme for the geometry optimization of equilibrium and transition state structures that can be used for both strong and weak coordinates. We use a screening function that depends on atom-pair distances to differentiate strong coordinates from weak coordinates. This differentiation significantly accelerates the optimization of these coordinates, and thus of the overall geometry. An adapted version of the delocalized coordinates setup is used to automatically generate a set of internal coordinates that is shown to perform well for the geometry optimization of systems with weak and strong coordinates. For the Baker test set of 30 molecules, we need only 173 geometry cycles with PW91/TZ2P calculations, which compares well with the best previous attempts reported in the literature. For the localization of transition state structures, we generate the initial Hessian matrix using appropriate force constants from a database. In this way, one avoids the explicit computation of the Hessian matrix. © 2006 Wiley Periodicals, Inc. Int J Quantum Chem, 2006


    New computational algorithm for the prediction of protein folding types

    INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2001
    Nikola Štambuk
    Abstract We present a new computational algorithm for the prediction of protein secondary structure. The method enables the evaluation of α- and β-protein folding types from the nucleotide sequences. The procedure is based on the reflected Gray code algorithm of nucleotide–amino acid relationships and represents an extension of Swanson's procedure in Ref. 4. It is shown that a six-digit binary notation of each codon enables the prediction of α- and β-protein folds by means of the error-correcting linear block triple-check code. We tested the validity of the method on a test set of 140 proteins (70 α- and 70 β-folds). The test set consisted of standard α- and β-protein classes from the Jpred and SCOP databases, with nucleotide sequences available in the GenBank database. Classification of α- and β-protein folds with 100% accuracy, based on 39 dipeptide addresses derived by the error-correcting coding procedure, was obtained by logistic regression analysis (p < 0.00000001). A classification tree and a sequential minimal optimization (SMO) machine learning classifier confirmed the results with 97.1% and 90% accurate classification, respectively. Protein fold prediction quality tested by leave-one-out cross-validation was a satisfactory 82.1% for logistic regression and 81.4% for the SMO classifier. The presented procedure of computational analysis can be helpful in detecting the type of protein folding from newly sequenced exon regions. The method enables quick, simple, and accurate prediction of α- and β-protein folds from the nucleotide sequence on a personal computer. © 2001 John Wiley & Sons, Inc. Int J Quant Chem 84: 13–22, 2001
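
The reflected binary Gray code mentioned above orders integers so that consecutive codes differ in exactly one bit. The Python sketch below shows how a codon could be written as a six-digit binary string by giving each base a 2-bit Gray code. The base ordering `CUAG` is a hypothetical choice for illustration and may not match the assignment used in the paper; the triple-check error-correcting code is not reproduced.

```python
def gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


# Hypothetical base ordering; consecutive bases get codes differing by one bit.
BASE_ORDER = "CUAG"
BASE_BITS = {base: format(gray(i), "02b") for i, base in enumerate(BASE_ORDER)}


def codon_to_bits(codon: str) -> str:
    """Encode a three-letter codon (DNA or RNA alphabet) as six binary digits."""
    rna = codon.upper().replace("T", "U")
    return "".join(BASE_BITS[base] for base in rna)


print(BASE_BITS)             # {'C': '00', 'U': '01', 'A': '11', 'G': '10'}
print(codon_to_bits("ATG"))  # 110110
```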


    Determination of the molecular weight of proteins in solution from a single small-angle X-ray scattering measurement on a relative scale

    JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 1 2010
    H. Fischer
    This paper describes a new and simple method to determine the molecular weight of proteins in dilute solution, with an error smaller than ~10%, by using the experimental data of a single small-angle X-ray scattering (SAXS) curve measured on a relative scale. This procedure does not require the measurement of SAXS intensity on an absolute scale and does not involve a comparison with another SAXS curve determined from a known standard protein. The proposed procedure can be applied to monodisperse systems of proteins in dilute solution, in either monomeric or multimeric state, and it has been successfully tested on SAXS data experimentally determined for proteins with known molecular weights. It is shown here that the molecular weights determined by this procedure deviate from the known values by less than 10% in each case and that the average error for the test set of 21 proteins was 5.3%. Importantly, this method allows for an unambiguous determination of the multimeric state of proteins with known molecular weights.


    Principles of Proper Validation: use and abuse of re-sampling for validation

    JOURNAL OF CHEMOMETRICS, Issue 3-4 2010
    Kim H. Esbensen
    Abstract Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to the formulation of a set of generic Principles of Proper Validation (PPV), which is based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; validation must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, which are all outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is only one valid paradigm: test set validation. (iii) Contrary to much contemporary chemometric practice (and validation myths), cross-validation is shown to be unjustified in the form of monolithic application of a one-for-all procedure (segmented cross-validation) on all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, as it is manifestly based on one data set only (the training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which the assessment of performance is desired: prediction, classification, time series forecasting, and modeling validation. The key element of PPV is the Theory of Sampling (TOS), which allows insight into all variance-generating factors, especially the so-called incorrect sampling errors, which, if not properly eliminated, are responsible for a fatal inconstant sampling bias for which no statistical correction is possible. In the light of TOS, it is shown how a second data set (test set, validation set) is critically necessary for the inclusion of the sampling errors incurred in all 'future' situations in which the validated model must perform. Logically, therefore, all single-data-set re-sampling approaches to validation, especially cross-validation and leverage-corrected validation, should be terminated, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS regression, an emphatic call is made for a stringent commitment to test set validation based on graphical inspection of pertinent t–u plots for optimal understanding of the X–Y interrelationships and for validation guidance. QSAR/QSAP forms a partial exemption from the present test set imperative with no generalization potential. Copyright © 2010 John Wiley & Sons, Ltd.
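
To make the distinction concrete: segmented cross-validation carves every validation segment out of the single training data set, whereas test set validation evaluates on a second, independently sampled set. The Python sketch below generates segmented (contiguous k-fold) index splits; it is a generic illustration, not code from the paper.

```python
def segmented_cv_folds(n_samples, n_segments):
    """Yield (train_indices, validation_indices) for segmented cross-validation.

    The n_samples rows of ONE data set are cut into n_segments contiguous
    segments; each segment is held out once while the rest are used for
    calibration. Every index therefore comes from the same data set.
    """
    base, extra = divmod(n_samples, n_segments)
    start = 0
    for k in range(n_segments):
        size = base + (1 if k < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


for train_idx, val_idx in segmented_cv_folds(10, 3):
    print(val_idx)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

Test set validation would instead fit once on all calibration rows and predict a separately sampled test set. No re-slicing of the indices above can introduce the sampling variance of that second set, which is the point the abstract makes.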


    Predicting % of crystallinity in FCC catalysts by FT-MIR and PLS

    JOURNAL OF CHEMOMETRICS, Issue 11-12 2008
    Angel Dago
    Abstract This paper describes an analytical procedure for predicting the percent crystallinity of fluidized catalytic cracking (FCC) catalysts using Fourier transform mid-infrared spectroscopy (FT-MIR) and the partial least-squares (PLS) multivariate calibration technique. In order to build a robust regression model, multiplicative scatter correction (MSC) and smoothed second-derivative pre-processing methods were tested. The root mean squared error of prediction (RMSEP) of an independent test set was used to measure the performance of the models. The comparison shows that reasonable values of RMSEP and RMSECV were obtained for the PLS-MSC model (RMSEP = 0.8% and RMSECV = 1.3%). The accuracy of the results obtained by the PLS-MSC regression model is in accordance with the uncertainty of the XRPD reference method. The developed method can be implemented in a refinery laboratory environment with ease. Copyright © 2008 John Wiley & Sons, Ltd.
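
The MSC pre-processing and the RMSEP/RMSECV figures of merit mentioned above can be sketched in a few lines of Python. This follows the textbook form of MSC (regress each spectrum on a reference spectrum, by default the mean spectrum, then remove the fitted offset and slope); the authors' exact implementation and settings are not given in the abstract, and the spectra below are toy values. RMSEP and RMSECV use the same formula, applied respectively to test-set predictions and to cross-validated predictions.

```python
import math


def msc(spectra, reference=None):
    """Multiplicative scatter correction (textbook form).

    Each spectrum x is regressed on a reference spectrum r as x ≈ a + b*r,
    then corrected to (x - a) / b. The reference defaults to the mean spectrum.
    """
    n_points = len(spectra[0])
    if reference is None:
        reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    mean_r = sum(reference) / n_points
    s_rr = sum((r - mean_r) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        mean_x = sum(spectrum) / n_points
        b = sum((r - mean_r) * (x - mean_x) for r, x in zip(reference, spectrum)) / s_rr
        a = mean_x - b * mean_r
        corrected.append([(x - a) / b for x in spectrum])
    return corrected


def rmse(reference_values, predicted_values):
    """Root mean squared error; called RMSEP on a test set, RMSECV on CV predictions."""
    n = len(reference_values)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference_values, predicted_values)) / n)


spectra = [[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]]  # toy spectra; second = 1 + 2 * first
print(msc(spectra, reference=[1.0, 2.0, 3.0, 4.0]))     # [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
print(rmse([10.0, 12.0], [10.5, 11.5]))                 # 0.5
```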


    Theoretical study of the microhydration of mononuclear and dinuclear uranium(VI) species derived from solvolysis of uranyl nitrate in water

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2010
    Milan On
    Abstract The structures and energetics of mononuclear and dinuclear uranium species formed upon speciation of uranyl(VI) nitrate, UO2(NO3)2, in water are investigated by quantum chemistry using density functional theory and wavefunction-based methods (MP2, CCSD, CCSD(T)). We provide a discussion of the basic coordination patterns of the various mono- and dinuclear uranyl compounds [(UO2)m(X,Y)2m−1(H2O)n]+ (m = 1, 2; n = 0–4) found in a recent mass spectrometric study (Tsierkezos et al., Inorg Chem 2009, 48, 6287). The energetics of the complexation of the uranyl dication to the counterions OH− and NO3−, as well as the degradation of the dinuclear species, were studied by reference to a test set of 16 representative molecules with the MP2 method and the B3LYP, M06, M06-HF, and M06-2X DFT functionals. All DFT functionals provide structures and energetics close to MP2 results, with the M06 family being slightly superior to the standard B3LYP functional. © 2010 Wiley Periodicals, Inc. J Comput Chem 2010


    Trends of the bonding effect on the performance of DFT methods in electric properties calculations: A pattern recognition and metric space approach on some XY2 (X = O, S and Y = H, O, F, S, Cl) molecules

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 2 2010
    Christos Christodouleas
    Abstract A test set of 10 molecules (open and ring forms of ozone and sulfur dioxide, as well as water and hydrogen sulfide and their respective fluoro- and chloro-substituted analogs) of specific atmospheric interest has been formed to assess the performance of various density functional theory methods in (hyper)polarizability calculations against well-established ab initio methods. The choice of these molecules was further based on (i) the profound change in the physics between isomeric systems, e.g., open (C2v) and ring (D3h) forms of ozone, (ii) the relation between isomeric forms, e.g., open and ring forms of sulfur dioxide (both of C2v symmetry), and (iii) the effect of substitution, e.g., in fluoro- and chloro-substituted water analogs. The analysis is aided by arguments drawn from the information theory, graph theory, and pattern recognition fields of mathematics. In brief, a multidimensional space is formed in which the methods play the role of vectors and the independent components of the electric properties act as the coordinates of these vectors; hence, the relation between different vectors (i.e., methods) can be quantified by a proximity measure. The results agree with previous studies, revealing the acceptable and consistent behavior of the mPW1PW91, B3P86, and PBE0 methods. Notably, the double hybrid functionals (namely B2PLYP and mPW2PLYP), used here for the first time in calculations of electric response properties, perform remarkably well. © 2009 Wiley Periodicals, Inc. J Comput Chem 2010
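
The "methods as vectors" idea above can be sketched in Python: each method is a vector whose coordinates are the independent property components, and methods are ranked by their distance to a reference. Euclidean distance is an assumption here, since the abstract does not say which proximity measure was used; the method names and numbers are placeholders, not computed polarizabilities.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length property vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_proximity(methods, reference):
    """Return method names ordered from closest to farthest from the reference vector."""
    return sorted(methods, key=lambda name: euclidean(methods[name], reference))


# Placeholder vectors (e.g. independent tensor components); NOT data from the paper.
reference_vector = [10.0, 12.0, 9.0]  # stands in for a high-level ab initio result
candidates = {
    "method_A": [10.4, 12.3, 9.1],
    "method_B": [11.5, 13.0, 8.0],
    "method_C": [10.1, 11.9, 9.2],
}
print(rank_by_proximity(candidates, reference_vector))
```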


    Multiple classifier integration for the prediction of protein structural classes

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 14 2009
    Lei Chen
    Abstract Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward, because each classifier has its own strengths and weaknesses, and each biological dataset has its own characteristics. By integrating many classifiers, one can avoid the dilemma of choosing an individual classifier out of many to achieve optimized classification results (Rahman et al., Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation, Springer, Berlin, 2002, 167–178). The classification algorithms come from Weka (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005), a collection of software tools for machine learning algorithms. By integrating many predictors (classifiers) through simple voting, correct prediction (classification) rates of 65.21% and 65.63% are obtained for a basic training dataset and an independent test set, respectively. These results are better than those of any single machine learning algorithm collected in Weka when exactly the same data are used. Furthermore, we introduce an integration strategy that takes into account both classifier weightings and classifier redundancy. A feature selection strategy called minimum redundancy maximum relevance (mRMR) is transferred to algorithm selection to deal with classifier redundancy, and the weightings are based on the performance of each classifier. The best classification results are obtained when 11 algorithms are selected by the mRMR method and integrated through weighted majority voting. As a result, the correct prediction rates are 68.56% and 69.29% for the basic training dataset and the independent test dataset, respectively. The web-server is available at http://chemdata.shu.edu.cn/protein_st/. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
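
Simple and weighted majority voting, as described above, can be sketched in Python. Weighting each vote by a classifier's performance (e.g. its accuracy on training data) is an illustrative assumption; the mRMR-based selection of which classifiers to include is not reproduced here, and the class labels and weights are placeholders.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Most frequent label among classifier outputs (ties go to the label seen first)."""
    counts = Counter(predictions)
    return max(counts, key=counts.get)


def weighted_vote(predictions, weights):
    """Label with the largest summed weight; weights might be per-classifier accuracies."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


votes = ["all-alpha", "all-beta", "all-beta"]
print(majority_vote(votes))                   # all-beta
print(weighted_vote(votes, [0.9, 0.3, 0.3]))  # all-alpha
```

The second call shows why weighting matters: one strong classifier can outvote two weak ones that agree with each other.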


    Conceptual DFT properties-based 3D QSAR: Analysis of inhibitors of the nicotine metabolizing CYP2A6 enzyme

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2009
    Sofie Van Damme
    Abstract Structure-activity relationships of 46 P450 2A6 inhibitors were analyzed using the 3D-QSAR methodology. The analysis was carried out to compare the use of traditional steric and electrostatic fields with that of a number of fields reflecting conceptual DFT properties: the electron density, HOMO, LUMO, and Fukui f− function as 3D fields. The most predictive models were obtained by combining the information of the electron density with the Fukui f− function (r² = 0.82, q² = 0.72), yielding a statistically significant and predictive model. The generated model was able to predict the inhibition potencies of an external test set of five chemicals. The result of the analysis indicates that conceptual DFT-based molecular fields can be useful as 3D QSAR molecular interaction fields. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009


    H-bond donor strength; Abraham parameter

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 9 2009
    A quantum chemical model is introduced to predict the H-bond donor strength of monofunctional organic compounds from their ground-state electronic properties. The model covers OH, NH, and CH as H-bond donor sites and was calibrated with experimental values for the Abraham H-bond donor strength parameter A using the ab initio and density functional theory levels HF/6-31G** and B3LYP/6-31G**. Starting with the Morokuma analysis of hydrogen bonding, the electrostatic (ES), polarizability (PL), and charge transfer (CT) components were quantified employing local molecular parameters. With hydrogen net atomic charges calculated from both natural population analysis and the ES potential scheme, the ES term turned out to provide only marginal contributions to the Abraham parameter A, except for weak hydrogen bonds associated with acidic CH sites. Accordingly, A is governed by the PL and CT contributions. The PL component was characterized through a new measure of the local molecular hardness at hydrogen, η(H), which in turn was quantified through empirically defined site-specific effective donor and acceptor energies, EEocc and EEvac. The latter parameter was also used to address the CT contribution to A. With an initial training set of 77 compounds, HF/6-31G** yielded a squared correlation coefficient, r², of 0.91. Essentially identical statistics were achieved for a separate test set of 429 compounds and for the recalibrated model using all 506 compounds. B3LYP/6-31G** yielded slightly inferior statistics. The discussion includes subset statistics for compounds containing OH, NH, and active CH sites and a nonlinear model extension with slightly improved statistics (r² = 0.92). © 2008 Wiley Periodicals, Inc. J Comput Chem 2009


    Computational modeling of tetrahydroimidazo-[4,5,1-jk][1,4]-benzodiazepinone derivatives: An atomistic drug design approach using Kier-Hall electrotopological state (E-state) indices

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 11 2008
    Nitin S. Sapre
    Abstract Quantitative structure-activity relationships (QSAR) based on E-state indices have been developed for a series of tetrahydroimidazo-[4,5,1-jk][1,4]-benzodiazepinone derivatives against HIV-1 reverse transcriptase (HIV-1 RT). Statistical modeling using the multiple linear regression technique to predict the anti-HIV activity yielded a good correlation for the training set (R² = 0.913, R = 0.897, Q² = 0.849, MSE = 0.190, F-ratio = 59.97, PRESS = 18.05, SSE = 0.926, and p value = 0.00). Leave-one-out cross-validation also reaffirmed the predictions (R² = 0.850, R = 0.824, Q² = 0.849, MSE = 0.328, and PRESS = 18.05). The predictive ability of the model derived from the training set was also validated on a test set (R² = 0.812, R = 0.799, Q² = 0.765, MSE = 0.347, F-ratio = 64.69, PRESS = 7.37, SSE = 0.975, and p value = 0.00), which confirmed a satisfactory quality of fit. The results reflect the substitution pattern and suggest that the presence of a bulky and electropositive group in the five-membered ring and of electron-withdrawing groups in the seven-membered ring will have a positive impact on the antiviral activity of the derivatives. Bulky groups in the six-membered ring do not show an activity-enhancing effect. Outlier analysis also confirms our findings. The E-state descriptors are important in quantifying the electronic characteristics of a molecule and can thus be used in the chemical interpretation of electronic and steric factors affecting the biological activity of compounds. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2008
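
PRESS and the cross-validated Q² quoted above are linked by Q² = 1 − PRESS / SS, where PRESS sums the squared leave-one-out prediction errors and SS is the total sum of squares of the observed activities. The Python sketch below uses a single-descriptor linear model for brevity; the study used multiple linear regression on several E-state indices, so this is a simplification, and the data are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ≈ a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def loo_press_q2(x, y):
    """Leave-one-out PRESS and Q² = 1 - PRESS / SS for a one-descriptor model."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the held-out compound.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((yi - mean_y) ** 2 for yi in y)
    return press, 1.0 - press / ss


descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical E-state index values
activity = [1.0, 3.0, 2.0, 5.0, 4.0]    # hypothetical activities
print(loo_press_q2(descriptor, activity))
```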


    On the performance of some aromaticity indices: A critical assessment using a test set

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 10 2008
    Ferran Feixas
    Abstract Aromaticity is a central chemical concept widely used in modern chemistry for the interpretation of molecular structure, stability, reactivity, and magnetic properties of many compounds. As such, its reliable prediction is an important task of computational chemistry. In recent years, many methods to quantify aromaticity based on different physicochemical properties of molecules have been proposed. However, the nonobservable nature of aromaticity makes it difficult to assess the performance of the numerous existing indices. In the present work, we introduce a series of fifteen aromaticity tests that can be used to analyze the advantages and drawbacks of a group of aromaticity descriptors. On the basis of the results obtained for a set of ten indicators of aromaticity, we conclude that indices based on the study of electron delocalization in aromatic species are the most accurate among those examined in this work. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2008 [source]


    An improved algorithm for analytical gradient evaluation in resolution-of-the-identity second-order Møller-Plesset perturbation theory: Application to alanine tetrapeptide conformational analysis

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 5 2007
    Robert A. Distasio JR.
    Abstract We present a new algorithm for analytical gradient evaluation in resolution-of-the-identity second-order Møller-Plesset perturbation theory (RI-MP2) and thoroughly assess its computational performance and chemical accuracy. This algorithm addresses the potential I/O bottlenecks associated with disk-based storage and access of the RI-MP2 t-amplitudes by utilizing a semi-direct batching approach and yields computational speed-ups of approximately 2-3 over the best conventional MP2 analytical gradient algorithms. In addition, we attempt to provide a straightforward guide to performing reliable and cost-efficient geometry optimizations at the RI-MP2 level of theory. By computing relative atomization energies for the G3/99 set and optimizing a test set of 136 equilibrium molecular structures, we demonstrate that satisfactory relative accuracy and significant computational savings can be obtained using Pople-style atomic orbital basis sets with the existing auxiliary basis expansions for RI-MP2 computations. We also show that RI-MP2 geometry optimizations reproduce molecular equilibrium structures with no significant deviations (>0.1 pm) from the predictions of conventional MP2 theory. As a chemical application, we computed the extended-globular conformational energy gap in alanine tetrapeptide at the extrapolated RI-MP2/cc-pV(TQ)Z level as 2.884, 4.414, and 4.994 kcal/mol for structures optimized using the HF, DFT (B3LYP), and RI-MP2 methodologies and the cc-pVTZ basis set, respectively. These marked energetic discrepancies originate from differential intramolecular hydrogen bonding present in the globular conformation optimized at these levels of theory and clearly demonstrate the importance of long-range correlation effects in polypeptide conformational analysis. © 2007 Wiley Periodicals, Inc. J Comput Chem, 2007 [source]


    Modified calculations of hydrocarbon thermodynamic properties

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 5 2006
    Min Hsien Liu
    Abstract A test set of 65 hydrocarbons was examined to determine their thermodynamic properties theoretically using density functional theory (DFT) and ab initio calculations. All calculated data were corrected using a three-parameter calibration equation and the least-squares approach to obtain accurate enthalpies of formation (ΔHf), entropies (S), and heat capacities (Cp). The results showed that the atomization energies of all compounds had an average absolute relative error of 0.11-0.13%, and ΔHf had a mean absolute error (M.|A.E.|) of only 5.7-6.8 kJ/mol (1.3-1.6 kcal/mol), comparable to Dr. Herndon's 1.1 kcal/mol. Additionally, the M.|A.E.| was 3.5-4.2 J/mol K (0.8-1.0 cal/mol K) for the entropy and 2.3-2.9 J/mol K (0.5-0.7 cal/mol K) for the heat capacity. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 537-544, 2006 [source]


    Accurate prediction of proton chemical shifts.

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 16 2001

    Abstract Forty-five proton chemical shifts in 14 aromatic molecules have been calculated at several levels of theory: Hartree-Fock and density functional theory with several different basis sets, and also second-order Møller-Plesset (MP2) theory. To obtain consistent experimental data, the NMR spectra were remeasured on a 500 MHz spectrometer in CDCl3 solution. A set of 10 molecules without strong electron correlation effects was selected as the parametrization set. The calculated chemical shifts (relative to benzene) of 29 different protons in this set correlate very well with the experiment, and even better after linear regression. For this set, all methods perform roughly equally. The best agreement without linear regression is given by the B3LYP/TZVP method (rms deviation 0.060 ppm), although the best linear fit of the calculated shifts to experimental values is obtained for B3LYP/6-311++G**, with an rms deviation of only 0.037 ppm. Somewhat larger deviations were obtained for the second test set of 4 more difficult molecules: nitrobenzene, azulene, salicylaldehyde, and o-nitroaniline, characterized by strong electron correlation or resonance-assisted intramolecular hydrogen bonding. The results show that it is possible, at a reasonable cost, to calculate relative proton shieldings in a similar chemical environment to high accuracy. Our ultimate goal is to use calculated proton shifts to obtain constraints for local conformations in proteins; this requires a predictive accuracy of 0.1-0.2 ppm. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1887-1895, 2001 [source]
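    The abstract contrasts the rms deviation of the raw calculated shifts with the rms deviation after a linear fit of calculated to experimental values. A minimal sketch of both quantities follows; the function names are hypothetical, and the fit is assumed to be an ordinary least-squares line (slope and intercept), which the abstract does not specify.

```python
import math


def rms(residuals):
    """Root-mean-square of a sequence of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def rms_before_after_fit(calc, expt):
    """RMS deviation of calculated vs. experimental shifts, raw and after a least-squares line fit."""
    raw = rms([c - e for c, e in zip(calc, expt)])
    n = len(calc)
    mean_c = sum(calc) / n
    mean_e = sum(expt) / n
    slope = sum((c - mean_c) * (e - mean_e) for c, e in zip(calc, expt)) / sum(
        (c - mean_c) ** 2 for c in calc
    )
    intercept = mean_e - slope * mean_c
    fitted = rms([slope * c + intercept - e for c, e in zip(calc, expt)])
    return raw, fitted
```

    Because the identity line (slope 1, intercept 0) is one of the candidate lines, the least-squares fit can never give a larger rms deviation than the raw comparison, which matches the observation that agreement improves after linear regression.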


    New energy terms for reduced protein models implemented in an off-lattice force field

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2001
    Tommi Hassinen
    Abstract Parameterization and test calculations of a reduced protein model with new energy terms are presented. The new energy terms retain the steric properties and the most significant degrees of freedom of protein side chains in an efficient way using only one to three virtual atoms per amino acid residue. The energy terms are implemented in a force field containing predefined secondary structure elements as constraints, electrostatic interaction terms, and a solvent-accessible surface area term to include the effect of solvation. In the force field the main-chain peptide units are modeled as electric dipoles, which have constant directions in α-helices and β-sheets and variable conformation-dependent directions in loops. Protein secondary structures can be readily modeled using these dipole terms. Parameters of the force field were derived using a large set of experimental protein structures and refined by minimizing RMS errors between the experimental structures and structures generated using molecular dynamics simulations. The final average RMS error was 3.7 Å for the main-chain virtual atoms (Cα atoms) and 4.2 Å for all virtual atoms for a test set of 10 proteins with 58-294 amino acid residues. The force field was further tested with a substantially larger test set of 608 proteins, yielding somewhat lower accuracy. The fold recognition capabilities of the force field were also evaluated using a set of 27,814 misfolded decoy structures. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1229-1242, 2001 [source]
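    The RMS errors reported for the virtual atoms compare matched atom positions in an experimental structure and a simulated one. The sketch below shows the basic quantity under a stated assumption: the two coordinate sets are already superimposed (the abstract does not describe the alignment step, and no optimal-superposition algorithm is implemented here). The function name is hypothetical.

```python
import math


def coordinate_rms(a, b):
    """RMS distance between matched atoms; assumes both structures are already superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


# Hypothetical virtual-atom coordinates in Å, for illustration only
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
simulated = [(0.0, 0.0, 3.0), (3.8, 0.0, 4.0)]
print(round(coordinate_rms(experimental, simulated), 3))
```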


    Prediction of flammability characteristics of pure hydrocarbons from molecular structures

    AICHE JOURNAL, Issue 3 2010
    Yong Pan
    Abstract A quantitative structure-property relationship study is performed to develop mathematical models for predicting the flammability characteristics of pure hydrocarbons. The molecular structures of the compounds are numerically represented by various kinds of molecular descriptors. Genetic-algorithm-based multiple linear regression is used to select the most statistically significant descriptors for the flash point, the autoignition temperature, and the lower and upper flammability limits of hydrocarbons, respectively. The resulting models are four multilinear equations. These models are very simple and can predict the flash point, the autoignition temperature, and the lower and upper flammability limits for the test set with average absolute errors of 5.41 K, 28.00 K, 0.044 vol %, and 0.503 vol %, respectively. The models are further compared with other published methods and are shown to be superior. The proposed method can be used to predict the flammability characteristics of hydrocarbons from knowledge of the molecular structures alone. © 2009 American Institute of Chemical Engineers AIChE J, 2010 [source]


    A data base for partition of volatile organic compounds and drugs from blood/plasma/serum to brain, and an LFER analysis of the data

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 10 2006
    Michael H. Abraham
    Abstract Literature values of the in vivo distribution (BB) of drugs from blood, plasma, or serum to rat brain have been assembled for 207 compounds (233 data points). We find that data on in vivo distribution from blood, plasma, and serum to rat brain can all be combined. Application of our general linear free energy relationship (LFER) to the 207 compounds yields an equation in log BB with R2 = 0.75 and a standard deviation, SD, of 0.33 log units. An equation for a training set predicts the test set of data with a standard deviation of 0.31 log units. We further find that the in vivo data cannot simply be combined with in vitro data on volatile organic and inorganic compounds, because there is a systematic difference between the two sets of data. Use of an indicator variable allows the two sets to be combined, leading to an LFER equation for 302 compounds (328 data points) with R2 = 0.75 and SD = 0.30 log units. A training equation was then used to predict a test set with SD = 0.25 log units. © 2006 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 95:2091-2100, 2006 [source]


    Predicting P-glycoprotein substrates by a quantitative structure-activity relationship model

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 4 2004
    Vijay K. Gombar
    Abstract A quantitative structure-activity relationship (QSAR) model has been developed to predict whether a given compound is a P-glycoprotein (Pgp) substrate or not. The training set consisted of 95 compounds classified as substrates or non-substrates based on the results from in vitro monolayer efflux assays. The two-group linear discriminant model uses 27 statistically significant, information-rich structure quantifiers to compute the probability that a given structure is a Pgp substrate. Analysis of the descriptors revealed that the ability to partition into membranes, molecular bulk, and the counts and electrotopological values of certain isolated and bonded hydrides are important structural attributes of substrates. The model fits the data with a sensitivity of 100% and a specificity of 90.6% in the jackknifed cross-validation test. A prediction accuracy of 86.2% was obtained on a test set of 58 compounds. Examination of the eight "mispredicted" compounds revealed two distinct categories. Five mispredictions were explained by experimental limitations of the efflux assay; these compounds had high permeability and/or were inhibitors of calcein-AM transport. Three mispredictions were due to limitations of the chemical space covered by the current model. The Pgp QSAR model provides an in silico screen to aid in compound selection and in vitro efflux assay prioritization. © 2004 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 93: 957-968, 2004 [source]
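    Sensitivity, specificity, and prediction accuracy for a substrate/non-substrate classifier follow directly from the confusion-matrix counts. The sketch below computes them for boolean labels (True meaning substrate); the function name is hypothetical, and it assumes both classes are present so that no denominator is zero. The final line checks the abstract's own arithmetic: eight mispredictions out of 58 test compounds leave 50 correct.

```python
def classification_metrics(actual, predicted):
    """Sensitivity, specificity and accuracy for binary labels (True = Pgp substrate)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true substrates predicted as substrates
        "specificity": tn / (tn + fp),  # fraction of non-substrates predicted as non-substrates
        "accuracy": (tp + tn) / len(pairs),
    }


# Arithmetic check: 8 of 58 test compounds mispredicted
print(round((58 - 8) / 58 * 100, 1))  # prints 86.2
```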


    COMPARISON OF SIMPLE AND MULTIMETRIC DIATOM-BASED INDICES FOR GREAT LAKES COASTLINE DISTURBANCE

    JOURNAL OF PHYCOLOGY, Issue 3 2008
    Euan D. Reavie
    Because diatom communities are subject to the prevailing water quality in the Great Lakes coastal environment, diatom-based indices can be used to support coastal-monitoring programs and paleoecological studies. Diatom samples were collected from Great Lakes coastal wetlands, embayments, and high-energy sites (155 sites), and assemblages were characterized to the species level. We defined 42 metrics on the basis of autecological and functional properties of species assemblages, including species diversity, motile species, planktonic species, proportion dominant taxon, taxonomic metrics (e.g., proportion Stephanodiscoid taxa), and diatom-inferred (DI) water quality (e.g., DI chloride [Cl]). Redundant metrics were eliminated, and a diatom-based multimetric index (MMDI) to infer coastline disturbance was developed. Anthropogenic stresses in adjacent coastal watersheds were characterized using geographic information system (GIS) data related to agricultural and urban land cover and atmospheric deposition. Fourteen independent diatom metrics had significant regressions with watershed stressor data; these metrics were selected for inclusion in the MMDI. The final MMDI was developed as the weighted sum of the selected metric scores with weights based on a metric's ability to reflect anthropogenic stressors in the adjacent watersheds. Despite careful development of the multimetric approach, verification using a test set of sites indicated that the MMDI was not able to predict watershed stressors better than some of the component metrics. From this investigation, it was determined that simpler, more traditional diatom-based metrics (e.g., DI Cl, proportion Cl-tolerant species, and DI total phosphorus [TP]) provide superior prediction of overall stressor influence at coastal locales. [source]
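    The MMDI is described as a weighted sum of the selected metric scores, with each weight reflecting how well a metric tracks watershed stressors. The sketch below shows that aggregation step only. Two choices are assumptions not stated in the abstract: the weights are rescaled to sum to 1, and the metric names, scores, and weights are hypothetical placeholders rather than values from the study.

```python
def multimetric_index(metric_scores, weights):
    """Weighted sum of metric scores, with weights rescaled to sum to 1 (an assumption)."""
    if set(metric_scores) != set(weights):
        raise ValueError("every metric needs exactly one weight")
    total_weight = sum(weights.values())
    return sum(metric_scores[name] * weights[name] / total_weight for name in metric_scores)


# Hypothetical metric names, scores and weights, for illustration only
scores = {"DI_Cl": 2.0, "prop_Cl_tolerant": 4.0}
weights = {"DI_Cl": 1.0, "prop_Cl_tolerant": 3.0}
print(multimetric_index(scores, weights))
```

    Rescaling keeps the index on the same scale as the individual metric scores, which makes it easier to compare the composite index against a single component metric, the comparison on which the abstract's conclusion rests.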


    Application of QSPR to Binary Polymer/Solvent Mixtures: Prediction of Flory-Huggins Parameters

    MACROMOLECULAR THEORY AND SIMULATIONS, Issue 9 2008
    Jie Xu
    Abstract A QSPR study was performed for the prediction of the Flory-Huggins parameters of binary polymer/solvent mixtures. A total of 1664 descriptors for each polymer and solvent were checked, and a cubic multivariable model with R2 = 0.9638 and s = 0.146 was produced by applying genetic algorithms to a training set of 52 mixtures. The reliability of the proposed model was further validated by the satisfactory statistics obtained on an external test set (… = 0.9565). All descriptors in the model can be derived solely from the chemical structures of the polymers and the solvents, which makes the model useful for predicting the Flory-Huggins parameters of unknown or unavailable polymer/solvent mixtures. [source]