Validation Set (validation + set)

Kinds of Validation Set

  • independent validation set


  • Selected Abstracts


    Validation Set Correlates of Anogenital Injury after Sexual Assault

    ACADEMIC EMERGENCY MEDICINE, Issue 3 2008
    Peter Drocton MD
    Abstract Objectives: Forensic investigators remain unsure exactly why some sexual assault victims display acute injury while others do not. This investigation explores potential reasons for these differential findings among female victims. Methods: This cross-sectional analysis examined data from consecutive female sexual assault victims, at least 12 years old, who agreed to a forensic exam between November 1, 2002, and November 30, 2006. Exams utilized colposcopy, anoscopy, macrodigital imaging, and toluidine blue dye to delineate anogenital injury (AGI), which was defined as the presence of recorded anogenital abrasions, tears, or ecchymosis. Demographic variables of the victim, including sexual experience and reproductive parity, and assault characteristics were recorded in the database for bivariate and multivariate analysis with AGI. Results: Forty-nine percent of the initial 3,356 patients displayed AGI. Of this total, 2,879 cases included complete data for all variables and were included in the multivariate logistic regression model. A statistically significant increased risk for AGI was noted with: educational status (odds ratio [OR] 1.53, 95% CI = 1.25 to 1.87); vaginal or attempted penetration using penis (OR 2.29, 95% CI = 1.74 to 3.01), finger (OR 1.61, 95% CI = 1.88 to 1.94), or object (OR 3.19, 95% CI = 1.52 to 6.68); anal-penile penetration (OR 2.00, 95% CI = 1.57 to 2.54); alcohol involvement (OR 1.25, 95% CI = 1.04 to 1.50); and virgin status of victim (OR 1.38, 95% CI = 1.11 to 1.71). Victims were less likely to display AGI with a longer postcoital interval (OR 0.50, 95% CI = 0.39 to 0.65) and increased parity (OR 0.76, 95% CI = 0.57 to 0.99). Conclusions: Approximately half the patients displayed AGI. This rate is higher than in earlier studies, but consistent with current investigations utilizing similar injury detection methods. 
The correlates of injury found reinforce the findings of prior studies, while prompting questions for future study. [source]


    Outcome prediction and risk assessment by quantitative pyrosequencing methylation analysis of the SFN gene in advanced stage, high-risk, neuroblastic tumor patients

    INTERNATIONAL JOURNAL OF CANCER, Issue 3 2010
    Barbara Banelli
    Abstract The aim of our study was to identify threshold levels of DNA methylation predictive of the outcome to better define the risk group of stage 4 neuroblastic tumor patients. Quantitative pyrosequencing analysis was applied to a training set of 50 stage 4, high risk patients and to a validation cohort of 72 consecutive patients. Stage 4 patients at lower risk and ganglioneuroma patients were included as control groups. Predictive thresholds of methylation were identified by ROC curve analysis. The prognostic end points of the study were the overall and progression-free survival at 60 months. Data were analyzed with the Cox proportional hazard model. In a multivariate model the methylation threshold identified for the SFN gene (14.3.3σ) distinguished the patients presenting favorable outcome from those with progressing disease, independently from all known predictors (Training set: Overall Survival HR 8.53, p = 0.001; Validation set: HR 4.07, p = 0.008). The level of methylation in the tumors of high-risk patients surviving more than 60 months was comparable to that of tumors derived from lower risk patients and to that of benign ganglioneuroma. Methylation above the threshold level was associated with reduced SFN expression in comparison with samples below the threshold. Quantitative methylation is a promising tool to predict survival in neuroblastic tumor patients. Our results lead to the hypothesis that a subset of patients considered at high risk, but displaying low levels of methylation, could be assigned to a lower risk group. [source]


    A System for Grouping Presenting Complaints: The Pediatric Emergency Reason for Visit Clusters

    ACADEMIC EMERGENCY MEDICINE, Issue 8 2005
    Marc H. Gorelick MD, MSCE
    Abstract Objectives: To develop a set of chief complaint groupings for pediatric emergency department (ED) visits that is comprehensive, parsimonious, clinically sensible, and evidence-based. Methods: Investigators derived candidate chief complaint clusters and ranked them a priori into three perceived severity categories. Pediatric visits were extracted from the National Hospital Ambulatory Medical Care Survey (NHAMCS); data for years 1998 and 2000 (n= 13,186) were used for derivation and data for year 1999 (n= 5,365) were used for validation. Visits were assigned to clusters based on the recorded complaints; clusters were combined to ensure adequate numbers for analysis (minimum n= 20), and the clusters were reviewed for clinical sensibility. Resource utilization was categorized in three levels: routine (examination only), ED treatment (tests or therapy in the ED but not admitted), and admission. Area under the receiver-operating characteristic (ROC) curve (AUC) was used to demonstrate the discriminative ability of the clusters in predicting resource use. Results: There were 463 unique complaints in the derivation database; 95 (20%) had a single associated visit. Fifty-two clusters were generated; only 2.4% of complaints were classified as other. The eight most common clusters encompassed 52% of the visits. The top five were fever (11%), extremity pain/injury, vomiting, cough, and trauma (unspecified). Complaint clusters were associated with actual resource utilization: for routine care, the AUC was 0.73 (0.74 in the validation set), and for admission, the AUC was 0.77 (0.74 in the validation set). Both resource utilization and triage classification increased with increased expert severity ranking (test for trend, p < 0.001). Conclusions: The proposed Pediatric Emergency Reason for Visit Cluster (PERC) system is a comprehensive yet parsimonious, clinically sensible means of categorizing pediatric ED complaints. 
The PERC system's association with measures of acuity and resource utilization makes it a potentially useful tool in epidemiologic and health services research. [source]
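The PERC abstract above summarizes discrimination as the area under the ROC curve, reported separately for the derivation and validation years. As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen positive case (for example, an admitted visit) receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the severity ranks below are hypothetical illustrations, not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical severity ranks (1 = low, 3 = high); not taken from the abstract.
admitted = [3, 3, 2, 3, 1]
not_admitted = [1, 2, 1, 1, 2, 3]
print(round(auc(admitted, not_admitted), 3))
```

Running the same function on a held-out year, as the authors did with the 1999 data, shows whether the discrimination seen in the derivation data carries over.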


    Predicting ready biodegradability in the Japanese ministry of international trade and industry test

    ENVIRONMENTAL TOXICOLOGY & CHEMISTRY, Issue 10 2000
    Jay Tunkel
    Abstract Two new predictive models for assessing a chemical's biodegradability in the Japanese Ministry of International Trade and Industry (MITI) ready biodegradation test have been developed. The new methods use an approach similar to that in the existing BIOWIN© program, in which the probability of rapid biodegradation is estimated by means of multiple linear or nonlinear regression against counts of 36 chemical substructures (molecular fragments) plus molecular weight (mol wt). The data set used to develop the new models consisted of results (pass/no pass) from the MITI test for 884 discrete organic chemicals. This data set was first divided into randomly selected training and validation sets, and new coefficients were derived for the training set using the BIOWIN fragment library and mol wt as independent variables. Based on these results, the fragment library was then modified by deleting some fragments and adding or refining others, and the new set of independent variables (42 substructures and mol wt) was fit to the MITI data. The resulting linear and nonlinear regression models accurately classified 81% of the chemicals in an independent validation set. Like the established BIOWIN models, the MITI models are intended for use in chemical screening and in setting priorities for further review. [source]


    Predicting pasture root density from soil spectral reflectance: field measurement

    EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2010
    B. H. KUSUMO
    This paper reports the development and evaluation of a field technique for in situ measurement of root density using a portable spectroradiometer. The technique was evaluated at two sites in permanent pasture on contrasting soils (an Allophanic and a Fluvial Recent soil) in the Manawatu region, New Zealand. Using a modified soil probe, reflectance spectra (350–2500 nm) were acquired from horizontal surfaces at three depths (15, 30 and 60 mm) of an 80-mm diameter soil core, totalling 108 samples for both soils. After scanning, 3-mm soil slices were taken at each depth for root density measurement and soil carbon (C) and nitrogen (N) analysis. The two soils exhibited a wide range of root densities from 1.53 to 37.03 mg dry root g⁻¹ soil. The average root density in the Fluvial soil (13.21 mg g⁻¹) was twice that in the Allophanic soil (6.88 mg g⁻¹). Calibration models, developed using partial least squares regression (PLSR) of the first derivative spectra and reference data, were able to predict root density on unknown samples using a leave-one-out cross-validation procedure. The root density predictions were more accurate when the samples from the two soil types were separated (rather than grouped) to give sub-populations (n = 54) of spectral data with more similar attributes. A better prediction of root density was achieved in the Allophanic soil (r² = 0.83, ratio of prediction to deviation (RPD) = 2.44, root mean square error of cross-validation (RMSECV) = 1.96 mg g⁻¹) than in the Fluvial soil (r² = 0.75, RPD = 1.98, RMSECV = 5.11 mg g⁻¹). It is concluded that pasture root density can be predicted from soil reflectance spectra acquired from field soil cores. Improved PLSR models for predicting field root density can be produced by selecting calibration data from field data sources with similar spectral attributes to the validation set. 
Root density and soil C content can be predicted independently, which could be particularly useful in studies examining potential rates of soil organic matter change. [source]
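The root density abstract above reports RMSECV and RPD for each soil. A minimal sketch of how the two statistics relate is given below, assuming the usual chemometric definition of RPD as the standard deviation of the reference values divided by the RMSECV; the abstract does not state which standard deviation (sample or population) the authors used, so the sample form here is an assumption. The function names and the numbers in the example are hypothetical.

```python
import math
import statistics


def rmsecv(observed, predicted):
    """Root mean square error between reference values and their
    cross-validated predictions (each prediction made with that
    sample left out of the calibration)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Ratio of the sample standard deviation of the reference values
    to the RMSECV (assumed definition, see lead-in)."""
    return statistics.stdev(observed) / rmsecv(observed, predicted)


# Hypothetical root densities (mg per g) and leave-one-out predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
print(rmsecv(observed, predicted), round(rpd(observed, predicted), 2))
```

Under this definition, the reported figures imply a reference standard deviation of about 2.44 × 1.96 ≈ 4.8 mg g⁻¹ for the Allophanic soil and 1.98 × 5.11 ≈ 10.1 mg g⁻¹ for the Fluvial soil.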


    Assessing biotic integrity in a Mediterranean watershed: development and evaluation of a fish-based index

    FISHERIES MANAGEMENT & ECOLOGY, Issue 4 2008
    M. F. MAGALHÃES
    Abstract Biological indicators for Mediterranean rivers are poorly developed. This study evaluates the effectiveness of the Index of Biotic Integrity approach (IBI) with fish assemblages in the Guadiana catchment, a typical Mediterranean watershed in Southern Portugal. Reference sites were selected from a set of 95 sites, using a multivariate approach. Fifty-five candidate metrics were screened for range, responsiveness, precision and redundancy. Final metrics included: proportion of native fish, number of intolerant and intermediate species, number of invertivore native fish, number of phyto-lithophilic and polyphilic species, and catches of exotics. The IBI scores correlated with composite gradients of human impact and differed significantly between reference and non-reference sites. Application of the IBI to an independent validation set with 123 sites produced results congruent with the development set, and repeatable assessments at 22 sites showed concordance in IBI scoring. This application highlights the effectiveness of the IBI approach even with fish assemblages of limited diversity and ecological specialisation as in Mediterranean streams. [source]


    A simple noninvasive index can predict both significant fibrosis and cirrhosis in patients with chronic hepatitis C

    HEPATOLOGY, Issue 2 2003
    Chun-Tao Wai
    Information on the stage of liver fibrosis is essential in managing chronic hepatitis C (CHC) patients. However, most models for predicting liver fibrosis are complicated and separate formulas are needed to predict significant fibrosis and cirrhosis. The aim of our study was to construct one simple model consisting of routine laboratory data to predict both significant fibrosis and cirrhosis among patients with CHC. Consecutive treatment-naive CHC patients who underwent liver biopsy over a 25-month period were divided into 2 sequential cohorts: training set (n = 192) and validation set (n = 78). The best model for predicting both significant fibrosis (Ishak score ≥ 3) and cirrhosis in the training set included platelets, aspartate aminotransferase (AST), and alkaline phosphatase with an area under ROC curves (AUC) of 0.82 and 0.92, respectively. A novel index, AST to platelet ratio index (APRI), was developed to amplify the opposing effects of liver fibrosis on AST and platelet count. The AUC of APRI for predicting significant fibrosis and cirrhosis were 0.80 and 0.89, respectively, in the training set. Using optimized cut-off values, significant fibrosis could be predicted accurately in 51% and cirrhosis in 81% of patients. The AUC of APRI for predicting significant fibrosis and cirrhosis in the validation set were 0.88 and 0.94, respectively. In conclusion, our study showed that a simple index using readily available laboratory results can identify CHC patients with significant fibrosis and cirrhosis with a high degree of accuracy. Application of this index may decrease the need for staging liver biopsy specimens among CHC patients. [source]


    Frequent inactivation of SPARC by promoter hypermethylation in colon cancers

    INTERNATIONAL JOURNAL OF CANCER, Issue 3 2007
    Eungi Yang
    Abstract Epigenetic modification of gene expression plays an important role in the development of human cancers. The inactivation of SPARC through CpG island methylation was studied in colon cancers using oligonucleotide microarray analysis and methylation specific PCR (MSP). Gene expression of 7 colon cancer cell lines was evaluated before and after treatment with the demethylating agent 5-aza-2′-deoxycytidine (5Aza-dC) by oligonucleotide microarray analysis. Expression of SPARC was further examined in colon cancer cell lines and primary colorectal cancers, and the methylation status of the SPARC promoter was determined by MSP. SPARC expression was undetectable in 5 of 7 (71%) colorectal cancer cell lines. Induction of SPARC was demonstrated after treatment with the demethylating agent 5Aza-dC in 5 of the 7 cell lines. We examined the methylation status of the CpG island of SPARC in 7 colon cancer cell lines and in a test set of 20 colon cancer tissues. MSP demonstrated hypermethylation of the CpG island of SPARC in 6 of 7 cell lines and in all 20 primary colon cancers, when compared with only 3 of 20 normal colon mucosa. Immunohistochemical analysis showed that SPARC expression was downregulated or absent in 17 of 20 colon cancers. A survival analysis of a validation set of 292 colorectal carcinoma patients revealed a poorer prognosis for patients lacking SPARC expression than for patients with normal SPARC expression (56.79% vs. 75.83% 5-year survival rate, p = 0.0014). The results indicate that epigenetic gene silencing of SPARC is frequent in colon cancers, and that inactivation of SPARC is related to rapid progression of colon cancers. © 2007 Wiley-Liss, Inc. [source]


    Urinary biomarker profiling in transitional cell carcinoma

    INTERNATIONAL JOURNAL OF CANCER, Issue 11 2006
    Nicholas P. Munro
    Abstract Urinary biomarkers or profiles that allow noninvasive detection of recurrent transitional cell carcinoma (TCC) of the bladder are urgently needed. We obtained duplicate proteomic (SELDI) profiles from 227 subjects (118 TCC, 77 healthy controls and 32 controls with benign urological conditions) and used linear mixed effects models to identify peaks that are differentially expressed between TCC and controls and within TCC subgroups. A Random Forest classifier was trained on 130 profiles to develop an algorithm to predict the presence of TCC in a randomly selected initial test set (n = 54) and an independent validation set (n = 43) several months later. Twenty-two peaks were differentially expressed between all TCC and controls (p < 10⁻⁷). However, potential confounding effects of age, sex and analytical run were identified. In an age-matched sub-set, 23 peaks were differentially expressed between TCC and combined benign and healthy controls at the 0.005 significance level. Using the Random Forest classifier, TCC was predicted with 71.7% sensitivity and 62.5% specificity in the initial set and with 78.3% sensitivity and 65.0% specificity in the validation set after 6 months, compared with controls. Several peaks of importance were also identified in the linear mixed effects model. We conclude that SELDI profiling of urine samples can identify patients with TCC with comparable sensitivities and specificities to current tumor marker tests. This is the first time that reproducibility has been demonstrated on an independent test set analyzed several months later. Identification of the relevant peaks may facilitate multiplex marker assay development for detection of recurrent disease. © 2006 Wiley-Liss, Inc. [source]
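The TCC abstract above reports sensitivity and specificity on an initial test set and on a later validation set. The sketch below shows how both figures follow from true and predicted labels. The counts in the example (18 of 23 cases and 13 of 20 controls correctly classified) are not stated in the abstract; they are hypothetical values chosen only because they sum to n = 43 and reproduce the reported validation-set percentages. The function name is likewise an illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 marks
    a case (disease present) and 0 marks a control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation set: 23 cases followed by 20 controls.
y_true = [1] * 23 + [0] * 20
y_pred = [1] * 18 + [0] * 5 + [0] * 13 + [1] * 7
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3))
```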


    Molecular profiling of platinum resistant ovarian cancer

    INTERNATIONAL JOURNAL OF CANCER, Issue 8 2006
    Jozien Helleman
    Abstract The aim of this study is to discover a gene set that can predict resistance to platinum-based chemotherapy in ovarian cancer. The study was performed on 96 primary ovarian adenocarcinoma specimens from 2 hospitals all treated with platinum-based chemotherapy. In our search for genes, 24 specimens of the discovery set (5 nonresponders and 19 responders) were profiled in duplicate with 18K cDNA microarrays. Confirmation was done using quantitative RT-PCR on 72 independent specimens (9 nonresponders and 63 responders). Sixty-nine genes were differentially expressed between the nonresponders (n = 5) and the responders (n = 19) in the discovery phase. An algorithm was constructed to identify predictive genes in this discovery set. This resulted in 9 genes (FN1, TOP2A, LBR, ASS, COL3A1, STK6, SGPP1, ITGAE, PCNA), which were confirmed with qRT-PCR. This gene set predicted platinum resistance in an independent validation set of 72 tumours with a sensitivity of 89% (95% CI: 0.68–1.09) and a specificity of 59% (95% CI: 0.47–0.71) (OR = 0.09, p = 0.026). Multivariable analysis including patient and tumour characteristics demonstrated that this set of 9 genes is independent for the prediction of resistance (p < 0.01). The findings of this study are the discovery of a gene signature that classifies the tumours, according to their response, and a 9-gene set that determines resistance in an independent validation set that outperforms patient and tumour characteristics. A larger independent multicentre study should further confirm whether this 9-gene set can identify the patients who will not respond to platinum-based chemotherapy and could benefit from other therapies. © 2005 Wiley-Liss, Inc. [source]


    Principles of Proper Validation: use and abuse of re-sampling for validation

    JOURNAL OF CHEMOMETRICS, Issue 3-4 2010
    Kim H. Esbensen
    Abstract Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to formulation of a set of generic Principles of Proper Validation (PPV), which is based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; validation must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, which are all outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is one valid paradigm only: test set validation. (iii) Contrary to many contemporary chemometric practices (and validation myths), cross-validation is shown to be unjustified in the form of monolithic application of a one-for-all procedure (segmented cross-validation) on all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, as it manifestly is based on one data set only (training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which the assessment of performance is desired: prediction-, classification-, time series forecasting-, modeling validation. The key element of PPV is the Theory of Sampling (TOS), which allows insight into all variance generating factors, especially the so-called incorrect sampling errors, which, if not properly eliminated, are responsible for a fatal inconstant sampling bias, for which no statistical correction is possible. In the light of TOS it is shown how a second data set (test set, validation set) is critically necessary for the inclusion of the sampling errors incurred in all 'future' situations in which the validated model must perform. 
Logically, therefore, all one data set re-sampling approaches for validation, especially cross-validation and leverage-corrected validation, should be terminated, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS-regression, an emphatic call is made for stringent commitment to test set validation based on graphical inspection of pertinent t–u plots for optimal understanding of the X–Y interrelationships and for validation guidance. QSAR/QSAP forms a partial exemption from the present test set imperative with no generalization potential. Copyright © 2010 John Wiley & Sons, Ltd. [source]


    Variable selection in random calibration of near-infrared instruments: ridge regression and partial least squares regression settings

    JOURNAL OF CHEMOMETRICS, Issue 3 2003
    Arief Gusnanto
    Abstract Standard methods for calibration of near-infrared instruments, such as partial least-squares (PLS) and ridge regression (RR), typically use the full set of wavelengths in the model. In this paper we investigate the effect of variable (wavelength) selection for these two methods on the model prediction. For RR the selection is optimized with respect to the ridge parameter, the number of variables and the configuration of the variables in the model. A fast iterative computational algorithm is developed for the purpose of this optimization. For PLS the selection is optimized with respect to the number of components, the number of variables and the configuration of the variables. We use three real data sets in this study: processed milk from the market, milk from a dairy farm and milk from the production line of a milk processing factory. The quantity of interest is the concentration of fat in the milk. The observations are randomly split into estimation and validation sets. Optimization is based on the mean square prediction error computed on the validation set. The results indicate that the wavelength selection will not always give better prediction than using all of the available wavelengths. Investigation of the information in the spectra is necessary to determine whether all of them are relevant to the objective of the model. Copyright © 2003 John Wiley & Sons, Ltd. [source]
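The calibration abstract above describes splitting the observations at random into estimation and validation sets and tuning the model on the mean square prediction error of the validation set. The sketch below illustrates that selection loop for the ridge parameter only, under strong simplifying assumptions: a single predictor, no intercept, and a closed-form ridge slope, rather than the authors' multi-wavelength algorithm. All names, the split fraction, and the data are hypothetical.

```python
import random


def ridge_slope(xs, ys, lam):
    """One-predictor ridge estimate without intercept:
    sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(xs, ys, slope):
    """Mean square prediction error of y = slope * x on the given data."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(xs, ys, lambdas, validation_fraction=0.3, seed=0):
    """Randomly split the data, fit on the estimation part for each
    candidate lambda, and return the lambda with the lowest validation MSE
    together with all validation scores."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_val = min(len(idx) - 1, max(1, int(len(idx) * validation_fraction)))
    val, est = idx[:n_val], idx[n_val:]
    x_est, y_est = [xs[i] for i in est], [ys[i] for i in est]
    x_val, y_val = [xs[i] for i in val], [ys[i] for i in val]
    scores = {lam: mse(x_val, y_val, ridge_slope(x_est, y_est, lam))
              for lam in lambdas}
    return min(scores, key=scores.get), scores


# Hypothetical noise-free data, so the unpenalized fit is exact.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best, scores = select_lambda(xs, ys, [0.0, 1.0, 10.0])
print(best, scores)
```

With noisy or collinear data the lowest validation error is often reached at a nonzero ridge parameter, which is the situation the validation-set criterion is meant to resolve.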


    Evaluating effectiveness of preoperative testing procedure: some notes on modelling strategies in multi-centre surveys

    JOURNAL OF EVALUATION IN CLINICAL PRACTICE, Issue 1 2008
    Dario Gregori PhD
    Abstract Rationale: In technology assessment in health-related fields the construction of a model for interpreting the economic implications of the introduction of a technology is only a part of the problem. The most important part is often the formulation of a model that can be used for selecting patients to submit to the new cost-saving procedure or medical strategy. The model is usually complicated by the fact that data are often non-homogeneous with respect to some uncontrolled variables and are correlated. The most typical example is the so-called hospital effect in multi-centre studies. Aims and objectives: We show the implications derived by different choices in modelling strategies when evaluating the usefulness of preoperative chest radiography, an exam performed before surgery, usually with the aim of detecting unsuspected abnormalities that could influence the anaesthetic management and/or surgical plan. Method: We analyze the data from a multi-centre study including more than 7000 patients. We use about 6000 patients to fit regression models using both a population-averaged and a subject-specific approach. We explore the limitations of these models when used for predictive purposes using a validation set of more than 1000 patients. Results: We show the importance of taking into account the heterogeneity among observations and the correlation structure of the data and propose an approach for integrating a population-averaged and subject-specific approach into a single modeling strategy. We find that the hospital represents an important variable causing heterogeneity that influences the probability of a useful POCR. Conclusions: We find that starting with a marginal model, evaluating the shrinkage effect and eventually moving to a more detailed model for the heterogeneity is preferable. This kind of flexible approach seems to be more informative at various phases of the model-building strategy. [source]


    Rapid Determination of Invert Cane Sugar Adulteration in Honey Using FTIR Spectroscopy and Multivariate Analysis

    JOURNAL OF FOOD SCIENCE, Issue 6 2003
    J. Irudayaraj
    ABSTRACT: Fourier transform infrared spectroscopy with an attenuated total reflection sampling accessory was combined with multivariate analysis to determine the level (1% to 25%, wt/wt) of invert cane sugar adulteration in honey. On the basis of the spectral data compression by principal component analysis and partial least squares, linear discriminant analysis (LDA), and canonical variate analysis (CVA), models were developed and validated. Two types of artificial neural networks were applied: a quick back propagation network (BPN) and a radial basis function network (RBFN). The prediction success rates were better with LDA (93.75% for validation set) and BPN (93.75%) than with CVA (87.50%) and RBFN (81.25%). [source]


    Model consisting of ultrasonographic and simple blood indexes accurately identify compensated hepatitis B cirrhosis

    JOURNAL OF GASTROENTEROLOGY AND HEPATOLOGY, Issue 8pt1 2008
    Yong-Peng Chen
    Abstract Background and Aim: Several models for significant fibrosis or cirrhosis have been introduced for hepatitis C, but seldom for hepatitis B. The present study retrospectively evaluates the relationship between ultrasonography, blood tests, and fibrosis stage, and constructs a model for predicting compensated cirrhosis. Methods: A total of 653 patients with chronic hepatitis B who underwent liver biopsies, ultrasonographic scanning, and routine blood tests were retrospectively analyzed. The patients were divided into the model set and validation set. Blood tests and ultrasonographic indexes were analyzed statistically. An ultrasonographic scoring system consisting of liver parenchyma, gallbladder, hepatic vessel, and splenomegaly was introduced. Results: There were significant differences between cirrhosis and other fibrosis stages in ultrasonographic indexes of liver parenchyma, gallbladder, hepatic vessel, and splenomegaly. Ultrasonographic scores were significantly different between F4 and other fibrosis, and significantly correlated with fibrosis stage. Apart from alanine aminotransferase and alkaline phosphatase, blood tests and patients' age were correlated with fibrosis, and were significantly different between patients with and without cirrhosis. The model for cirrhosis indexes consisting of ultrasonographic score, patient's age, and variables, including platelet, albumin, and bilirubin predicted cirrhosis with an area under the receiver–operator curve of 0.907 in the model set and 0.849 in the validation set. Using proper cut-off values, nearly 81% of patients could be accurately assessed for the absence or presence of cirrhosis. Conclusion: The model consisting of ultrasonographic score, patients' age, blood variables of platelet, albumin, and bilirubin can identify hepatitis B cirrhosis with a high degree of accuracy. The application of this model would greatly reduce the number of biopsies. [source]


    Machine learning approaches for prediction of linear B-cell epitopes on proteins

    JOURNAL OF MOLECULAR RECOGNITION, Issue 3 2006
    Johannes Söllner
    Abstract Identification and characterization of antigenic determinants on proteins has received considerable attention utilizing both experimental and computational methods. For computational routines, mostly structural and physicochemical parameters have been utilized for predicting the antigenic propensity of protein sites. However, the performance of computational routines has been low when compared to experimental alternatives. Here we describe the construction of machine learning based classifiers to enhance the prediction quality for identifying linear B-cell epitopes on proteins. Our approach combines several parameters previously associated with antigenicity, and includes novel parameters based on frequencies of amino acids and amino acid neighborhood propensities. We utilized machine learning algorithms for deriving antigenicity classification functions assigning antigenic propensities to each amino acid of a given protein sequence. We compared the prediction quality of the novel classifiers with respect to established routines for epitope scoring, and tested prediction accuracy on experimental data available for HIV proteins. The major finding is that machine learning classifiers clearly outperform the reference classification systems on the HIV epitope validation set. Copyright © 2006 John Wiley & Sons, Ltd. [source]


    Who Should Be Screened for Asymptomatic Carotid Artery Stenosis?

    JOURNAL OF NEUROIMAGING, Issue 2 2001
    Experience From the Western New York Stroke Screening Program
    ABSTRACT Objective. Identification of significant asymptomatic carotid artery stenosis (ACAS) is important because of the stroke-risk reduction observed with carotid endarterectomy. The authors developed and validated a simple scoring system based on routinely available information to identify persons at high risk for ACAS using data collected during a community health screening program at various sites in western New York. A total of 1331 unselected volunteers without previous stroke, transient ischemic attack, or carotid artery surgery were evaluated by personal interview and duplex ultrasound. The main outcome measure was carotid artery stenosis >60% by duplex ultrasound. In the derivation set (n = 887), 4 variables were significantly associated with ACAS >60%: age >65 years (odds ratio [OR] = 4.1, 95% confidence interval [CI] = 2.6–6.7), current smoking (OR = 2.0, 95% CI = 1.2–3.5), coronary artery disease (OR = 2.4, 95% CI = 1.5–3.9), and hypercholesterolemia (OR = 1.9, 95% CI = 1.2–2.9). Three risk groups (low, intermediate, and high) were defined on the basis of total risk score assigned on the basis of the strength of association. The scheme effectively stratified the validation set (n = 444); the likelihood ratio and posttest probability for ACAS in the high-risk group were 3.0 and 35%, respectively, and in the intermediate and low-risk groups were 1.4 and 20% and 0.4 and 7%, respectively. Routinely available information can be used to identify persons in the community at high risk for ACAS. Doppler ultrasound screening in this subgroup may prove to be cost-effective and have an effect on stroke-free survival. [source]
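The ACAS abstract above pairs each likelihood ratio with a posttest probability. A minimal sketch of the standard conversion is given below: probability is turned into odds, multiplied by the likelihood ratio, and turned back into a probability. The abstract does not state the pretest probability; the value of 0.15 used here is an assumption, back-calculated because it is consistent with all three reported pairs. The function name is hypothetical.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and a likelihood ratio into a
    posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


ASSUMED_PRETEST = 0.15  # assumption; not reported in the abstract

for group, lr in [("high", 3.0), ("intermediate", 1.4), ("low", 0.4)]:
    print(group, round(posttest_probability(ASSUMED_PRETEST, lr), 2))
```

Under that assumed prevalence the three risk groups come out at roughly 35%, 20% and 7%, matching the figures quoted for the validation set.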


    Monitoring the film coating unit operation and predicting drug dissolution using terahertz pulsed imaging

    JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 12 2009
    Louise Ho
    Abstract Understanding the coating unit operation is imperative to improve product quality and reduce output risks for coated solid dosage forms. Three batches of sustained-release tablets coated with the same process parameters (pan speed, spray rate, etc.) were subjected to terahertz pulsed imaging (TPI) analysis followed by dissolution testing. Mean dissolution times (MDT) from conventional dissolution testing were correlated with terahertz waveforms, which yielded a multivariate, partial least squares regression (PLS) model with an R2 of 0.92 for the calibration set and 0.91 for the validation set. This two-component PLS model was built from batch I, which was coated under the same environmental conditions (air temperature, humidity, etc.) as batch II but under different environmental conditions from batch III. The MDTs of batch II were predicted in a nondestructive manner with the developed PLS model, and the accuracy of the predicted values was subsequently validated with conventional dissolution testing and found to be in good agreement. The terahertz PLS model was also shown to be sensitive to changes in the coating conditions, successfully identifying the larger coating variability in batch III. In this study, we demonstrated that TPI in conjunction with PLS analysis could be employed to assist with film coating process understanding and provide predictions on drug dissolution. © 2009 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 98:4866–4876, 2009 [source]


    Failure time regression with continuous covariates measured with error

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2000
    Halbo Zhou
    We consider failure time regression analysis with an auxiliary variable in the presence of a validation sample. We extend the nonparametric inference procedure of Zhou and Pepe to handle a continuous auxiliary or proxy covariate. We estimate the induced relative risk function with a kernel smoother and allow the selection probability of the validation set to depend on the observed covariates. We present some asymptotic properties for the kernel estimator and provide some simulation results. The method proposed is illustrated with a data set from an on-going epidemiologic study. [source]


    Detection of inverted beet sugar adulteration of honey by FTIR spectroscopy

    JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, Issue 8 2001
    S Sivakesava
    Abstract A combination of Fourier transform infrared (FTIR) spectroscopy and multivariate statistics as a screening tool for the determination of beet medium invert sugar adulteration in three different varieties of honey is discussed. Honey samples with different concentrations of beet invert sugar were scanned using the attenuated total reflectance (ATR) accessory of the Bio-Rad FTS-6000 Fourier transform spectrometer. The spectral wavenumber region between 950 and 1500 cm⁻¹ was selected for partial least squares (PLS) regression to develop calibration models for beet invert sugar determination in honey samples. Results from the PLS (first derivative) models were slightly better than those obtained with other calibration models. Predictive models were also developed to classify beet invert sugar in three different varieties of honey samples using discriminant analysis. Spectral data were compressed using the principal component method, and linear discriminant and canonical variate analyses were used to detect the level of beet invert sugar in honey samples. The best predictive model for adulterated honey samples was achieved with canonical variate analysis, which successfully classified 88–94 per cent of the validation set. The present study demonstrated that Fourier transform infrared spectroscopy could be used for rapid detection of beet invert sugar adulteration in different varieties of honey. © 2001 Society of Chemical Industry [source]


    Predictive value of actin-free Gc-globulin in acute liver failure

    LIVER TRANSPLANTATION, Issue 9 2007
    Frank V. Schiødt
    Serum concentrations of the actin scavenger Gc-globulin may provide prognostic information in acute liver failure (ALF). The fraction of Gc-globulin not bound to actin is postulated to represent a better marker than total Gc-globulin but has been difficult to measure. We tested a new rapid assay for actin-free Gc-globulin to determine its prognostic value when compared with the King's College Hospital (KCH) criteria in a large number of patients with ALF. A total of 252 patients with varying etiologies from the U.S. ALF Study Group registry were included; the first 178 patients constituted the learning set, and the last 74 patients served as the validation set. Actin-free Gc-globulin was determined with a commercial enzyme-linked immunosorbent assay kit. The median (range) actin-free Gc-globulin level at admission for the learning set was significantly reduced compared with controls (47 [0-183] mg/L vs. 204 [101-365] mg/L, respectively, P < 0.001). Gc-globulin levels were significantly higher in spontaneous survivors than in patients who died or were transplanted (53 [0-129] mg/L vs. 37 [0-183] mg/L, P = 0.002). A receiver operating characteristic curve analysis showed that a 40 mg/L cutoff level carried the best prognostic information, yielding positive and negative predictive values of 68% and 67%, respectively, in the validation set. The corresponding figures for the KCH criteria were 72% and 64%. A new enzyme-linked immunosorbent assay for actin-free Gc-globulin provides the same (but not optimal) prognostic information as KCH criteria in a single measurement at admission. Liver Transpl 13:1324–1329, 2007. © 2007 AASLD. [source]
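To make the predictive values in this abstract concrete, here is a minimal Python sketch of how positive and negative predictive values can be computed for a fixed cutoff. It assumes the rule "level below 40 mg/L predicts death or transplantation", which matches the direction reported (lower levels in non-survivors) but whose exact form (strict inequality) is an assumption. The function name and the six patient values are hypothetical and are not study data.

```python
def predictive_values(levels_mg_per_l, poor_outcome, cutoff=40.0):
    """Return (PPV, NPV) for the rule 'level < cutoff predicts poor outcome'."""
    tp = fp = tn = fn = 0
    for level, poor in zip(levels_mg_per_l, poor_outcome):
        predicted_poor = level < cutoff  # assumed direction and strictness
        if predicted_poor and poor:
            tp += 1
        elif predicted_poor and not poor:
            fp += 1
        elif not predicted_poor and not poor:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Made-up toy values for illustration only.
levels = [12.0, 35.0, 38.0, 55.0, 80.0, 41.0]
poor = [True, True, False, False, False, True]
ppv, npv = predictive_values(levels, poor)
print(ppv, npv)
```

The cutoff is fixed before the validation data are scored, mirroring the learning-set/validation-set split described above.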


    Identification of Novel CDK2 Inhibitors by QSAR and Virtual Screening Procedures

    MOLECULAR INFORMATICS, Issue 11-12 2008
    Ajay Babu, Padavala
    Abstract Quantitative Structure–Activity Relationship (QSAR) studies were carried out on a set of 46 imidazo[1,2-a]pyridines, imidazo[1,2-b]pyridazines and 2,4-bis anilino pyrimidines, and nitroso-6-aminopyrimidine and 2,6-diaminopyrimidine inhibitors of CDK2 (Cyclin-dependent Kinase2) using a multiple regression procedure. The activity contributions of these compounds were determined from the regression equation, and validation procedures such as the external-set cross-validation r2 (R2cv,ext) and the regression of observed activities against predicted activities, and vice versa, for the validation set are described to analyze the predictive ability of the QSAR model. An accurate and reliable QSAR model involving five descriptors was chosen based on the FIT Kubinyi function, which defines the statistical quality of the model. Owing to its high predictive ability, the proposed model was used to screen a similar repertoire of compounds reported in the literature and to estimate their biological activities. The screening study clearly demonstrated that the strategy presented can be used as an alternative to time-consuming experiments, as the model tolerated a variety of structural modifications, signifying its potential for drug design studies. [source]


    Artemisinin Derivatives with Antimalarial Activity against Plasmodium falciparum Designed with the Aid of Quantum Chemical and Partial Least Squares Methods

    MOLECULAR INFORMATICS, Issue 8 2003

    Abstract Artemisinin derivatives with antimalarial activity against Plasmodium falciparum resistant to mefloquine are designed with the aid of Quantum Chemical and Partial Least Squares Methods. The PLS model with three principal components explaining 89.55% of total variance, Q2=0.83 and R2=0.92 was obtained for 14/5 molecules in the training/external validation set. The most important descriptors for the design of the model were the energy of the orbital one level above the lowest unoccupied molecular orbital (LUMO+1), the atomic charges on atoms C9 and C11 (Q9 and Q11, respectively), the maximum number of hydrogen atoms that might make contact with heme (NH) and RDF030m (a radial distribution function centered at 3.0 Å interatomic distance and weighted by atomic masses). From a set of ten proposed artemisinin derivatives, a new compound (26) was predicted to have higher antimalarial activity than the compounds reported in the literature. Molecular graphics and modeling supported the PLS results and revealed heme-ligand and protein-ligand stereoelectronic relationships as important for antimalarial activity. The most active compounds in the prediction set, 26 and 29, possess substituents at C9 able to extend to the hemoglobin exterior, which determines the high activity of these compounds. [source]


    Clinical prediction rule to diagnose post-infectious bronchiolitis obliterans in children

    PEDIATRIC PULMONOLOGY, Issue 11 2009
    Alejandro J. Colom
    Abstract Rationale Infant pulmonary function testing has great value in the diagnosis of post-infectious bronchiolitis obliterans (BOs), because of characteristic patterns of severe and fixed airway obstruction. Unfortunately, infant pulmonary function testing is not available in most pediatric pulmonary centers. Objective To develop and validate a clinical prediction rule (BO-Score) to diagnose children under 2 years of age with BOs, using multiple objectively measured parameters readily available in most medical centers. Methods Study subjects were children under 2 years old with a chronic pulmonary disease treated at R. Gutierrez Children's Hospital of Buenos Aires. Patients were randomly divided into a derivation (66%) and a validation (34%) set. ROC analyses and multivariable logistic regression included significant clinical, radiological, and laboratory predictors. The main outcome measure was a diagnosis of BOs. The performance of the BO-Score was tested on the validation set. Results One hundred twenty-five patients were included, 83 in the derivation set and 42 in the validation set. The BO-Score (area under ROC curve = 0.96; 95% CI, 0.9–1.0%) was developed by assigning points to the following variables: typical clinical history (four points), adenovirus infection (three points), and high-resolution computed tomography with mosaic perfusion (four points). A score ≥7 predicted the diagnosis of BOs with a specificity of 100% (95% CI, 79–100%) and a sensitivity of 67% (95% CI, 47–80%). Conclusions The BO-Score is a simple-to-use clinical prediction rule, based on variables that are readily available. A BO-Score of 7 or more predicts a diagnosis of post-infectious BOs with high accuracy. Pediatr Pulmonol. 2009; 44:1065–1069. ©2009 Wiley-Liss, Inc. [source]
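The point assignments in this abstract are complete enough to write the rule down directly. The following Python sketch encodes the three weighted variables and the cutoff of 7 or more; the function and parameter names are hypothetical, and the code is an illustration of the rule as summarized here, not a clinical tool.

```python
def bo_score(typical_history, adenovirus_infection, hrct_mosaic_perfusion):
    """Sum the BO-Score points for three yes/no findings."""
    score = 0
    if typical_history:
        score += 4  # typical clinical history: four points
    if adenovirus_infection:
        score += 3  # adenovirus infection: three points
    if hrct_mosaic_perfusion:
        score += 4  # mosaic perfusion on high-resolution CT: four points
    return score


def predicts_bo(score, threshold=7):
    """Apply the reported cutoff: a score of 7 or more predicts BO."""
    return score >= threshold


example = bo_score(typical_history=True,
                   adenovirus_infection=True,
                   hrct_mosaic_perfusion=False)
print(example, predicts_bo(example))
```

With these weights the only attainable totals are 0, 3, 4, 7, 8, and 11, so the cutoff of 7 amounts to requiring at least two of the three findings.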


    Discovering robust protein biomarkers for disease from relative expression reversals in 2-D DIGE data.

    PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 8 2007
    Troy J. Anderson
    Abstract This study assesses the ability of a novel family of machine learning algorithms to identify changes in relative protein expression levels, measured using 2-D DIGE data, which support accurate class prediction. The analysis was done using a training set of 36 total cellular lysates comprised of six normal and three cancer biological replicates (the remaining are technical replicates) and a validation set of four normal and two cancer samples. Protein samples were separated by 2-D DIGE and expression was quantified using DeCyder-2D Differential Analysis Software. The relative expression reversal (RER) classifier correctly classified 9/9 training biological samples (p<0.022) as estimated using a modified version of leave one out cross validation and 6/6 validation samples. The classification rule involved comparison of expression levels for a single pair of protein spots, tropomyosin isoforms and ,-enolase, both of which have prior association as potential biomarkers in cancer. The data was also analyzed using algorithms similar to those found in the extended data analysis package of DeCyder software. We propose that by accounting for sources of within- and between-gel variation, RER classifiers applied to 2-D DIGE data provide a useful approach for identifying biomarkers that discriminate among protein samples of interest. [source]


    Consistency of a two clinical site sample collection: A proteomics study

    PROTEOMICS - CLINICAL APPLICATIONS, Issue 8-9 2010
    Cedric Wiesner
    Abstract Purpose: We investigated the ability to perform a clinical proteomic study using samples collected at different times from two independent clinical sites. Experimental Design: Label-free 2-D-LC-MS proteomic analysis was used to differentially quantify tens of thousands of peptides from human plasma. We have asked whether samples collected from two sites, when analyzed by this type of peptide profiling, reproducibly contain detectable peptide markers that are differentially expressed in the plasma of disease (advanced renal cancer) patients relative to healthy normals. Results: We have demonstrated that plasma proteins enriched in disease patients are indeed detected reproducibly in both clinical collections. Regression analysis, unsupervised hierarchical clustering and PCA detected no systematic bias in the data related to site of sample collection and processing. Using a genetic algorithm, support vector machine classification method, we were able to correctly classify disease samples at 88% sensitivity and 94% specificity using the second site as an independent validation set. Conclusions and clinical relevance: We conclude that multiple site collection, when analyzed by label-free 2-D-LC-MS, generates data that are sufficiently reproducible to guide reliable biomarker discovery. [source]


    Magnetic resonance imaging as a potential surrogate for relapses in multiple sclerosis: A meta-analytic approach

    ANNALS OF NEUROLOGY, Issue 3 2009
    Maria Pia Sormani MscStat
    Objective The aim of this work was to evaluate whether the treatment effects on magnetic resonance imaging (MRI) markers at the trial level were able to predict the treatment effects on relapse rate in relapsing-remitting multiple sclerosis. Methods We used a pooled analysis of all the published randomized, placebo-controlled clinical trials in relapsing-remitting multiple sclerosis reporting data both on MRI variables and relapses. We extracted data on relapses and on MRI "active" lesions. A regression analysis weighted on trial size and duration was performed to study the relation between the treatment effect on relapses and the treatment effect on MRI lesions. We validated the estimated relation on an independent set of clinical trials satisfying the same inclusion criteria but with a control arm other than placebo. Results A set of 23 randomized, double-blind, placebo-controlled trials in relapsing-remitting multiple sclerosis was identified, for a total of 63 arms, 40 contrasts, and 6,591 patients. A strong correlation was found between the effect on the relapses and the effect on MRI activity. The adjusted R2 value of the weighted regression line was 0.81. The regression equation estimated using the placebo-controlled trials gave a satisfactory prediction of the treatment effect on relapses when applied to the validation set. Interpretation More than 80% of the variance in the effect on relapses between trials is explained by the variance in MRI effects. Smaller and shorter phase II studies based on MRI lesion end points may give indications also on the effect of the treatment on relapse end points. Ann Neurol 2009;65:268–275 [source]


    Regression Calibration in Semiparametric Accelerated Failure Time Models

    BIOMETRICS, Issue 2 2010
    Menggang Yu
    Summary In large cohort studies, it often happens that some covariates are expensive to measure and hence only measured on a validation set. On the other hand, relatively cheap but error-prone measurements of the covariates are available for all subjects. Regression calibration (RC) estimation method (Prentice, 1982, Biometrika 69, 331–342) is a popular method for analyzing such data and has been applied to the Cox model by Wang et al. (1997, Biometrics 53, 131–145) under normal measurement error and rare disease assumptions. In this article, we consider the RC estimation method for the semiparametric accelerated failure time model with covariates subject to measurement error. Asymptotic properties of the proposed method are investigated under a two-phase sampling scheme for validation data that are selected via stratified random sampling, resulting in neither independent nor identically distributed observations. We show that the estimates converge to some well-defined parameters. In particular, unbiased estimation is feasible under additive normal measurement error models for normal covariates and under Berkson error models. The proposed method performs well in finite-sample simulation studies. We also apply the proposed method to a depression mortality study. [source]


    A risk score for predicting perioperative blood transfusion in liver surgery

    BRITISH JOURNAL OF SURGERY (NOW INCLUDES EUROPEAN JOURNAL OF SURGERY), Issue 7 2007
    C. Pulitanò
    Background: It would be desirable to predict which patients are most likely to benefit from preoperative autologous blood donation. The aim of this study was to develop a point scoring system for predicting the need for blood transfusion in liver surgery. Methods: The medical records of 480 consecutive patients who underwent hepatic resection were analysed. The data set was split randomly into a derivation set of two-thirds and a validation set of one-third. Univariable analysis was carried out to determine the association between clinicopathological factors and blood transfusion. Significant variables were entered into a multiple logistic regression model, and a transfusion risk score (TRS) was developed. The accuracy of the system was validated by calculating the area under the receiver–operator characteristic (ROC) curve. Results: Factors associated with blood transfusion in multivariable analysis included preoperative haemoglobin concentration below 12·5 g/dl, largest tumour more than 4 cm, need for exposure of the vena cava, need for an associated procedure, and cirrhosis. Each variable was assigned one point, and the total score was compared with the transfusion status of each patient in the validation set. The TRS accurately predicted the likelihood of blood transfusion. In the validation set the area under the ROC curve was 0·89. Conclusion: Use of the TRS could lead to substantial saving by improving the cost-effectiveness of the autologous blood donation programme. Copyright © 2007 British Journal of Surgery Society Ltd. Published by John Wiley & Sons, Ltd. [source]
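As an illustration of the scoring and validation steps summarized in this abstract, the Python sketch below computes a one-point-per-factor score and then the area under the ROC curve as the fraction of (transfused, not transfused) patient pairs that the score ranks correctly, counting ties as one half. The function names and the six toy scores are hypothetical; they are not the study's data or code.

```python
def transfusion_risk_score(haemoglobin_g_dl, largest_tumour_cm,
                           vena_cava_exposure, associated_procedure,
                           cirrhosis):
    """One point per risk factor listed in the abstract (range 0 to 5)."""
    return sum([
        haemoglobin_g_dl < 12.5,   # preoperative haemoglobin below 12.5 g/dl
        largest_tumour_cm > 4.0,   # largest tumour more than 4 cm
        bool(vena_cava_exposure),
        bool(associated_procedure),
        bool(cirrhosis),
    ])


def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a positive case outscores a negative one."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))


# Made-up validation scores for illustration only.
transfused = [4, 3, 2]
not_transfused = [0, 1, 2]
auc = roc_auc(transfused, not_transfused)
print(transfusion_risk_score(11.8, 5.0, True, False, False), auc)
```

Because the score is computed on patients who played no part in choosing the factors, an AUC obtained this way on the validation third is a less optimistic estimate than one computed on the derivation set.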


    A novel algorithm to improve pathologic stage prediction of clinically organ-confined muscle-invasive bladder cancer

    CANCER, Issue 7 2009
    David Margel MD
    Abstract BACKGROUND: An algorithm was created to predict pathologic stage in patients with clinically organ-confined muscle-invasive bladder cancer. METHODS: The sample consisted of 133 consecutive patients scheduled to undergo cystectomy. To develop a tool to predict nonorgan-confined disease before surgery, principal component analysis (PCA) was applied. Patients were stratified into a training set (n = 89) and a validation set (n = 44), and 7 parameters were evaluated: levels of carcinoembryonic antigen, cancer antigen (CA) 125, and carbohydrate antigen (CA) 19-9; clinical stage; presence of hydronephrosis; presence of carcinoma in situ; and initial tumor size >3 cm. PCA was applied to the training set to determine the weight of each parameter. A PCA score was generated for each patient in the set, and a cutoff defining nonorgan-confined disease was established. The accuracy of the cutoff was quantified by the area under the receiver operator characteristics curve (AUC). The model was then applied to the validation set without recalculation; the AUC and the positive and negative predictive values of the validation set were calculated. RESULTS: On pathologic evaluation, 71 patients (53%) were found to have organ-confined tumors and 62 patients (47%) had extravesical disease. The AUC was 0.85 in the training group (95% confidence interval [95% CI], 0.71-0.97) and 0.84 in the validation group (95% CI, 0.75-0.93). The positive and negative predictive values in the validation group were 88% (95% CI, 71%-96%) and 94% (95% CI, 71%-99%), respectively. CONCLUSIONS: The newly devised, internally validated algorithm was 85% accurate in predicting nonorgan-confined bladder disease before cystectomy. Further external validation in a large cohort is still necessary. Cancer 2009. © 2009 American Cancer Society. [source]