Training Samples (training + sample)

Selected Abstracts


Assessment of Individual Risk of Death Using Self-Report Data: An Artificial Neural Network Compared with a Frailty Index

JOURNAL OF AMERICAN GERIATRICS SOCIETY, Issue 7 2004
Xiaowei Song PhD
Objectives: To evaluate the potential of an artificial neural network (ANN) in predicting survival in elderly Canadians, using self-report data. Design: Cohort study with up to 72 months follow-up. Setting: Forty self-reported characteristics were obtained from the community sample of the Canadian Study of Health and Aging. An individual frailty index score was calculated as the proportion of deficits experienced. For the ANN, randomly selected participants formed the training sample to derive relationships between the variables and survival and the validation sample to control overfitting. An ANN output was generated for each subject. A separate testing sample was used to evaluate the accuracy of prediction. Participants: A total of 8,547 Canadians aged 65 to 99, of whom 1,865 died during 72 months of follow-up. Measurements: The output of an ANN model was compared with an unweighted frailty index in predicting survival patterns using receiver operating characteristic (ROC) curves. Results: The area under the ROC curve was 86% for the ANN and 62% for the frailty index. At the optimal ROC value, the accuracy of the frailty index was 70.0%. The ANN accuracy rate over 10 simulations in predicting the probability of individual survival (mean±standard deviation) was 79.2±0.8%. Conclusion: An ANN provided more accurate survival classification than an unweighted frailty index. The data suggest that the concept of biological redundancy might be operationalized from health survey data. [source]
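
The unweighted frailty index and the ROC comparison described in this abstract can be illustrated with a minimal sketch. It assumes each self-reported item is coded 0 (deficit absent) or 1 (deficit present); the function names and the toy data are hypothetical and are not taken from the study, which used forty items rather than five.

```python
def frailty_index(deficits):
    """Proportion of deficits present (each item coded 0 = absent, 1 = present)."""
    return sum(deficits) / len(deficits)


def roc_auc(scores, died):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four people, five self-reported items each.
people = [
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
died = [True, False, True, False]

scores = [frailty_index(p) for p in people]
print(scores, roc_auc(scores, died))
```

The pairwise form of the area under the curve is the probability that a randomly chosen person who died has a higher score than a randomly chosen survivor, which is why a value of 0.5 corresponds to no discrimination.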


Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies

JOURNAL OF FORECASTING, Issue 6 2009
Wolfgang Härdle
Abstract In the era of Basel II a powerful tool for bankruptcy prognosis is vital for banks. The tool must be precise but also easily adaptable to the bank's objectives regarding the relation of false acceptances (Type I error) and false rejections (Type II error). We explore the suitability of smooth support vector machines (SSVM), and investigate how important factors such as the selection of appropriate accounting ratios (predictors), length of training period and structure of the training sample influence the precision of prediction. Moreover, we show that oversampling can be employed to control the trade-off between error types, and we compare SSVM with both logistic and discriminant analysis. Finally, we illustrate graphically how different models can be used jointly to support the decision-making process of loan officers. Copyright © 2008 John Wiley & Sons, Ltd. [source]
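
As a rough illustration of the oversampling idea mentioned in this abstract, the sketch below duplicates randomly chosen members of one class so that class carries more weight in the training sample. The abstract does not describe the resampling scheme actually used, so the function `oversample`, its `factor` parameter, and the toy data are all assumptions made for illustration.

```python
import random


def oversample(rows, labels, target_label, factor, seed=0):
    """Return the sample plus extra random draws (with replacement) of one class.

    factor = 2.0 means the target class ends up with about twice its original count.
    """
    rng = random.Random(seed)
    target = [r for r, y in zip(rows, labels) if y == target_label]
    n_extra = int(round(len(target) * (factor - 1.0)))
    extra = [rng.choice(target) for _ in range(n_extra)]
    return rows + extra, labels + [target_label] * n_extra


# Hypothetical toy sample: one accounting ratio per company, 1 = defaulted.
ratios = [[0.9], [0.8], [0.7], [0.75], [0.85], [0.95], [0.6], [0.65], [0.1], [0.2]]
status = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

new_ratios, new_status = oversample(ratios, status, target_label=1, factor=4.0)
print(new_status.count(0), new_status.count(1))
```

Raising the share of defaulted companies in the training sample would typically push a classifier toward flagging more companies as risky, so `factor` acts as a knob on the balance between the two error types.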


Change-point monitoring in linear models

THE ECONOMETRICS JOURNAL, Issue 3 2006
Alexander Aue
Summary. We consider a linear regression model with errors modelled by martingale difference sequences, which include heteroskedastic augmented GARCH processes. We develop asymptotic theory for two monitoring schemes aimed at detecting a change in the regression parameters. The first method is based on the CUSUM of the residuals and was studied earlier in the context of independent identically distributed errors. The second method is new and is based on the squares of prediction errors. Both methods use a training sample of size m. We show that, as m → ∞, both methods have correct asymptotic size and detect a change with probability approaching unity. The methods are illustrated and compared in a small simulation study. [source]
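
The first monitoring scheme, a CUSUM of residuals computed after fitting on a training sample of size m, can be sketched as follows. The abstract does not give the boundary function, so the alarm rule used here, |CUSUM| > c · σ · √m · (1 + k/m), and the constant `c` are assumptions chosen for illustration and are not asymptotic critical values from the paper. The model is reduced to a single regressor to keep the sketch short.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x on the training sample."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def cusum_monitor(train_x, train_y, new_x, new_y, c=3.0):
    """Return the 1-based index k of the first alarm while monitoring, or None.

    Assumed alarm rule (not taken from the paper):
        |sum of the first k monitoring residuals| > c * sigma * sqrt(m) * (1 + k/m)
    """
    m = len(train_x)
    a, b = fit_line(train_x, train_y)
    resid = [yi - (a + b * xi) for xi, yi in zip(train_x, train_y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (m - 2))
    cusum = 0.0
    for k, (xi, yi) in enumerate(zip(new_x, new_y), start=1):
        cusum += yi - (a + b * xi)
        if abs(cusum) > c * sigma * math.sqrt(m) * (1 + k / m):
            return k
    return None


# Hypothetical toy data: y = 1 + 2x plus a small alternating disturbance.
def observe(i, shift=0.0):
    return 1.0 + 2.0 * i + 0.1 * (-1) ** i + shift


train_x = list(range(20))
train_y = [observe(i) for i in train_x]
new_x = list(range(20, 40))
stable_y = [observe(i) for i in new_x]
# The intercept jumps by 1.0 from the sixth monitoring observation onward.
changed_y = [observe(i, shift=1.0 if i >= 25 else 0.0) for i in new_x]

print(cusum_monitor(train_x, train_y, new_x, stable_y))
print(cusum_monitor(train_x, train_y, new_x, changed_y))
```

The regression is fitted once on the training sample and never refitted, so any later change in the parameters shows up as residuals that stop averaging out.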


Identifying Combinations of Cancer Markers for Further Study as Triggers of Early Intervention

BIOMETRICS, Issue 4 2000
Stuart G. Baker
Summary. In many long-term clinical trials or cohort studies, investigators repeatedly collect and store tissue or serum specimens and later test specimens from cancer cases and a random sample of controls for potential markers for cancer. An important question is what combination, if any, of the molecular markers should be studied in a future trial as a trigger for early intervention. To answer this question, we summarized the performance of various combinations using Receiver Operating Characteristic (ROC) curves, which plot true versus false positive rates. To construct the ROC curves, we proposed a new class of nonparametric algorithms which extends the ROC paradigm to multiple tests. We fit various combinations of markers to a training sample and evaluated the performance in a test sample using a target region based on a utility function. We applied the methodology to the following markers for prostate cancer: the last value of total prostate-specific antigen (PSA), the last ratio of total to free PSA, the last slope of total PSA, and the last slope of the ratio. In the test sample, the ROC curve for last total PSA was slightly closer to the target region than the ROC curve for a combination of four markers. In a separate validation sample, the ROC curve for last total PSA intersected the target region in 77% of bootstrap replications, indicating some promise for further study. We also discussed sample size calculations. [source]


Analysis of co-articulation regions for performance-driven facial animation

COMPUTER ANIMATION AND VIRTUAL WORLDS (PREV: JNL OF VISUALISATION & COMPUTER ANIMATION), Issue 1 2004
Douglas Fidaleo
Abstract A facial gesture analysis procedure is presented for the control of animated faces. Facial images are partitioned into a set of local, independently actuated regions of appearance change termed co-articulation regions (CRs). Each CR is parameterized by the activation level of a set of face gestures that affect the region. The activation of a CR is analyzed using independent component analysis (ICA) on a set of training images acquired from an actor. Gesture intensity classification is performed in ICA space by correlation to training samples. Correlation in ICA space proves to be an efficient and stable method for gesture intensity classification with limited training data. A discrete sample-based synthesis method is also presented. An artist creates an actor-independent reconstruction sample database that is indexed with CR state information analyzed in real time from video. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Feature Extraction for Traffic Incident Detection Using Wavelet Transform and Linear Discriminant Analysis

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 4 2000
A. Samant
To eliminate false alarms, an effective traffic incident detection algorithm must be able to extract incident-related features from the traffic patterns. A robust feature-extraction algorithm also helps reduce the dimension of the input space for a neural network model without any significant loss of related traffic information, resulting in a substantial reduction in the network size, the effect of random traffic fluctuations, the number of required training samples, and the computational resources required to train the neural network. This article presents an effective traffic feature-extraction model using discrete wavelet transform (DWT) and linear discriminant analysis (LDA). The DWT is first applied to raw traffic data, and the finest resolution coefficients representing the random fluctuations of traffic are discarded. Next, LDA is applied to the filtered signal for further feature extraction and to reduce the dimensionality of the problem. The results of LDA are used as input to a neural network model for traffic incident detection. [source]
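
The first filtering step, applying a DWT and discarding the finest-resolution coefficients, can be sketched with a single level of the Haar wavelet. The abstract does not name the wavelet family or the number of levels, so the Haar choice, the function names, and the toy readings are assumptions; the later LDA and neural network stages are not shown.

```python
import math

S = 1 / math.sqrt(2)


def haar_step(signal):
    """One level of the Haar DWT on an even-length signal: (approximation, detail)."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * S for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * S for i in pairs]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the signal from one level of Haar coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def drop_finest(signal):
    """Zero the finest-resolution detail coefficients and reconstruct."""
    approx, detail = haar_step(signal)
    return inverse_haar_step(approx, [0.0] * len(detail))


# Hypothetical detector readings with sample-to-sample jitter.
readings = [30.0, 34.0, 31.0, 33.0, 12.0, 10.0, 11.0, 9.0]
print(drop_finest(readings))
```

With the Haar wavelet, zeroing the finest detail coefficients replaces each pair of neighbouring readings with their mean, which removes the fastest fluctuations while keeping the sustained drop in the second half of the toy series.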


Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: an application to South Rhodope aquifer (Thrace, Greece)

HYDROLOGICAL PROCESSES, Issue 3 2009
Dr A. Gemitzi
Abstract Neural network techniques combined with Geographical Information Systems (GIS) are used in the spatial prediction of nitrate pollution in groundwaters. Initially, the most important parameters controlling groundwater pollution by nitrates are determined. These include hydraulic conductivity of the aquifer, depth to the aquifer, land uses, soil permeability, and fine to coarse grain ratio in the unsaturated zone. All these parameters were quantified in a GIS environment and standardized to a common scale. Subsequently, a neural network classification was applied, using a multi-layer perceptron classifier with the back propagation (BP) algorithm, in order to categorize the examined area into categories of groundwater nitrate pollution potential. The methodology was applied to the South Rhodope aquifer (Thrace, Greece). The calculation was based on information from 214 training sites, which correspond to monitored nitrate concentrations in groundwaters in the area. The predictive accuracy of the model developed reached 86% in the training samples, 74% in the overall sample and 71% in the test samples. This indicates that the methodology shows promise for describing the spatial pattern of nitrate pollution. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Explaining qualifications in audit reports using a support vector machine methodology

INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 4 2005
Michael Doumpos
The verification of whether the financial statements of a firm represent its actual position is of major importance for auditors, who should provide a qualified report if they conclude that the financial statements fail to meet this requirement. This paper implements support vector machines (SVMs) to develop models that may support auditors in this task. Linear and non-linear models are developed and their performance is analysed using training samples of different size and out-of-sample/out-of-time data. The results show that all SVM models are capable of distinguishing between qualified and unqualified financial statements with satisfactory accuracy. The performance of the models over time is also explored. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Adaptive multiobjective optimization of process conditions for injection molding using a Gaussian process approach

ADVANCES IN POLYMER TECHNOLOGY, Issue 2 2007
Jian Zhou
Abstract Selecting the proper process conditions for the injection-molding process is treated as a multiobjective optimization problem, where different objectives, such as minimizing the injection pressure, volumetric shrinkage/warpage, or cycle time, present trade-off behaviors. As such, various optima may exist in the objective space. This paper presents the development of an integrated simulation-based optimization system that incorporates the design of computer experiments, Gaussian process (GP) for regression, multiobjective genetic algorithm (MOGA), and levels of adjacency to adaptively and automatically search for the Pareto-optimal solutions for different objectives. Since the GP approach can provide both the predictions and the estimations of the predictions simultaneously, a nondominated sorting procedure on the predicted variances at each iteration step is performed to intelligently select extra samples that can be used as additional training samples to improve the GP surrogate models. At the same time, user-defined adjacency constraint percentages are employed for evaluating the convergence of iteration. The illustrative applications in this paper show that the proposed optimization system can help mold designers to efficiently and effectively identify optimal process conditions. © 2007 Wiley Periodicals, Inc. Adv Polym Techn 26:71–85, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/adv.20092 [source]


Process optimization of injection molding using an adaptive surrogate model with Gaussian process approach

POLYMER ENGINEERING & SCIENCE, Issue 5 2007
Jian Zhou
This article presents an integrated, simulation-based optimization procedure that can determine the optimal process conditions for injection molding without user intervention. The idea is to use a nonlinear statistical regression technique and design of computer experiments to establish an adaptive surrogate model with short turn-around time and adequate accuracy for substituting time-consuming computer simulations during system-level optimization. A special surrogate model based on the Gaussian process (GP) approach, which has not been employed previously for injection molding optimization, is introduced. GP is capable of giving both a prediction and an estimate of the confidence (variance) for the prediction simultaneously, thus providing direction as to where additional training samples could be added to improve the surrogate model. While the surrogate model is being established, a hybrid genetic algorithm is employed to evaluate the model to search for the global optimal solutions in a concurrent fashion. The examples presented in this article show that the proposed adaptive optimization procedure helps engineers determine the optimal process conditions more efficiently and effectively. POLYM. ENG. SCI., 47:684–694, 2007. © 2007 Society of Plastics Engineers. [source]
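
The way a GP's predictive variance can point to where the next training sample should go is sketched below for a single input variable. The squared-exponential kernel, its unit length scale, the small `noise` term, and the rule of picking the candidate with the largest variance are all assumptions for illustration; the article's own procedure, including the hybrid genetic algorithm, is not reproduced here.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel (an assumed choice of covariance function)."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(train_x, train_y, x, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the point x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    k = [rbf(a, x) for a in train_x]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, train_y)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, max(var, 0.0)


def next_sample(train_x, train_y, candidates):
    """Pick the candidate where the surrogate is least certain."""
    return max(candidates, key=lambda x: gp_predict(train_x, train_y, x)[1])


# Hypothetical one-dimensional process setting and simulated responses.
train_x = [0.0, 1.0, 4.0]
train_y = [0.0, 0.8, -0.7]
print(next_sample(train_x, train_y, [0.5, 2.5, 3.5]))
```

Each newly chosen point would be run through the expensive simulation and appended to the training sample before the surrogate is refitted, so the model is refined mainly where it is least reliable.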