Validation Data Set (validation + data_set)
Selected Abstracts

Residual Autocorrelation Distribution in the Validation Data Set
JOURNAL OF TIME SERIES ANALYSIS, Issue 2 2000
Alessandro Fasso

Testing model performance on a data set other than the data set used for estimation is common practice in econometrics, technological stochastic modelling and environmetrics. In this paper, using an ARMAX model, the asymptotic distribution of the residual autocorrelations in the validation data set is given and a χ² test for overall residual incorrelation is considered.

Artificial neural networks as statistical tools in epidemiological studies: analysis of risk factors for early infant wheeze
PAEDIATRIC & PERINATAL EPIDEMIOLOGY, Issue 6 2004
Andrea Sherriff

Summary Artificial neural networks (ANNs) are being used increasingly for the prediction of clinical outcomes and classification of disease phenotypes. A lack of understanding of the statistical principles underlying ANNs has led to widespread misuse of these tools in the biomedical arena. In this paper, the authors compare the performance of ANNs with that of conventional linear logistic regression models in an epidemiological study of infant wheeze. Data on the putative risk factors for infant wheeze have been obtained from a sample of 7318 infants taking part in the Avon Longitudinal Study of Parents and Children (ALSPAC). The data were analysed using logistic regression models and ANNs, and performance based on misclassification rates of a validation data set were compared. Misclassification rates in the training data set decreased as the complexity of the ANN increased: h = 0: 17.9%; h = 2: 16.2%; h = 5: 14.9%, and h = 10: 9.2%. However, the more complex models did not generalise well to new data sets drawn from the same population: validation data set misclassification rates: h = 0: 17.9%; h = 2: 19.6%; h = 5: 20.2% and h = 10: 22.9%. There is no evidence from this study that ANNs outperform conventional methods of analysing epidemiological data.
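
The gap between the training and validation figures quoted above can be made explicit with a short calculation. The sketch below is illustrative only: the rates are those reported in the abstract, while the function names and the way the gap is summarised are assumptions made here, not the authors' analysis. The abstract does not define h beyond using it as the measure of ANN complexity, so it is treated simply as a complexity index.

```python
# Misclassification rates (%) as reported in the abstract, keyed by h.
train_rates = {0: 17.9, 2: 16.2, 5: 14.9, 10: 9.2}
validation_rates = {0: 17.9, 2: 19.6, 5: 20.2, 10: 22.9}


def misclassification_rate(y_true, y_pred):
    """Percentage of cases where the predicted class differs from the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * errors / len(y_true)


def generalisation_gaps(train, validation):
    """Validation rate minus training rate for each complexity level (hypothetical helper)."""
    return {h: round(validation[h] - train[h], 1) for h in sorted(train)}


gaps = generalisation_gaps(train_rates, validation_rates)
# Complexity level with the lowest validation error, i.e. the one that
# would be chosen if selection were based on the validation data set.
best_h = min(validation_rates, key=validation_rates.get)

for h, gap in gaps.items():
    print(f"h = {h:2d}: train {train_rates[h]:.1f}%, "
          f"validation {validation_rates[h]:.1f}%, gap {gap:.1f} points")
print("lowest validation error at h =", best_h)
```

On the reported figures the gap widens steadily with h, and the lowest validation error is at h = 0, which is consistent with the abstract's statement that the more complex models did not generalise well.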
Increasing the complexity of the models serves only to overfit the model to the data. It is important that a validation or test data set is used to assess the performance of highly complex ANNs to avoid overfitting.

Ecological niche conservatism in North American freshwater fishes
BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY, Issue 2 2009
KRISTINA M. MCNYSET

There are many hypotheses of relationships, and also of speciation processes, in North American freshwater fishes, although, to date, there have been no direct tests of whether there is evidence of ecological niche conservatism. In the present study, ecological niche modeling is used to look for evidence of ecological niche conservatism in six clades of freshwater fishes: the starheaded topminnows, sand darters, black basses, Notropis rubellus species group, Notropis longirostris species group, and the Hybopsis amblops species group. This is achieved by evaluating the reciprocal predictivity of distributional predictions based on ecological niche models developed for each individual taxon in a clade under the assumption that high reciprocal predictivity between sister species can be taken as evidence of niche conservatism. Omission percentages, total and average commission, and the area under the curve in a receiver operating characteristic analysis, where calculated, are used to evaluate predictive ability. Occurrence data for each species were subset into a training and independent validation data set where possible. Across all clades and species, models predicted the validation data for a given species well. Ecological niche conservatism was found generally across the starheaded topminnows, the sand darters, and the N. longirostris species group. There was some inter-predictivity within the N. rubellus group, but almost no inter-predictivity within the black basses, indicating a lack of conservatism.
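
Two of the evaluation measures named above, the omission percentage and the area under the ROC curve, can be sketched on a validation data set as follows. This is a minimal illustration under stated assumptions: the suitability scores, the 0.5 threshold, and both function names are hypothetical, and the abstract does not say how the study itself computed these quantities.

```python
def omission_percentage(validation_scores, threshold):
    """Percentage of validation occurrence records the model predicts as absent."""
    missed = sum(1 for s in validation_scores if s < threshold)
    return 100.0 * missed / len(validation_scores)


def roc_auc(presence_scores, background_scores):
    """AUC as the probability that a presence record scores higher than a
    background record, counting ties as one half (rank-based definition)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model suitability scores, for illustration only.
validation_presence = [0.9, 0.8, 0.4, 0.7]   # held-out occurrence records
background = [0.1, 0.4, 0.35, 0.8]           # background (pseudo-absence) points

omission = omission_percentage(validation_presence, threshold=0.5)
auc = roc_auc(validation_presence, background)
print(f"omission: {omission:.1f}%  AUC: {auc:.4f}")
```

For a reciprocal test in the spirit of the abstract, the same functions could be applied to the validation records of a sister species scored by the other species' model; low omission and high AUC in both directions would then point towards conservatism.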
These results demonstrate that ecological niches generally act as stable constraints on freshwater fish distributions in North America. © 2009 The Linnean Society of London, Biological Journal of the Linnean Society, 2009, 96, 282–295.

Modeling kinetics of a large-scale fed-batch CHO cell culture by Markov chain Monte Carlo method
BIOTECHNOLOGY PROGRESS, Issue 1 2010
Zizhuo Xing

Abstract Markov chain Monte Carlo (MCMC) method was applied to model kinetics of a fed-batch Chinese hamster ovary cell culture process in 5,000-L bioreactors. The kinetic model consists of six differential equations, which describe dynamics of viable cell density and concentrations of glucose, glutamine, ammonia, lactate, and the antibody fusion protein B1 (B1). The kinetic model has 18 parameters, six of which were calculated from the cell culture data, whereas the other 12 were estimated from a training data set that comprised of seven cell culture runs using a MCMC method. The model was confirmed in two validation data sets that represented a perturbation of the cell culture condition. The agreement between the predicted and measured values of both validation data sets may indicate high reliability of the model estimates. The kinetic model uniquely incorporated the ammonia removal and the exponential function of B1 protein concentration. The model indicated that ammonia and lactate play critical roles in cell growth and that low concentrations of glucose (0.17 mM) and glutamine (0.09 mM) in the cell culture medium may help reduce ammonia and lactate production. The model demonstrated that 83% of the glucose consumed was used for cell maintenance during the late phase of the cell cultures, whereas the maintenance coefficient for glutamine was negligible. Finally, the kinetic model suggests that it is critical for B1 production to sustain a high number of viable cells. The MCMC methodology may be a useful tool for modeling kinetics of a fed-batch mammalian cell culture process.
© 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010