Cross-validation

Kinds of Cross-validation

  • generalized cross-validation
  • leave-one-out cross-validation

Terms modified by Cross-validation

  • cross-validation analysis
  • cross-validation procedure
  • cross-validation study
  • cross-validation test

Selected Abstracts


    Cross-Validation and Discriminant Validity of Adolescent Health Promotion Scale Among Overweight and Nonoverweight Adolescents in Taiwan

    PUBLIC HEALTH NURSING, Issue 6 2006
    Mei-Yen Chen
    ABSTRACT This study used cross-validation and discriminant analysis to evaluate the construct and discriminant validity of the Adolescent Health Promotion (AHP) scale among overweight and nonoverweight adolescents in Taiwan. A cross-sectional survey method was used and 660 adolescents participated in this study. Cluster and discriminant analyses were used to analyze the data. Our findings indicate that the AHP is a valid and reliable scale to discriminate between the health-promoting behaviors of overweight and nonoverweight adolescents. For the total scale, cluster analyses revealed two distinct patterns, which we designated the healthy and unhealthy groups. Discriminant analysis supported this clustering as having good discriminant validity, as nonoverweight adolescents tended to be classified as healthy, while the overweight tended to be in the unhealthy group. In general, overweight adolescents practiced health-related behaviors at a significantly lower frequency than the nonoverweight, including exercise behavior, stress management, life appreciation, health responsibility, and social support. These findings can be used to further develop and refine knowledge of adolescent overweight and related strategies for intervention. [source]


    Interpolation processes using multivariate geostatistics for mapping of climatological precipitation mean in the Sannio Mountains (southern Italy)

    EARTH SURFACE PROCESSES AND LANDFORMS, Issue 3 2005
    Nazzareno Diodato
    Abstract The spatial variability of precipitation has often been a topic of research, since accurate modelling of precipitation is a crucial condition for obtaining reliable results in hydrology and geomorphology. In mountainous areas, the sparsity of the measurement networks makes an accurate and reliable spatialization of rainfall amounts at the local scale difficult. The purpose of this paper is to show how the use of a digital elevation model can improve interpolation processes at the subregional scale for mapping the mean annual and monthly precipitation from rainfall observations (40 years) recorded in a region of 1400 km2 in southern Italy. Besides linear regression of precipitation against elevation, two methods of interpolation are applied: inverse squared distance and ordinary cokriging. Cross-validation indicates that the inverse distance interpolation, which ignores the information on elevation, yields the largest prediction errors. Smaller prediction errors are produced by linear regression and ordinary cokriging. However, the results seem to favour the multivariate geostatistical method including auxiliary information (related to elevation). We conclude that ordinary cokriging is a very flexible and robust interpolation method because it can take into account several properties of the landscape; it should therefore be applicable in other mountainous regions, especially where precipitation is an important geomorphological factor. Copyright © 2005 John Wiley & Sons, Ltd. [source]
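
    The cross-validation the abstract describes can be reproduced in outline: withhold each rain gauge in turn, predict it from the remaining stations, and compare prediction errors across interpolators. A minimal sketch, with hypothetical arrays `coords`, `elev` and `precip`; inverse-distance weighting and an elevation regression stand in here, since full cokriging needs a dedicated geostatistics library:

```python
import numpy as np
from numpy.linalg import lstsq

def idw(xy_known, z_known, xy_query, power=2):
    """Inverse-distance-weighted prediction at one query point."""
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * z_known) / np.sum(w)

def loo_rmse(coords, elev, precip):
    """Leave-one-out cross-validation of two interpolators."""
    err_idw, err_reg = [], []
    n = len(precip)
    for i in range(n):
        keep = np.arange(n) != i
        # IDW ignores the elevation information entirely
        err_idw.append(idw(coords[keep], precip[keep], coords[i]) - precip[i])
        # Linear regression of precipitation against elevation
        A = np.c_[np.ones(keep.sum()), elev[keep]]
        b0, b1 = lstsq(A, precip[keep], rcond=None)[0]
        err_reg.append(b0 + b1 * elev[i] - precip[i])
    return {k: float(np.sqrt(np.mean(np.square(v))))
            for k, v in [("idw", err_idw), ("elev_regression", err_reg)]}
```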


    Factorial structure and cross-cultural invariance of the Oral Impacts on Daily Performances

    EUROPEAN JOURNAL OF ORAL SCIENCES, Issue 3 2009
    A. N. Ĺstrřm
    The issue of cross-cultural construct validation and measurement invariance of the Oral Impacts on Daily Performances (OIDP) questionnaire is important. Using confirmatory factor analysis (CFA), this study evaluated a proposed three-factor structure of the OIDP questionnaire in Tanzanian adolescents and adults and assessed whether this model would be replicated in Ugandan adolescents. Between 2004 and 2007, OIDP data were collected from 1,601 Tanzanian adolescents, 1,031 Tanzanian adults, and 1,146 Ugandan adolescents. Model generation analysis was restricted to Tanzanian adolescents, and the model achieved was tested, without modification, in Tanzanian adults and in Ugandan adolescents. A modified three-factor solution with cross-loadings improved the fit of the OIDP model to the data compared with a one-factor model and the original three-factor model within the Tanzanian [comparative fit index (CFI) = 0.99] and Ugandan (CFI = 0.98) samples. Cross-validation in Tanzanian adults provided a reasonable fit (CFI = 0.98). Multiple-group CFA demonstrated acceptable fit [χ2 = 140.829, degrees of freedom (d.f.) = 24, CFI = 0.98] for the unconstrained model, whereas unconstrained and constrained models were statistically significantly different. Factorial validity was confirmed for the three-factor OIDP model. The results provide evidence for cross-cultural equivalence of the OIDP, suggesting that this measure is comparable, at least to some extent, across Tanzanian and Ugandan adolescents. [source]


    The Prevalence of Lying in America: Three Studies of Self-Reported Lies

    HUMAN COMMUNICATION RESEARCH, Issue 1 2010
    Kim B. Serota, Timothy R. Levine, Franklin J. Boster
    This study addresses the frequency and the distribution of reported lying in the adult population. A national survey asked 1,000 U.S. adults to report the number of lies told in a 24-hour period. Sixty percent of subjects report telling no lies at all, and almost half of all lies are told by only 5% of subjects; thus, prevalence varies widely and most reported lies are told by a few prolific liars. The pattern is replicated in a reanalysis of previously published research and with a student sample. Substantial individual differences in lying behavior have implications for the generality of truth–lie base rates in deception detection experiments. Explanations concerning the nature of lying and methods for detecting lies need to account for this variation. Research Question: This study addresses the frequency and the distribution of reported lying in the adult population. Significance: In the deception literature, the consensus is that most people lie on a daily basis, yet this view is founded on very little empirical evidence. This research tests the question of lying prevalence. Method: Survey research techniques and descriptive analysis are used to establish base rates and frequency distributions for reported lying behavior. Data source: A national survey asked 1,000 U.S. adults to report the number of lies told in a 24-hour period. Cross-validation is provided by reanalysis of previously reported diary and experimental data and by replication using a sample of 225 students. Findings: The oft-repeated average (arithmetic mean) of one to two lies per day is replicated, but the study finds the distribution is highly skewed. On a typical day, 60% of subjects report telling no lies at all, and almost half of all lies are told by only 5% of subjects; thus, prevalence varies widely and most reported lies are told by a few prolific liars. The pattern is replicated in the reanalysis of previously published research and with the student sample. Implications: The findings of a highly skewed distribution render the average number of lies per day misleading. Substantial individual differences in lying behavior also have implications for the generality of truth–lie base rates in deception detection experiments. Explanations concerning the nature of lying and methods for detecting lies need to account for this variation. Keywords: deception, lies, lying, communication, individual differences [source]
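
    The headline result, a mean of one to two lies per day coexisting with 60% zeros and a few prolific liars, is easy to illustrate on simulated counts; the generator below is a hypothetical stand-in for the survey data, not the authors' distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for 1,000 self-reported 24-hour lie counts:
# most respondents report zero, a small minority report many.
lies = np.where(rng.random(1000) < 0.60, 0,
                rng.negative_binomial(1, 0.25, 1000))

print("mean lies/day:", lies.mean())            # near 1-2 despite the zeros
print("share reporting none:", np.mean(lies == 0))
cut = np.quantile(lies, 0.95)                   # heaviest 5% of reporters
print("share of all lies told by top 5%:", lies[lies >= cut].sum() / lies.sum())
```

    The simulated shares will not match the survey exactly; the point is that a mean of one to two lies per day says little about who actually tells them.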


    Childhood trauma has dose-effect relationship with dropping out from psychotherapeutic treatment for bulimia nervosa: A replication

    INTERNATIONAL JOURNAL OF EATING DISORDERS, Issue 2 2001
    Jennifer Mahon
    Abstract Objective The primary goal of this study was to replicate the finding that experiences of childhood trauma have a dose-effect relationship with dropping out from psychotherapeutic treatment for bulimia nervosa. It also aimed to replicate logistic regression findings that parental break-up predicts dropping out. Method The cohort consisted of 114 women consecutively presenting to an outpatient eating disorders clinic with bulimia nervosa or atypical bulimia nervosa. Data were gathered using a retrospective, case-note approach and were analysed using logistic regression (LR). A correlation technique was employed to assess the presence of a dose-effect relationship between experiences of trauma in childhood and dropping out. LR models were double cross-validated between this and an earlier cohort. Results The dose-effect relationship between experiences of childhood trauma and dropping out was confirmed. Witnessing parental break-up in childhood again predicted dropping out of treatment in adulthood. Cross-validation of LR equations was unsuccessful. Discussion These results strongly suggest that experiences of childhood trauma have a dose-effect relationship with dropping out. Parental break-up is a stable predictor of dropping out. It is possible that these experiences influence attachment style, particularly the ability to make and maintain a trusting relationship with a psychotherapist. Clinical implications are discussed. © 2001 by John Wiley & Sons, Inc. Int J Eat Disord 30: 138–148, 2001. [source]
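
    The double cross-validation reported here amounts to fitting a logistic regression on each cohort and scoring it on the other. A sketch under assumed arrays (`X_a`, `y_a` for the earlier cohort; `X_b`, `y_b` for this one; `y` coding dropout as 1):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def double_cross_validate(X_a, y_a, X_b, y_b):
    """Fit on each cohort, validate on the other (y = dropped out yes/no)."""
    m_a = LogisticRegression(max_iter=1000).fit(X_a, y_a)
    m_b = LogisticRegression(max_iter=1000).fit(X_b, y_b)
    return {
        "fit_on_a_test_on_b": roc_auc_score(y_b, m_a.predict_proba(X_b)[:, 1]),
        "fit_on_b_test_on_a": roc_auc_score(y_a, m_b.predict_proba(X_a)[:, 1]),
    }
```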


    Multi-block and path modelling procedures

    JOURNAL OF CHEMOMETRICS, Issue 11-12 2008
    Agnar Höskuldsson
    Abstract The author has developed a unified theory of path and multi-block modelling of data. The data blocks are arranged in a directional path. Each data block can lead to one or more data blocks. It is assumed that a collection of input data blocks is given, each of which is supposed to describe one or more intermediate data blocks. The output data blocks are those that are at the ends of the paths and have no succeeding data blocks. The optimisation procedure finds weights for the input data blocks so that the size of the total loadings for the output data blocks is maximised. When the optimal weight vectors have been determined, the score and loading vectors for the data blocks in the path are determined. Appropriate adjustment of the data blocks is carried out at each step. Regression coefficients are computed for each data block that show how the data block is estimated by the data blocks that lead to it. Methods of standard regression analysis are extended to this type of modelling. Three types of 'strength' of relationship are computed for each pair of connected data blocks: first, the strength within the full path; second, the strength when only the data blocks leading to the last one are used; and third, the strength when only the two blocks are considered. Cross-validation and other standard methods of linear regression are carried out in a similar manner. In industry, processes are organised in different ways, and it can be useful to model the processes in the way they are carried out. By proper alignment of sub-processes, an overall model can be specified. There can be several useful path models during the process, where the data blocks in a path are the ones that are active or important at given stages of the process. Data collection equipment is becoming more advanced and cheaper, and data analysis needs to 'catch up' with the challenges this new technology presents. Copyright © 2008 John Wiley & Sons, Ltd. [source]


    Cross-validation of neural network applications for automatic new topic identification

    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 3 2008
    H. Cenk Ozmutlu
    The purpose of this study is to provide results from experiments designed to investigate the cross-validation of an artificial neural network application to automatically identify topic changes in Web search engine user sessions by using data logs of different Web search engines for training and testing the neural network. Sample data logs from the FAST and Excite search engines are used in this study. The results of the study show that identification of topic shifts and continuations on a particular Web search engine user session can be achieved with neural networks that are trained on a different Web search engine data log. Although FAST and Excite search engine users differ with respect to some user characteristics (e.g., number of queries per session, number of topics per session), the results of this study demonstrate that both search engine users display similar characteristics as they shift from one topic to another during a single search session. The key finding of this study is that a neural network that is trained on a selected data log could be universal; that is, it can be applicable on all Web search engine transaction logs regardless of the source of the training data log. [source]
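
    The design is cross-validation across data sources rather than across folds: train on one engine's sessions, test on the other's. A hedged sketch with hypothetical per-query feature matrices and topic-shift labels:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# X_fast / X_excite: per-query features (e.g., search pattern, time interval);
# y_*: 1 = topic shift, 0 = topic continuation. All names are hypothetical.
def cross_engine_validation(X_fast, y_fast, X_excite, y_excite):
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    net.fit(X_fast, y_fast)                                   # train on FAST log
    return accuracy_score(y_excite, net.predict(X_excite))    # test on Excite log
```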


    Linking the concept of scale to studies of biological diversity: evolving approaches and tools

    DIVERSITY AND DISTRIBUTIONS, Issue 3 2006
    Erik A. Beever
    ABSTRACT Although the concepts of scale and biological diversity independently have received rapidly increasing attention in the scientific literature since the 1980s, the rate at which the two concepts have been investigated jointly has grown much more slowly. We find that scale considerations have been incorporated explicitly into six broad areas of investigation related to biological diversity: (1) heterogeneity within and among ecosystems, (2) disturbance ecology, (3) conservation and restoration, (4) invasion biology, (5) importance of temporal scale for understanding processes, and (6) species responses to environmental heterogeneity. In addition to placing the papers of this Special Feature within the context of brief summaries of the expanding literature on these six topics, we provide an overview of tools useful for integrating scale considerations into studies of biological diversity. Such tools include hierarchical and structural-equation modelling, kriging, variable-width buffers, k-fold cross-validation, and cascading graph diagrams, among others. Finally, we address some of the major challenges and research frontiers that remain, and conclude with a look to the future. [source]
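
    Of the tools listed, k-fold cross-validation is the most generic: split sites into k groups, fit on k-1 of them, score on the held-out group, and rotate. A minimal sketch for a diversity-modelling use case, with hypothetical environmental predictors `X` and species richness `y`:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

def kfold_rmse(X, y, k=5):
    """k-fold cross-validated RMSE of a simple richness model."""
    errs = []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = LinearRegression().fit(X[train], y[train])
        errs.append(y[test] - model.predict(X[test]))
    return float(np.sqrt(np.mean(np.concatenate(errs) ** 2)))
```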


    Determination of ethyl sulfate , a marker for recent ethanol consumption , in human urine by CE with indirect UV detection

    ELECTROPHORESIS, Issue 23 2006
    Francesc A. Esteve-Turrillas
    Abstract A CE method for the determination of the ethanol consumption marker ethyl sulfate (EtS) in human urine was developed. Analysis was performed in negative polarity mode with a background electrolyte composed of 15 mM maleic acid, 1 mM phthalic acid, and 0.05 mM cetyltrimethylammonium bromide (CTAB) at pH 2.5 and indirect UV detection at 220 nm (300 nm reference wavelength). This buffer system provided selective separation conditions for EtS and vinylsulfonic acid, employed as internal standard, from urine matrix components. Sample pretreatment of urine was minimized to a 1:5 dilution with water. The optimized CE method was validated in the range of 5–700 mg/L using seven lots of urine. Intra- and inter-day precision and accuracy values, determined at 5, 60, and 700 mg/L with each lot of urine, fulfilled the requirements according to common guidelines for bioanalytical method validation. The application to forensic urine samples collected at autopsies as well as a successful cross-validation with an LC-MS/MS-based method confirmed the overall validity and real-world suitability of the developed expeditious CE assay (sample throughput 130 per day). [source]


    Forecasting daily high ozone concentrations by classification trees

    ENVIRONMETRICS, Issue 2 2004
    F. Bruno
    Abstract This article proposes the use of classification trees (CART) as a suitable technique for forecasting the daily exceedance of ozone standards established by Italian law. A model is formulated for predicting, 1 and 2 days beforehand, the most probable class of the maximum daily urban ozone concentration in the city of Bologna. The standard employed is the so-called 'warning level' (180 µg/m3). Meteorological forecasted variables are considered as predictors. Pollution data show a considerable discrepancy between the dimensions of the two classes of events. The first class includes those days when the observed maximum value exceeds the established standard, while the second class contains those when the observed maximum value does not exceed the said standard. Due to this peculiarity, model selection procedures using cross-validation usually lead to overpruning. We can overcome this drawback by means of techniques which replicate observations, through the modification of their inclusion probabilities in the cross-validation sets. Copyright © 2004 John Wiley & Sons, Ltd. [source]
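
    The overpruning arises because exceedance days are rare, so cross-validation folds are dominated by the majority class. The authors' remedy, modifying inclusion probabilities so that rare-class observations are replicated in the cross-validation sets, can be approximated by oversampling the minority class before tree selection. A sketch with hypothetical arrays (`class_weight` would be the closely related built-in alternative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def fit_tree_with_replication(X, y, factor=10):
    """Replicate exceedance days (y == 1) so CV folds see both classes."""
    idx = np.concatenate([np.flatnonzero(y == 0),
                          np.repeat(np.flatnonzero(y == 1), factor)])
    Xr, yr = X[idx], y[idx]
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    score = cross_val_score(tree, Xr, yr, cv=5).mean()
    return tree.fit(Xr, yr), score
```

    Replicating before splitting lets duplicates leak across folds and inflates scores, so the authors' adjustment of inclusion probabilities inside the cross-validation itself is the more careful variant; the sketch only conveys the idea.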


    Validity of Three Measures of Health-related Quality of Life in Children with Intractable Epilepsy

    EPILEPSIA, Issue 10 2002
    Elisabeth M. S. Sherman
    Summary: Purpose: Validity studies on health-related quality of life (HRQOL) scales for pediatric epilepsy are few, and cross-validation with other samples has not been reported. This study was designed to assess the validity of three parent-rated measures of HRQOL in pediatric epilepsy: (a) the Impact of Childhood Illness Scale (ICI), (b) the Impact of Child Neurologic Handicap Scale (ICNH), and (c) the Hague Restrictions in Epilepsy Scale (HARCES). Methods: Retrospective data were examined for 44 children with intractable epilepsy. Validity was assessed by evaluating differences across epilepsy severity groups as well as correlations between HRQOL scales and neurologic variables (seizure severity, epilepsy duration, current/prior antiepileptic medications) and psychosocial measures (emotional functioning, IQ, social skills, adaptive behavior). Scale overlap with a global QOL rating also was assessed. Results: The HRQOL measures were moderately to highly intercorrelated. The scales differed in terms of their associations with criterion measures. The HARCES was related to the highest number of neurologic variables and the ICNH to the fewest. All three scales were related to psychosocial functioning and to global quality of life. Conclusions: The results of this study suggest that the three measures are likely adequate measures of HRQOL for use in intractable childhood epilepsy. The measures were highly intercorrelated, and they were all broadly related to criterion measures reflecting specific domains of HRQOL as well as global QOL. Some differences between scales emerged, however, that suggest care in choosing HRQOL instruments for children with epilepsy. [source]


    Comparison of three models of alcohol craving in young adults: a cross-validation

    ADDICTION, Issue 4 2004
    Peter M. McEvoy
    ABSTRACT Aims The aim of study 1 was to develop a three-factor Approach and Avoidance of Alcohol Questionnaire (AAAQ), designed to assess mild and intense inclinations to drink, as well as inclinations to avoid drinking. The aims of study 2 were to cross-validate the AAAQ with an independent sample and to test the goodness-of-fit of three models of craving for alcohol: (a) the traditional unidimensional model; (b) a two-dimensional approach–avoidance ambivalence model; and (c) an expanded two-dimensional neuroanatomical model that retains avoidance, while positing a threshold that partitions approach into two distinct levels and relates all three factors involved in craving to brain pathways associated with inhibitory processes, reward and obsessive–compulsive behaviour, respectively. Design, setting and participants The survey was administered to 589 Australian university students (69% women) in study 1 and to 523 American university students (64% women) in study 2. Measurements Inclinations to drink and to not drink (AAAQ), drinking behaviour (quantity and frequency), drinking problems (Young Adult Alcohol Problems Screening Test; YAAPST) and readiness for change (Stages of Change Readiness and Treatment Eagerness Scale; SOCRATES). Findings The expanded two-dimensional neuroanatomical model provided the best fit to the data. The AAAQ explained a substantial proportion of the variance in drinking frequency (41–53%), drinking quantity (49–60%) and drinking problems (43%). AAAQ profiles differed as a function of drinking-related risk, and the three AAAQ scales differentially predicted readiness for change. Conclusions Approach and avoidance inclinations toward alcohol are separable constructs, and their activation may not be invariably reciprocal. Craving can be defined as the relative activation of substance-related response inclinations along these two primary dimensions. There may be a threshold of intensity that separates mild from intense approach inclinations. [source]


    Evaluating cluster analysis solutions: an application to the Italian NEO personality inventory

    EUROPEAN JOURNAL OF PERSONALITY, Issue S1 2002
    Claudio Barbaranelli
    This paper is concerned with the evaluation of cluster analysis solutions. Internal criteria and replication issues are compared and applied to empirical data collected from an Italian sample of 421 young adults, using the NEO Personality Inventory. The following internal criteria were considered: the C, gamma, and G(+) indices, and the point-biserial correlation. Replication was examined (i) 'internally', using double cross-validation and bootstrap approaches, and (ii) 'externally', by comparing the solution obtained on the Italian sample with the results obtained in German and Spanish samples. While replication analyses supported three- and four-cluster solutions, internal criteria (with the exception of the point-biserial correlation) tended to privilege solutions with a much larger number of groups. Advantages and limitations of the different strategies are discussed. Copyright © 2002 John Wiley & Sons, Ltd. [source]
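
    Of the internal criteria named, the point-biserial correlation is the simplest to state: correlate each pairwise distance with a 0/1 indicator of whether the pair falls in the same cluster. A sketch assuming a precomputed distance matrix `D` and a vector of cluster labels:

```python
import numpy as np
from scipy.stats import pointbiserialr

def point_biserial_index(D, labels):
    """Correlation between pairwise distances and same-cluster membership."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)       # all unordered pairs
    same = (labels[i] == labels[j]).astype(int)    # 1 if pair shares a cluster
    r, _ = pointbiserialr(same, D[i, j])
    return r   # strongly negative when within-cluster pairs are the closer ones
```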


    Predicting pasture root density from soil spectral reflectance: field measurement

    EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2010
    B. H. KUSUMO
    This paper reports the development and evaluation of a field technique for in situ measurement of root density using a portable spectroradiometer. The technique was evaluated at two sites in permanent pasture on contrasting soils (an Allophanic and a Fluvial Recent soil) in the Manawatu region, New Zealand. Using a modified soil probe, reflectance spectra (350–2500 nm) were acquired from horizontal surfaces at three depths (15, 30 and 60 mm) of an 80-mm diameter soil core, totalling 108 samples for both soils. After scanning, 3-mm soil slices were taken at each depth for root density measurement and soil carbon (C) and nitrogen (N) analysis. The two soils exhibited a wide range of root densities, from 1.53 to 37.03 mg dry root g−1 soil. The average root density in the Fluvial soil (13.21 mg g−1) was twice that in the Allophanic soil (6.88 mg g−1). Calibration models, developed using partial least squares regression (PLSR) of the first derivative spectra and reference data, were able to predict root density on unknown samples using a leave-one-out cross-validation procedure. The root density predictions were more accurate when the samples from the two soil types were separated (rather than grouped) to give sub-populations (n = 54) of spectral data with more similar attributes. A better prediction of root density was achieved in the Allophanic soil (r2 = 0.83, ratio of prediction to deviation (RPD) = 2.44, root mean square error of cross-validation (RMSECV) = 1.96 mg g−1) than in the Fluvial soil (r2 = 0.75, RPD = 1.98, RMSECV = 5.11 mg g−1). It is concluded that pasture root density can be predicted from soil reflectance spectra acquired from field soil cores. Improved PLSR models for predicting field root density can be produced by selecting calibration data from field data sources with similar spectral attributes to the validation set. Root density and soil C content can be predicted independently, which could be particularly useful in studies examining potential rates of soil organic matter change. [source]
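
    The reported figures of merit (RMSECV and RPD) fall straight out of a leave-one-out loop over the spectra. A sketch assuming `X` holds the first-derivative spectra and `y` the measured root densities; the number of PLS components is an illustrative choice:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def plsr_loo(X, y, n_components=8):
    """Leave-one-out cross-validation of a PLSR calibration."""
    preds = np.empty_like(y, dtype=float)
    for train, test in LeaveOneOut().split(X):
        pls = PLSRegression(n_components=n_components).fit(X[train], y[train])
        preds[test] = pls.predict(X[test]).ravel()
    rmsecv = np.sqrt(np.mean((y - preds) ** 2))
    rpd = np.std(y, ddof=1) / rmsecv     # ratio of performance to deviation
    r2 = np.corrcoef(y, preds)[0, 1] ** 2
    return rmsecv, rpd, r2
```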


    Multivariate calibration of hyperspectral γ-ray energy spectra for proximal soil sensing

    EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2007
    R. A. Viscarra Rossel
    Summary The development of proximal soil sensors to collect fine-scale soil information for environmental monitoring, modelling and precision agriculture is vital. Conventional soil sampling and laboratory analyses are time-consuming and expensive. In this paper we look at the possibility of calibrating hyperspectral γ-ray energy spectra to predict various surface and subsurface soil properties. The spectra were collected with a proximal, on-the-go γ-ray spectrometer. We surveyed two geographically and physiographically different fields in New South Wales, Australia, and collected hyperspectral γ-ray data consisting of 256 energy bands at more than 20 000 sites in each field. Bootstrap aggregation with partial least squares regression (or bagging-PLSR) was used to calibrate the γ-ray spectra of each field for predictions of selected soil properties. However, significant amounts of pre-processing were necessary to expose the correlations between the γ-ray spectra and the soil data. We first filtered the spectra spatially using local kriging, then further de-noised, normalized and detrended them. The resulting bagging-PLSR models of each field were tested using leave-one-out cross-validation. Bagging-PLSR provided robust predictions of clay, coarse sand and Fe contents in the 0–15 cm soil layer and pH and coarse sand contents in the 15–50 cm soil layer. Furthermore, bagging-PLSR provided us with a measure of the uncertainty of predictions. This study is apparently the first to use a multivariate calibration technique with on-the-go proximal γ-ray spectrometry. Proximally sensed γ-ray spectrometry proved to be a useful tool for predicting soil properties in different soil landscapes. [source]
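
    Bagging-PLSR, as described, is bootstrap aggregation over PLS fits: resample the calibration data, refit, and read prediction uncertainty from the spread of the ensemble. A sketch with hypothetical spectra `X`, soil property `y` and new spectra `X_new`:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def bagging_plsr(X, y, X_new, n_boot=100, n_components=10, seed=0):
    """Bootstrap-aggregated PLSR: mean prediction plus an uncertainty band."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))     # bootstrap resample
        pls = PLSRegression(n_components=n_components).fit(X[idx], y[idx])
        preds.append(pls.predict(X_new).ravel())
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)  # prediction, uncertainty
```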


    Recent studies of dental development in Neandertals: Implications for Neandertal life histories

    EVOLUTIONARY ANTHROPOLOGY, Issue 1 2009
    Debbie Guatelli-Steinberg
    Abstract Did Neandertals share with modern humans their prolonged periods of growth and delayed ages of maturation? During the past five years, renewed interest in this question has produced dental studies with seemingly contradictory results. Some suggest fast dental growth,1,2 while others appear to suggest a slower, modern-human dental growth pattern.3,4 Although some apparent contradictions can be reconciled, there remain questions that can be resolved only with additional data and cross-validation of methods. Moreover, several difficulties are inherent in using dental development to gauge Neandertal life histories. Even with complete data on Neandertal dental development, questions are likely to remain about the meaning of those data with regard to understanding Neandertal life histories. [source]


    Seed-based systematic discovery of specific transcription factor target genes

    FEBS JOURNAL, Issue 12 2008
    Ralf Mrowka
    Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, due to the short and degenerated DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are similarly expressed as known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-κB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-κB. Thus, our data provide a missing link in the signalling of NF-κB and the damping function of optineurin in signalling feedback of NF-κB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today. [source]
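
    The core of the seed-based idea, ranking genes by how similarly they are expressed to known targets across many experiments, can be sketched with a plain correlation score; the real seed-distribution-distance statistic differs in detail. Hypothetical expression matrix `expr` (genes × arrays), gene identifiers and seed list:

```python
import numpy as np

def rank_candidates(expr, gene_ids, seed_ids):
    """Rank genes by mean correlation with the seed targets' expression."""
    # Standardize each gene's profile so dot products become correlations
    Z = (expr - expr.mean(1, keepdims=True)) / expr.std(1, keepdims=True)
    seed_rows = Z[[gene_ids.index(g) for g in seed_ids]]
    sim = Z @ seed_rows.T / Z.shape[1]   # gene-by-seed Pearson correlations
    score = sim.mean(axis=1)
    order = np.argsort(-score)
    return [(gene_ids[i], float(score[i]))
            for i in order if gene_ids[i] not in seed_ids]
```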


    Spatial and temporal patterns of walleye pollock (Theragra chalcogramma) spawning in the eastern Bering Sea inferred from egg and larval distributions

    FISHERIES OCEANOGRAPHY, Issue 2 2010
    NATHAN M. BACHELER
    Abstract Walleye pollock Theragra chalcogramma (pollock hereafter) is a key ecological and economic species in the eastern Bering Sea, yet a detailed synthesis of the spatial and temporal patterns of pollock ichthyoplankton in this important region is lacking. This knowledge gap is particularly severe considering that egg and larval distributions are essential to reconstructing spawning locations and early-life-stage drift pathways. We used 19 yr of ichthyoplankton collections to determine the spatial and temporal patterns of egg and larval distribution. Generalized additive models (GAMs) identified two primary temporal pulses of pollock eggs, the first occurring from 20 February to 31 March and the second from 20 April to 20 May; larvae showed similar, but slightly lagged, pulses. Based on generalized cross-validation and information theory, a GAM that allowed for different seasonal patterns in egg density within three unique areas outperformed a GAM that assumed a single fixed seasonal pattern across the entire eastern Bering Sea. This 'area-dependent' GAM predicted the highest densities of eggs (i.e., potential spawning locations) in three major areas of the eastern Bering Sea: near Bogoslof Island (February–April), north of Unimak Island and the Alaska Peninsula (March–April), and around the Pribilof Islands (April–August). Unique temporal patterns of egg density were observed for each area, suggesting that pollock spawning may be more spatially and temporally complex than previously assumed. Moreover, this work provides a valuable baseline of pollock spawning to which future changes, such as those resulting from climate variability, may be compared. [source]
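
    The model comparison, one seasonal pattern for the whole shelf versus a separate pattern per area, can be imitated with penalized splines: build a day-of-year spline basis, either shared or interacted with an area factor, and compare cross-validated fit. A GAM-like sketch, a stand-in for the authors' GAMs with all names hypothetical:

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def compare_models(day_of_year, area, egg_density, n_areas=3):
    """CV comparison: fixed seasonal pattern vs. one pattern per area."""
    spline = SplineTransformer(n_knots=8, degree=3)
    X_shared = spline.fit_transform(day_of_year.reshape(-1, 1))
    # 'Area-dependent': one copy of the seasonal basis per area (0..n_areas-1)
    masks = [(area == a).astype(float).reshape(-1, 1) for a in range(n_areas)]
    X_area = np.hstack([X_shared * m for m in masks])
    fixed = cross_val_score(Ridge(1.0), X_shared, egg_density, cv=5).mean()
    by_area = cross_val_score(Ridge(1.0), X_area, egg_density, cv=5).mean()
    return fixed, by_area   # higher CV R2 favours the area-dependent model
```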


    Iterative generalized cross-validation for fusing heteroscedastic data of inverse ill-posed problems

    GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 1 2009
    Peiliang Xu
    SUMMARY The method of generalized cross-validation (GCV) has been widely used to determine the regularization parameter, because the criterion minimizes the average predicted residuals of measured data and depends solely on data. The data-driven advantage is valid only if the variance–covariance matrix of the data can be represented as the product of a given positive definite matrix and a scalar unknown noise variance. In practice, important geophysical inverse ill-posed problems have often been solved by combining different types of data. The stochastic model of measurements in this case contains a number of different unknown variance components. Although the weighting factors, or equivalently the variance components, have been shown to significantly affect joint inversion results of geophysical ill-posed problems, they have been either assumed to be known or empirically chosen. No solid statistical foundation is available yet to correctly determine the weighting factors of different types of data in joint geophysical inversion. We extend the GCV method to accommodate both the regularization parameter and the variance components. The extended version of GCV essentially consists of two steps, one to estimate the variance components by fixing the regularization parameter and the other to determine the regularization parameter by using the GCV method and by fixing the variance components. We simulate two examples: a purely mathematical integral equation of the first kind modified from the first example of Phillips (1962) and a typical geophysical example of downward continuation to recover the gravity anomalies on the surface of the Earth from satellite measurements. Based on the two simulated examples, we extensively compare the iterative GCV method with existing methods, which have shown that the method works well to correctly recover the unknown variance components and determine the regularization parameter. In other words, our method lets data speak for themselves, decide the correct weighting factors of different types of geophysical data, and determine the regularization parameter. In addition, we derive an unbiased estimator of the noise variance by correcting the biases of the regularized residuals. A simplified formula to save the time of computation is also given. The two new estimators of the noise variance are compared with six existing methods through numerical simulations. The simulation results have shown that the two new estimators perform as well as Wahba's estimator for highly ill-posed problems and outperform any existing methods for moderately ill-posed problems. [source]
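
    For a Tikhonov-regularized linear problem, the basic single-variance-component GCV criterion that the paper generalizes has a closed form, GCV(λ) = n‖(I − H(λ))y‖² / [tr(I − H(λ))]², with H the influence matrix. A direct sketch:

```python
import numpy as np

def gcv_choose_lambda(A, y, lambdas):
    """Pick the Tikhonov regularization parameter minimizing GCV."""
    n = len(y)
    best = (np.inf, None)
    for lam in lambdas:
        # Influence matrix H maps data y to predicted data A @ m(lam)
        H = A @ np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)
        r = y - H @ y
        gcv = n * (r @ r) / np.trace(np.eye(n) - H) ** 2
        best = min(best, (gcv, lam))
    return best[1]
```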


    A comparison of automatic techniques for estimating the regularization parameter in non-linear inverse problems

    GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 3 2004
    Colin G. Farquharson
    SUMMARY Two automatic ways of estimating the regularization parameter in underdetermined, minimum-structure-type solutions to non-linear inverse problems are compared: the generalized cross-validation and L-curve criteria. Both criteria provide a means of estimating the regularization parameter when only the relative sizes of the measurement uncertainties in a set of observations are known. The criteria, which are established components of linear inverse theory, are applied to the linearized inverse problem at each iteration in a typical iterative, linearized solution to the non-linear problem. The particular inverse problem considered here is the simultaneous inversion of electromagnetic loop–loop data for 1-D models of both electrical conductivity and magnetic susceptibility. The performance of each criterion is illustrated with inversions of a variety of synthetic and field data sets. In the great majority of examples tested, both criteria successfully determined suitable values of the regularization parameter, and hence credible models of the subsurface. [source]
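
    The L-curve criterion can be sketched just as compactly: sweep the regularization parameter, plot log residual norm against log model norm, and take the corner of maximum curvature. A discrete version for a linear(ized) system:

```python
import numpy as np

def l_curve_corner(A, y, lambdas):
    """Pick the regularization parameter at the L-curve's corner."""
    rho, eta = [], []
    for lam in lambdas:
        m = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
        rho.append(np.log(np.linalg.norm(A @ m - y)))   # data misfit
        eta.append(np.log(np.linalg.norm(m)))           # model size
    rho, eta = np.array(rho), np.array(eta)
    # Discrete curvature of the (rho, eta) parametric curve
    d1r, d1e = np.gradient(rho), np.gradient(eta)
    d2r, d2e = np.gradient(d1r), np.gradient(d1e)
    kappa = (d1r * d2e - d2r * d1e) / (d1r**2 + d1e**2) ** 1.5
    return lambdas[int(np.argmax(kappa))]
```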


    The contributions of topoclimate and land cover to species distributions and abundance: fine-resolution tests for a mountain butterfly fauna

    GLOBAL ECOLOGY, Issue 2 2010
    Javier Gutiérrez Illán
    ABSTRACT Aim: Models relating species distributions to climate or habitat are widely used to predict the effects of global change on biodiversity. Most such approaches assume that climate governs coarse-scale species ranges, whereas habitat limits fine-scale distributions. We tested the influence of topoclimate and land cover on butterfly distributions and abundance in a mountain range, where climate may vary as markedly at a fine scale as land cover. Location: Sierra de Guadarrama (Spain, southern Europe). Methods: We sampled the butterfly fauna of 180 locations (89 in 2004, 91 in 2005) in a 10,800 km2 region, and derived generalized linear models (GLMs) for species occurrence and abundance based on topoclimatic (elevation and insolation) or habitat (land cover, geology and hydrology) variables sampled at 100-m resolution using GIS. Models for each year were tested against independent data from the alternate year, using the area under the receiver operating characteristic curve (AUC) (distribution) or Spearman's rank correlation coefficient (rs) (abundance). Results: In independent model tests, 74% of occurrence models achieved AUCs of > 0.7, and 85% of abundance models were significantly related to observed abundance. Topoclimatic models outperformed models based purely on land cover in 72% of occurrence models and 66% of abundance models. Including both types of variables often explained most variation in model calibration, but did not significantly improve model cross-validation relative to topoclimatic models. Hierarchical partitioning analysis confirmed the overriding effect of topoclimatic factors on species distributions, with the exception of several species for which the importance of land cover was confirmed. Main conclusions: Topoclimatic factors may dominate fine-resolution species distributions in mountain ranges where climate conditions vary markedly over short distances and large areas of natural habitat remain. Climate change is likely to be a key driver of species distributions in such systems and could have important effects on biodiversity. However, continued habitat protection may be vital to facilitate range shifts in response to climate change. [source]
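
    The validation design, calibrating on one year and testing on the other with AUC for occurrence and Spearman's rank correlation for abundance, is straightforward to mirror; a sketch for one species with hypothetical predictor and response arrays:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from scipy.stats import spearmanr

def cross_year_auc(X04, occ04, X05, occ05):
    """Fit an occurrence GLM on 2004 sites, score it on 2005 sites."""
    glm = LogisticRegression(max_iter=1000).fit(X04, occ04)
    return roc_auc_score(occ05, glm.predict_proba(X05)[:, 1])

def abundance_check(pred_abund, obs_abund):
    """Spearman's rank correlation between predicted and observed counts."""
    return spearmanr(pred_abund, obs_abund).correlation
```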


    Discrete dynamic Bayesian network analysis of fMRI data

    HUMAN BRAIN MAPPING, Issue 1 2009
    John Burge
    Abstract We examine the efficacy of using discrete Dynamic Bayesian Networks (dDBNs), a data-driven modeling technique employed in machine learning, to identify functional correlations among neuroanatomical regions of interest. Unlike many neuroimaging analysis techniques, this method is not limited by linear and/or Gaussian noise assumptions. It achieves this by modeling the time series of neuroanatomical regions as discrete, as opposed to continuous, random variables with multinomial distributions. We demonstrate this method using an fMRI dataset collected from healthy and demented elderly subjects (Buckner et al., 2000: J Cogn Neurosci 12:24–34) and identify correlates based on a diagnosis of dementia. The results are validated in three ways. First, the elicited correlates are shown to be robust over leave-one-out cross-validation and, via a Fourier bootstrapping method, not likely due to random chance. Second, the dDBNs identified correlates that would be expected given the experimental paradigm. Third, the dDBN's ability to predict dementia is competitive with two commonly employed machine-learning classifiers: the support vector machine and the Gaussian naďve Bayesian network. We also verify that the dDBN selects correlates based on non-linear criteria. Finally, we provide a brief analysis of the correlates elicited from Buckner et al.'s data that suggests that demented elderly subjects have reduced involvement of entorhinal and occipital cortex and greater involvement of the parietal lobe and amygdala in brain activity compared with healthy elderly (as measured via functional correlations among BOLD measurements). Limitations and extensions to the dDBN method are discussed. Hum Brain Mapp, 2009. © 2007 Wiley-Liss, Inc. [source]


    The bootstrap and cross-validation in neuroimaging applications: Estimation of the distribution of extrema of random fields for single volume tests, with an application to ADC maps

    HUMAN BRAIN MAPPING, Issue 10 2007
    Roberto Viviani
    Abstract We discuss the assessment of signal change in single magnetic resonance images (MRI) based on quantifying significant departure from a reference distribution estimated from a large sample of normal subjects. The parametric approach is to build a test based on the expected distribution of extrema in random fields. However, in conditions where the variance is not uniform across the volume and the smoothness of the images is moderate to low, this test may be rather conservative. Furthermore, parametric tests are limited to datasets for which distributional assumptions hold. This paper investigates resampling methods that improve statistical tests for signal changes in single images in such adverse conditions, and that can be used for the assessment of images taken for clinical purposes. Two methods, the bootstrap and cross-validation, are compared. It is shown that the bootstrap may fail to provide a good estimate of the distribution of extrema of parametric maps. In contrast, calibration of the significance threshold by means of cross-validation (or related sampling without replacement techniques) address three issues at once: improved power, better voxel-by-voxel estimate of variance by local pooling, and adaptation to departures from ideal distributional assumptions on the signal. We apply the cross-validated tests to apparent diffusion coefficient maps, a type of MRI capable of detecting changes in the microstructural organization of brain parenchyma. We show that deviations from parametric assumptions are strong enough to cast doubt on the correctness of parametric tests for these images. As case studies, we present parametric maps of lesions in patients suffering from stroke and glioblastoma at different stages of evolution. Hum Brain Mapp 2007. © 2007 Wiley-Liss, Inc. [source]


    Geostatistical interpolation of space–time rainfall on Tamshui River basin, Taiwan

    HYDROLOGICAL PROCESSES, Issue 23 2007
    Shin-Jen Cheng
    Abstract Taiwan suffers from heavy storm rainfall during the typhoon season. This usually causes large river runoff, overland flow, erosion, landslides, debris flows, loss of power, etc. In order to evaluate storm impacts on the downstream basin, real-time hydrological modelling is used to estimate potential hazard areas. This can be used as a decision-support system for the Emergency Response Center, National Fire Agency Ministry, to make 'real-time' responses and minimize possible damage to human life and property. This study used 34 observed events from 14 telemetered rain-gauges in the Tamshui River basin, Taiwan, to study the spatial–temporal characteristics of typhoon rainfall. In the study, regionalized variable theory and cross-semi-variograms were used to identify the spatial–temporal structure of typhoon rainfall. The power form and parameters of the cross-semi-variogram were derived through analysis of the observed data. In the end, cross-validation was used to evaluate the performance of the interpolated rainfall on the river basin. The results show the derived rainfall interpolator represents the observed events well, which indicates the rainfall interpolator can be used as a spatial–temporal rainfall input for real-time hydrological modelling. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    Use of neural networks for the prediction of frictional drag and transmission of axial load in horizontal wellbores

    INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 2 2003
    Tanvir Sadiq
    Abstract The use of mud motors and other tools to accomplish forward motion of the bit in extended reach and horizontal wells allows avoiding large amounts of torque caused by rotation of the whole drill string. The forward motion of the drill string, however, is resisted by an excessive amount of friction. In the presence of large compressive axial loads, the drill pipe or coiled tubing tends to buckle into a helix in horizontal boreholes. This causes additional frictional drag resisting the transmission of axial load (resulting from surface slack-off force) to the bit. As the magnitude of the frictional drag increases, a buckled pipe may become 'locked up', making it almost impossible to drill further. In the case of packers, the frictional drag may inhibit the transmission of set-up load to the packer. A prior knowledge of the magnitude of frictional drag for a given axial load and radial clearance can help avoid lock-up conditions and costly failure of the tubular. In this study a neural network model, for the prediction of frictional drag and axial load transmission in horizontal wellbores, is presented. Several neural network architectures were designed and tested to obtain the most accurate prediction. After cross-validation of the Back Propagation Neural Network (BPNN) algorithm, a two-hidden-layer model was chosen for simultaneous prediction of frictional drag and axial load transmission. A comparison of results obtained from BPNN and General Regression Neural Network (GRNN) algorithms is also presented. Copyright © 2002 John Wiley & Sons, Ltd. [source]
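
    A two-hidden-layer backpropagation network with two outputs, as selected after the authors' architecture search, is a few lines in a modern library. A sketch, with inputs such as axial load and radial clearance assumed rather than taken from the paper:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: wellbore/tubular parameters (e.g., axial load, radial clearance, depth);
# Y: columns [frictional_drag, transmitted_load]. A stand-in for the paper's
# BPNN; the architecture and variable names here are illustrative only.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(12, 8), max_iter=5000, random_state=0),
)
# model.fit(X_train, Y_train); Y_pred = model.predict(X_test)
```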


    Biological indicators of prognosis in Ewing's sarcoma: An emerging role for lectin galactoside-binding soluble 3 binding protein (LGALS3BP)

    INTERNATIONAL JOURNAL OF CANCER, Issue 1 2010
    Diana Zambelli
    Abstract Starting from an experimental model that accounts for the two most important adverse processes to successful therapy of Ewing's sarcoma (EWS), chemoresistance and the presence of metastasis at the time of diagnosis, we defined a molecular signature of potential prognostic value. Functional annotation of differentially regulated genes revealed three major networks related to cell cycle, cell-to-cell interactions and cellular development. The prognostic impact of eight genes, representative of these three networks, was validated in 56 EWS patients. High mRNA expression levels of HINT1, IFITM2, LGALS3BP, STOML2 and c-MYC were associated with a reduced risk of death and a lower risk of developing metastasis. At multivariate analysis, LGALS3BP, a matricellular protein with a role in tumor progression and metastasis, was the most important predictor of event-free survival and overall survival. The association between LGALS3BP and prognosis was confirmed at the protein level, when expression of the molecule was determined in tumor tissues but not in serum, indicating a role for the protein in the local tumor microenvironment. Engineered enhancement of LGALS3BP expression in EWS cells resulted in inhibition of anchorage-independent cell growth and reduction of cell migration and metastasis. Silencing of LGALS3BP expression reverted cell behavior with respect to in vitro parameters, thus providing further functional validation of genetic data obtained in clinical samples. Thus, we propose LGALS3BP as a novel reliable indicator of prognosis, and we offer genetic signatures to the scientific community for cross-validation and meta-analysis, which are indispensable tools for a rare tumor such as EWS. [source]


    Exploring the predictability of the 'Short Rains' at the coast of East Africa

    INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 11 2004
    Stefan Hastenrath
    Abstract The boreal autumn 'Short Rains' at the coast of East Africa are deficient when there is weak development of a zonal circulation cell along the Indian Ocean equator, an anomalously low sea-surface temperature in the western portion of the basin, and in the high phase of the southern oscillation. Such large-scale circulation departures and their precursors are described by compact indices. September values of these indices for the period 1958–96 are used to explore the predictability of an index (RON) of October–November rainfall at the coast of East Africa. Regressions with cross-validation over the entire 1958–96 period are evaluated for the early (1958–77) and late (1978–96) halves of the record. In complementary experiments, the entire record is separated into 1958–77 as a training period and 1978–96 as a verification period. In all experiments, correlation of calculated versus observed rainfall is high for the early record and low for the late half of the record, a behaviour not noted in cross-validation over the entire 39-year time span. The 11-year sliding correlations of the indicated circulation indices with RON all reveal a drastic deterioration of relationships from the early to the late half of the record, although the equatorial zonal circulation cell appears to remain strong throughout. Copyright © 2004 Royal Meteorological Society [source]
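
    The 11-year sliding correlations used to reveal the breakdown are a one-liner given annual series; a sketch with hypothetical `naoi` and `ron` pandas Series indexed by year:

```python
import pandas as pd

def sliding_correlation(naoi: pd.Series, ron: pd.Series,
                        window: int = 11) -> pd.Series:
    """Centered 11-year running correlation between a predictor and rainfall."""
    return naoi.rolling(window, center=True).corr(ron)
```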


    Mesoscale precipitation variability in the region of the European Alps during the 20th century

    INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 9 2002
    Jürg Schmidli
    Abstract The purpose of this study is to construct and evaluate a new gridded analysis of precipitation that covers the entire region of the European Alps (43.2–48.8°N, 3.2–16.2°E), resolves the most prominent mesoscale variations (grid spacing 25 km) and extends with a monthly time-resolution over most of the 20th century (1901–90). The analysis is based on a reconstruction using the reduced-space optimal interpolation technique. It combines data from a high-resolution network over a restricted time period (1971–90) with homogeneous centennial records from a sparse sample of stations. The reconstructed fields account for 78% of the total variance in a cross-validation with independent data. The explained variance for individual grid points varies between 60 and 95%, with lower skills over the southern and western parts of the domain. For averages over 100 × 100 km2 subdomains, the explained variance increases to 90–99%. Comparison of the reconstruction with the CRU05 global analysis reveals good agreement with respect to the interannual variations of large subdomain averages (10 000–50 000 km2), some differences in decadal variations, especially for recent decades, and physically more plausible spatial patterns in the present analysis. The new dataset is exploited to depict 20th century precipitation variations and their correlations with the North Atlantic oscillation (NAO). A linear trend analysis (1901–90) reveals an increase of winter precipitation by 20–30% per 100 years in the western part of the Alps, and a decrease of autumn precipitation by 20–40% to the south of the main ridge. Correlations with the NAO index (NAOI) are weak and highly intermittent to the north and weak and more robust to the south of the main Alpine crest, indicating that changes in the NAOI in recent decades are not of primary importance in explaining observed precipitation changes. Copyright © 2002 Royal Meteorological Society [source]


    Finding Furfural Hydrogenation Catalysts via Predictive Modelling

    ADVANCED SYNTHESIS & CATALYSIS (PREVIOUSLY: JOURNAL FUER PRAKTISCHE CHEMIE), Issue 13 2010
    Zea Strassberger
    Abstract We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes was synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol. The complexes gave varied yields, from 62% up to >99.9%, with no obvious structure/activity correlations. Control experiments proved that the carbene ligand remains coordinated to the ruthenium centre throughout the reaction. Deuterium-labelling studies showed a secondary isotope effect (kH:kD = 1.5). Further mechanistic studies showed that this transfer hydrogenation follows the so-called monohydride pathway. Using these data, we built a predictive model for 13 of the catalysts, based on 2D and 3D molecular descriptors. We tested and validated the model using the remaining five catalysts (cross-validation, R2 = 0.913). Then, with this model, the conversion and selectivity were predicted for four completely new ruthenium-carbene complexes. These four catalysts were then synthesized and tested. The results were within 3% of the model's predictions, demonstrating the validity and value of predictive modelling in catalyst optimization. [source]
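
    The modelling step, regressing catalyst performance on 2D/3D molecular descriptors and validating by cross-validation before predicting unseen complexes, can be sketched generically; ridge regression below is a stand-in, since the abstract does not name the regression method:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, LeaveOneOut

def descriptor_model(X_train, y_train, X_new):
    """Train on known catalysts, report CV R2, predict new complexes."""
    model = Ridge(alpha=1.0)
    y_cv = cross_val_predict(model, X_train, y_train, cv=LeaveOneOut())
    ss_res = np.sum((y_train - y_cv) ** 2)
    ss_tot = np.sum((y_train - y_train.mean()) ** 2)
    r2_cv = 1 - ss_res / ss_tot          # cross-validated R2 (Q2)
    return model.fit(X_train, y_train).predict(X_new), r2_cv
```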


    New computational algorithm for the prediction of protein folding types

    INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2001
    Nikola Štambuk
    Abstract We present a new computational algorithm for the prediction of a secondary protein structure. The method enables the evaluation of α- and β-protein folding types from the nucleotide sequences. The procedure is based on the reflected Gray code algorithm of nucleotide–amino acid relationships, and represents the extension of Swanson's procedure in Ref. 4. It is shown that six-digit binary notation of each codon enables the prediction of α- and β-protein folds by means of the error-correcting linear block triple-check code. We tested the validity of the method on the test set of 140 proteins (70 α- and 70 β-folds). The test set consisted of standard α- and β-protein classes from the Jpred and SCOP databases, with nucleotide sequence available in the GenBank database. 100% accurate classification of α- and β-protein folds, based on 39 dipeptide addresses derived by the error-correcting coding procedure, was obtained by means of logistic regression analysis (p < 0.00000001). A classification tree and the machine-learning sequential minimal optimization (SMO) classifier confirmed the results with 97.1% and 90% accurate classification, respectively. Protein fold prediction quality tested by means of leave-one-out cross-validation was a satisfactory 82.1% for the logistic regression and 81.4% for the SMO classifier. The presented procedure of computational analysis can be helpful in detecting the type of protein folding from newly sequenced exon regions. The method enables quick, simple, and accurate prediction of α- and β-protein folds from the nucleotide sequence on a personal computer. © 2001 John Wiley & Sons, Inc. Int J Quant Chem 84: 13–22, 2001 [source]