Regression Trees (regression + tree)
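As a point of reference for the abstracts collected on this page, the step shared by CART-style regression trees is choosing the binary split that minimises the within-group sum of squared errors. The Python sketch below is a minimal illustration of that single step. The function names `sse` and `best_split` and the toy data are assumptions made for this example and are not taken from any of the studies listed.

```python
from typing import Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, total_sse) for the best single binary split on x.

    Candidate thresholds are midpoints between consecutive distinct x values;
    the chosen one minimises SSE(left group) + SSE(right group).
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_total = float("nan"), float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for xx, yy in pairs if xx < threshold]
        right = [yy for xx, yy in pairs if xx >= threshold]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total


# Hypothetical toy data: the response jumps once the predictor exceeds about 4.5.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [10.0, 11.0, 9.0, 20.0, 21.0, 19.0]
threshold, total = best_split(x, y)
print(f"split at x < {threshold}, within-group SSE = {total}")
```

A full tree would apply the same search recursively within each resulting group and then prune; established libraries handle that, so this sketch only shows the splitting criterion.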
Selected Abstracts

Evaluation of 6 Prognostic Models Used to Calculate Mortality Rates in Elderly Heart Failure Patients With a Fatal Heart Failure Admission
CONGESTIVE HEART FAILURE, Issue 5 2010
Andria L. Nutter

The objective was to evaluate 6 commonly used heart failure (HF) prognostic models in an elderly, fatal HF population. Predictive models have been established to quantify risk among HF patients. The validation of these models has not been adequately studied, especially in an elderly cohort. Using a single-center, retrospective study of serially admitted HF patients who died while in the hospital or within 30 days of discharge, the authors evaluated 6 prognostic models: the Seattle Heart Failure Model (SHFM), Heywood's model, Classification and Regression Tree (CART) Analysis, the Heart Failure Survival Score (HFSS), Heart Failure Risk Scoring System, and Pocock's score. Eighty patients were included (mean age, 82.7 ± 8.2 years). Twenty-three patients (28.75%) died in the hospital. The remainder died within 30 days of discharge. The models' predictions varied considerably from one another and underestimated the patients' actual mortality. This study demonstrates that these models underestimate the mortality risk in an elderly cohort at or approaching the end of life. Moreover, the predictions made by each model vary greatly from one another. Many of the models used were not intended for calculation during hospitalization. Development of improved models for the range of patients with HF syndromes is needed. Congest Heart Fail. 2010;16:196–201. © 2010 Wiley Periodicals, Inc. [source]

An investigation of the hydrological requirements of River Red Gum (Eucalyptus camaldulensis) Forest, using Classification and Regression Tree modelling
ECOHYDROLOGY, Issue 2 2009
Li Wen

Abstract
River Red Gum (Eucalyptus camaldulensis) is widely distributed throughout many water courses and floodplains within inland Australia.
In recent years, accelerated decline of River Red Gum condition has been observed in many locations, and field observations of the degradation are consistent with the reduction of flooding. However, there are few publications that quantitatively investigate the relationships between River Red Gum condition and flooding history. We applied Classification and Regression Tree (CART) analysis to model the minimum flooding requirement of River Red Gum forest/woodland in Yanga National Park, located on the Lower Murrumbidgee Floodplain, southeast Australia, using crown conditions derived from historical aerial photographs spanning more than 40 years. The model produced has moderate reliability, with an overall accuracy of 64·1% and a Kappa index of 0·543. The model provides important insights into the relationship between River Red Gum community type, flood frequency and flood duration. Our results demonstrated that (1) CART analysis is a simple yet powerful technique with significant potential for application in river and environmental flow management; (2) River Red Gum communities on the Lower Murrumbidgee Floodplain require periodic inundation (3–5 years) for a duration of up to 64 days to be in moderate to good condition; (3) although the crown conditions of different community types displayed similar degradation trends, they have distinct flooding requirements; and (4) the River Red Gum community in Yanga National Park may be managed as hydrological units given limited environmental water allocations. Copyright © 2009 John Wiley & Sons, Ltd. [source]

Genetic variants in cell cycle control pathway confer susceptibility to bladder cancer
CANCER, Issue 11 2008
Yuanqing Ye PhD

Abstract
BACKGROUND: Cell cycle checkpoint regulation is crucial for the prevention of carcinogenesis in mammalian cells.
METHODS: To test the hypothesis that common sequence variants in the cell cycle control pathway may affect bladder cancer susceptibility, the effects of a panel of 10 potential functional single nucleotide polymorphisms (SNPs) from 7 cell cycle control genes, P53, P21, P27, CDK4, CDK6, CCND1, and STK15, were evaluated on bladder cancer risk in a case-control study of 696 bladder cancer cases and 629 healthy controls.
RESULTS: Overall, on individual SNP analysis only individuals with the p53 intron 3 16-bp duplication polymorphism variant allele had a significantly reduced bladder cancer risk (odds ratio [OR] = 0.74, 95% confidence interval [CI], 0.56–0.96). This effect was more evident in former smokers and younger subjects. We then applied the Classification and Regression Tree (CART) statistical approach to explore the high-order gene-gene and gene-smoking interactions. In the CART analysis, smoking status was identified as the most influential factor for bladder cancer susceptibility. The final decision tree by CART contained 6 terminal nodes. Compared with the second-lowest risk group the ORs for terminal nodes 1 and 3 to 6 ranged from 0.46 to 6.30.
CONCLUSIONS: These results suggest that cell cycle genetic polymorphisms may affect bladder cancer predisposition through modulation of host genome stability and confirm the importance of studying gene-gene and gene-environment interactions in bladder cancer risk assessment. Cancer 2008. © 2008 American Cancer Society. [source]

Multivariate Survival Trees: A Maximum Likelihood Approach Based on Frailty Models
BIOMETRICS, Issue 1 2004
Xiaogang Su

Summary. A method of constructing trees for correlated failure times is put forward. It adopts the backfitting idea of classification and regression trees (CART) (Breiman et al., 1984, in Classification and Regression Trees).
The tree method is developed based on the maximized likelihoods associated with the gamma frailty model and standard likelihood-related techniques are incorporated. The proposed method is assessed through simulations conducted under a variety of model configurations and illustrated using the chronic granulomatous disease (CGD) study data. [source]

Comparison of LiDAR waveform processing methods for very shallow water bathymetry using Raman, near-infrared and green signals
EARTH SURFACE PROCESSES AND LANDFORMS, Issue 6 2010
Tristan Allouis

Abstract
Airborne light detection and ranging (LiDAR) bathymetry appears to be a useful technology for bed topography mapping of non-navigable areas, offering high data density and a high acquisition rate. However, few studies have focused on continental waters, in particular, on very shallow waters (<2 m) where it is difficult to extract the surface and bottom positions that are typically mixed in the green LiDAR signal. This paper proposes two new processing methods for depth extraction based on the use of different LiDAR signals [green, near-infrared (NIR), Raman] of the SHOALS-1000T sensor. They have been tested on a very shallow coastal area (Golfe du Morbihan, France) as an analogy to very shallow rivers. The first method is based on a combination of mathematical and heuristic methods using the green and the NIR LiDAR signals to cross validate the information delivered by each signal. The second method extracts water depths from the Raman signal using statistical methods such as principal components analysis (PCA) and classification and regression tree (CART) analysis. The obtained results are then compared to the reference depths, and the performances of the different methods, as well as their advantages/disadvantages, are evaluated. The green/NIR method supplies 42% more points compared to the operator process, with an equivalent mean error (,4·2 cm versus ,4·5 cm) and a smaller standard deviation (25·3 cm versus 33·5 cm).
The Raman processing method provides very scattered results (standard deviation of 40·3 cm) with the lowest mean error (,3·1 cm) and 40% more points. The minimum detectable depth is also improved by the two presented methods, being around 1 m for the green/NIR approach and 0·5 m for the statistical approach, compared to 1·5 m for the data processed by the operator. Despite its ability to measure other parameters like water temperature, the Raman method needed a large amount of reference data to provide reliable depth measurements, as opposed to the green/NIR method. Copyright © 2010 John Wiley & Sons, Ltd. [source]

Association between color doppler vascularity index, angiogenesis-related molecules, and clinical outcomes in gastric cancer
JOURNAL OF SURGICAL ONCOLOGY, Issue 7 2009
Chiung-Nien Chen MD

Abstract
Purpose: This study was conducted to evaluate the correlation between color Doppler vascularity index (CDVI), clinical outcomes and five angiogenesis-related molecules including vascular endothelial growth factor (VEGF), placenta growth factor (PlGF), cyclooxygenase-2 (COX-2), inducible nitric oxide synthase (iNOS), and calreticulin (CRT) in gastric cancer, and to develop an effective model selected from these five molecules to predict patient survival.
Patients and Methods: CDVI could be obtained preoperatively by transabdominal ultrasound from 30 patients. Enzyme immunoassay was adopted to determine protein level of VEGF and PlGF, and immunohistochemistry was used to detect COX-2, iNOS and CRT expression. Correlation between CDVI and five individual molecules was assessed. Multiple molecules model was developed using classification and regression tree (CART) analysis from five molecules, and was tested for patient survival in another 45 patients.
Results: CDVI was significantly correlated with patient survival (P = 0.00907) and absolute number of metastatic lymph nodes (P = 0.01). There was no significant association between CDVI and any individual molecule.
The model developed by CART, consisting of VEGF and PlGF, could differentiate high and low CDVI and survival in the testing group (P = 0.00257).
Conclusions: CDVI was associated with lymph node metastasis, combined VEGF and PlGF expression status and patient survival in gastric cancer. J. Surg. Oncol. 2009;99:402–408. © 2009 Wiley-Liss, Inc. [source]

WIDTH OF STREAMS AND RIVERS IN RESPONSE TO VEGETATION, BANK MATERIAL, AND OTHER FACTORS
JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION, Issue 5 2004
Russell J. Anderson

ABSTRACT: An extensive group of datasets was analyzed to examine factors affecting widths of streams and rivers. Results indicate that vegetative controls on channel size are scale dependent. In channels with watersheds greater than 10 to 100 km², widths are narrower in channels with thick woody bank vegetation than in grass lined or nonforested banks. The converse is true in smaller streams, apparently due to interactions between woody debris, shading, understory vegetation, rooting characteristics, and channel size. A tree based statistical method (regression tree) is introduced and tested as a tool for identifying thresholds of response and interpreting interactions between variables. The implications of scale dependent controls on channel width are discussed in the context of stable channel design methods and development of regional hydraulic geometry curves. [source]

ACOUSTIC IDENTIFICATION OF NINE DELPHINID SPECIES IN THE EASTERN TROPICAL PACIFIC OCEAN
MARINE MAMMAL SCIENCE, Issue 1 2003
Julie N. Oswald

Abstract
Acoustic methods may improve the ability to identify cetacean species during shipboard surveys. Whistles were recorded from nine odontocete species in the eastern tropical Pacific to determine how reliably these vocalizations can be classified to species based on simple spectrographic measurements. Twelve variables were measured from each whistle (n = 908).
Parametric multivariate discriminant function analysis (DFA) correctly classified 41.1% of whistles to species. Non-parametric classification and regression tree (CART) analysis resulted in 51.4% correct classification. Striped dolphin whistles were most difficult to classify. Whistles of bottlenose dolphins, false killer whales, and pilot whales were most distinctive. Correct classification scores may be improved by adding prior probabilities that reflect species distribution to classification models, by measuring alternative whistle variables, by using alternative classification techniques, and by localizing vocalizing dolphins when collecting data for classification models. [source]

Soft Tissue Infections and Emergency Department Disposition: Predicting the Need for Inpatient Admission
ACADEMIC EMERGENCY MEDICINE, Issue 12 2009
Alfredo Sabbaj MD

Abstract
Objectives: Little empiric evidence exists to guide emergency department (ED) disposition of patients presenting with soft tissue infections. This study's objective was to generate a clinical decision rule to predict the need for greater than 24-hour hospital admission for patients presenting to the ED with soft tissue infection.
Methods: This was a retrospective cohort study of consecutive patients presenting to a tertiary care hospital ED with diagnosis of nonfacial soft tissue infection. Standardized chart review was used to collect 29 clinical variables. The primary outcome was >24-hour hospital admission (either general admission or ED observation unit), regardless of initial disposition. Patients initially discharged home and later admitted for more than 24 hours were included in the outcome. Data were analyzed using classification and regression tree (CART) analysis and multivariable logistic regression.
Results: A total of 846 patients presented to the ED with nonfacial soft tissue infection. After merging duplicate records, 674 patients remained, of whom 81 (12%) required longer than 24-hour admission.
Using CART, the strongest predictors of >24-hour admission were patient temperature at ED presentation and mechanism of infection. In the multivariable logistic regression model, initial patient temperature (odds ratio [OR] for each degree over 37°C = 2.91, 95% confidence interval [CI] = 1.65 to 5.12) and history of fever (OR = 3.02, 95% CI = 1.41 to 6.43) remained the strongest predictors of hospital admission. Despite these findings, there was no combination of factors that reliably identified more than 90% of target patients.
Conclusions: Although we were unable to generate a high-sensitivity decision rule to identify ED patients with soft tissue infection requiring >24-hour admission, the presence of a fever (either by initial ED vital signs or by history) was the strongest predictor of need for >24-hour hospital stay. These findings may help guide disposition of patients presenting to the ED with nonfacial soft tissue infections. [source]

Protein profiling in pathology: Analysis and evaluation of 239 frozen tissue biopsies for diagnosis of B-cell lymphomas
PROTEOMICS - CLINICAL APPLICATIONS, Issue 5 2010
Corine Jansen

Abstract
Purpose: We determined the potential value of protein profiling of tissue samples by assessing how precisely this approach enables discrimination of B-cell lymphoma from reactive lymph nodes, and how well the profiles can be used for lymphoma classification.
Experimental design: Protein lysates from lymph nodes (n=239) from patients with the diagnosis of reactive hyperplasia (n=44), follicular lymphoma (n=63), diffuse large B-cell lymphoma (n=43), mantle cell lymphoma (n=47), and chronic lymphocytic leukemia/small lymphocytic B-cell lymphoma (n=42) were analysed by SELDI-TOF MS. Data analysis was performed by (i) classification and regression tree-based analysis and (ii) binary and polytomous logistic regression analysis.
Results: After internal validation by the leave-one-out principle, both the classification and regression tree and logistic regression classification correctly identified the majority of the malignant (87 and 96%, respectively) and benign cases (73 and 75%, respectively). Classification was less successful since approximately one-third of the cases of each group were misclassified according to the histological classification. However, an additional mantle cell lymphoma case that was misclassified as chronic lymphocytic leukemia/small lymphocytic B-cell lymphoma initially was identified based on the protein profile.
Conclusions and clinical relevance: SELDI-TOF MS protein profiling allows for reliable identification of the majority of malignant lymphoma cases; however, further validation and testing of robustness in a diagnostic setting are needed. [source]

Risk and outcome analysis of renal replacement therapies in patients after cardiac surgery with pre-operatively normal renal function
ANAESTHESIA, Issue 6 2009
D. Hauer

Summary
Peri-operative acute renal failure requiring renal replacement therapy is common (5–30%) after cardiac surgery and associated with a mortality of ,50%. Pre-operative renal impairment seems to be the most important risk factor for frank postoperative renal failure. To help evaluate the risk factors, we conducted a prospective observational trial of 1574 consecutive patients with normal pre-operative renal function (creatinine < 110 µmol.l⁻¹). Renal failure was defined as the need for renal replacement therapy. After univariate analysis of previously described risk factors, those that differed significantly between patients with or without renal failure were entered into a multivariate classification and regression tree (CART) statistical model that identifies the most 'predictive' risk factors and creates a ranked list of these.
In patients with pre-operatively normal renal function, a serum level of lactate > 1.1 mmol.l⁻¹ in the first 24 h after the operation was the strongest predictor for the development of renal failure. [source]

Risk Stratification in Women Enrolled in the Acute Decompensated Heart Failure National Registry Emergency Module (ADHERE-EM)
ACADEMIC EMERGENCY MEDICINE, Issue 2 2008
Deborah B. Diercks MD

Abstract
Objectives: It has been reported that the mortality risk for heart failure differs between men and women. It has been postulated that this is due to differences in comorbid features. Variation in risk profiles by gender may limit the performance of stratification algorithms available for heart failure in women. This analysis examined the ability of a published risk stratification model to predict outcomes in women.
Methods: The Acute Decompensated Heart Failure National Registry Emergency Module (ADHERE-EM) database was used. Characteristics, treatments, and outcomes for men and women were compared. The ADHERE registry classification and regression tree (CART) analysis was used for the risk stratification evaluation.
Results: Of 10,984 ADHERE-EM patients, 5,736 (52.2%) were women. In-hospital mortality was similar between men and women (p = 0.727). Significant differences (p < 0.0002) were noted by gender in all three variables in the CART model (blood urea nitrogen [BUN] ≥ 43 mg/dL, systolic blood pressure < 115 mm Hg, and serum creatinine ≥ 2.75 mg/dL). However, the CART model effectively stratified both genders into distinct risk groups with no significant difference in mortality by gender within stratified groups.
Conclusions: The ADHERE Registry CART tool is effective at predicting risk in ED patients, regardless of gender. [source]

The role of procalcitonin in a decision tree for prediction of bloodstream infection in febrile patients
CLINICAL MICROBIOLOGY AND INFECTION, Issue 12 2006
R. P. H. Peters

Abstract
Bloodstream infection (BSI) in febrile patients is associated with high mortality. Clinical and laboratory variables, such as procalcitonin (PCT), may predict BSI and help decision-making concerning empirical treatment. This study compared two models for prediction of BSI, and evaluated the role of PCT vs. clinical variables, collected daily in 300 consecutive febrile inpatients, for 48 h after onset of fever. Multiple logistic regression (MLR) and classification and regression tree (CART) models were compared for discriminatory power and diagnostic performance. BSI was present in 17% of cases. MLR identified the presence of intravascular devices, nadir albumin and thrombocyte counts, and peak temperature, respiratory rate and leukocyte counts, but not PCT, as independent predictors of BSI. In contrast, a peak PCT level of >2.45 ng/mL was the principal discriminator in the decision tree based on CART. The latter was more accurate (94%) than the model based on MLR (72%; p <0.01). Hence, the presence of BSI in febrile patients is predicted more accurately and by different variables, e.g., PCT, in CART analysis, as compared with MLR models. This underlines the value of PCT plus CART analysis in the diagnosis of a febrile patient. [source]

Smaller and more numerous harvesting gaps emulate natural forest disturbances: a biodiversity test case using rove beetles (Coleoptera, Staphylinidae)
DIVERSITY AND DISTRIBUTIONS, Issue 6 2008
Jan Klimaszewski

ABSTRACT
Aim: To evaluate changes in the abundance, species richness and community composition of rove beetles (Coleoptera, Staphylinidae) in response to three configurations of experimental gap cuts and to the effects of ground scarification in early succession yellow birch-dominated boreal forest.
In each experimental treatment, total forest removed was held constant (35% removal by partial cutting with a concomitant decrease in gap size) but the total number of gaps was increased (two, four and eight gaps, respectively), resulting in an experimental increase in the total amount of 'edge' within each stand.
Location: Early succession yellow birch-dominated forests, Quebec, Canada.
Methods: Pitfall traps, ANOVA, MIXED procedure in SAS®, post hoc Tukey's adjustment, rarefaction estimates, sum-of-squares and distance-based multivariate regression trees (ssMRT, dbMRT).
Results: Estimates of species richness using rarefaction were highest in clearcut and two-gap treatments, decreased in smaller and more numerous gaps and were significantly higher in scarified areas than in unscarified areas. ANOVA indicated a significant impact of harvesting on the overall standardized catch. Post hoc Tukey's tests indicated that the total catch of all rove beetles was significantly higher in uncut forests than in the treated areas. Both sum-of-squares and distance-based multivariate regression trees indicated that community structure of rove beetles differed among treatments. Assemblages were grouped into (a) control plots, (b) four- and eight-gap treatments and (c) two-gap and clearcut treatments.
Main conclusions: Rove beetle composition responded significantly to increasing gap size. Composition among intermediate and small-sized gap treatments (four- and eight-gap treatments) was more similar to uncut control forests than were larger gap treatments (two-gap) and clearcuts. Effects of scarification were nested within the harvested treatments. When the total area of forest removed is held constant, smaller, more numerous gaps are more similar to uncut control stands than to larger gaps and fall more closely within the natural forest heterogeneity.
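The Methods above list rarefaction estimates of species richness. As a hedged illustration only, the Python sketch below implements the classical individual-based rarefaction formula, that is, the expected number of species in a random subsample of n individuals drawn without replacement. The abstract does not state which rarefaction variant or software was used, so this is an assumption, and the function name `rarefied_richness` and the trap counts are hypothetical.

```python
from math import comb
from typing import Sequence


def rarefied_richness(counts: Sequence[int], n: int) -> float:
    """Expected species richness in a random subsample of n individuals.

    counts: number of individuals caught per species (hypothetical input).
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the total
    number of individuals and N_i the count for species i.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of individuals")
    denominator = comb(total, n)
    # math.comb returns 0 when n > total - c, i.e. the species is certain to appear.
    return sum(1 - comb(total - c, n) / denominator for c in counts if c > 0)


# Hypothetical pitfall-trap catch: three species with 5, 3 and 2 individuals.
catch = [5, 3, 2]
for subsample in (1, 2, 10):
    print(subsample, round(rarefied_richness(catch, subsample), 4))
```

Rarefying every treatment to the same subsample size is what makes richness comparable between treatments whose total catches differ.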
[source]

Lake depth rather than fish planktivory determines cladoceran community structure in Faroese lakes – evidence from contemporary data and sediments
FRESHWATER BIOLOGY, Issue 11 2006
SUSANNE LILDAL AMSINCK

Summary
1. This study describes the environmental conditions and cladoceran community structure of 29 Faroese lakes with special focus on elucidating the impact of fish planktivory. In addition, long-term changes in biological structure of the Faroese Lake Heygsvatn are investigated.
2. Present-day species richness and community structure of cladocerans were identified from pelagial snapshot samples and from samples of surface sediment (0–1 cm). Multivariate statistical methods were applied to explore cladoceran species distribution relative to measured environmental variables. For Lake Heygsvatn, lake development was inferred by cladoceran-based paleolimnological investigations of a ¹⁴C-dated sediment core covering the last ca 5700 years.
3. The 29 study lakes were overall shallow, small-sized, oligotrophic and dominated by brown trout (Salmo trutta). Cladoceran species richness was overall higher in the surface sediment samples than in the snapshot samples.
4. Fish abundance was found to be of only minor importance in shaping cladoceran community and body size structure, presumably because of predominance of the less efficient zooplanktivore brown trout.
5. Canonical correspondence analysis showed maximum lake depth (Zmax) to be the only significant variable in explaining the sedimentary cladoceran species (18 cladoceran taxa, two pelagic, 16 benthic) distribution. Multivariate regression trees revealed benthic taxa to dominate in lakes with Zmax < 4.8 m and pelagic taxa to dominate when Zmax was > 4.8 m.
6. Predictive models to infer Zmax were developed using variance weighted-averaging procedures.
These were subsequently applied to subfossil cladoceran assemblages identified from a ¹⁴C-dated sediment core from Lake Heygsvatn and showed inferred Zmax to correspond well to the present-day lake depth. A recent increase in inferred Zmax may, however, be an artefact induced by, for instance, eutrophication. [source]

Plant species richness in continental southern Siberia: effects of pH and climate in the context of the species pool hypothesis
GLOBAL ECOLOGY, Issue 5 2007
Milan Chytrý

ABSTRACT
Aim: Many high-latitude floras contain more calcicole than calcifuge vascular plant species. The species pool hypothesis explains this pattern through an historical abundance of high-pH soils in the Pleistocene and an associated opportunity for the evolutionary accumulation of calcicoles. To obtain insights into the history of calcicole/calcifuge patterns, we studied species richness–pH–climate relationships across a climatic gradient, which included cool and dry landscapes resembling the Pleistocene environments of northern Eurasia.
Location: Western Sayan Mountains, southern Siberia.
Methods: Vegetation and environmental variables were sampled at steppe, forest and tundra sites varying in climate and soil pH, which ranged from 3.7 to 8.6. Species richness was related to pH and other variables using linear models and regression trees.
Results: Species richness is higher in areas with warmer winters and at medium altitudes that are warmer than the mountains and wetter than the lowlands. In treeless vegetation, the species richness–pH relationship is unimodal. In tundra vegetation, which occurs on low-pH soils, richness increases with pH, but it decreases in steppes, which have high-pH soils. In forests, where soils are more acidic than in the open landscape, the species richness–pH relationship is monotonic positive. Most species occur on soils with a pH of 6–7.
Main conclusions: Soil pH in continental southern Siberia is strongly negatively correlated with precipitation, and species richness is determined by the opposite effects of these two variables. Species richness increases with pH until the soil is very dry. In dry soils, pH is high but species richness decreases due to drought stress. Thus, the species richness–pH relationship is unimodal in treeless vegetation. Trees do not grow on the driest soils, which results in a positive species richness–pH relationship in forests. If modern species richness resulted mainly from the species pool effects, it would suggest that historically common habitats had moderate precipitation and slightly acidic to neutral soils. [source]

Origins and characteristics of Nearctic landbirds in Britain and Ireland in autumn: a statistical analysis
IBIS, Issue 4 2006
IAN A. MCLAREN

We used data from eastern North America in regressions to explain autumn frequencies of Nearctic landbird species in Britain and Ireland (UK-IR). The data were: day-counts of 16 August–15 November from Nova Scotia (NS) on Sable Island (1963–2000) and Seal Island (1963–2002), combined in half-monthly intervals to account for seasonality; published seasonal totals (10- to 11-day intervals, 20 August–10 November 1955–80) of birds killed at a Florida (FL) TV tower; and published counts following a 'Fallout', 11 October 1998, of unseasonal species and southern vagrants in NS, believed to have originated as migrants in the southeast USA that followed a cold front offshore into strong southwest flow beyond. We also used the following species variables: body mass and wing length for size; SD of mass as a proxy for lipid capacity; a five-level index of migratory span (1 for within North America to 5 for almost totally to South America); latitude of easternmost breeding, and distance to nearest normal range to indicate status in NS; a two-level index for day vs.
night migrants; an index, where pertinent, of significant population change (0 and 2 for a decrease and increase, respectively, 1 for no change). We also used classification and regression trees to cluster the potential transatlantic vagrants into homogeneous groups based on the explanatory variables. Standard generalized linear model regressions using counts from NS islands and FL produced highly positively skewed residuals (many species too common in UK-IR), but robust regressions eliminated statistical problems, and strengthened effects of non-count variables. Results using Fallout records, representing a subset of longer-distance night migrants, were statistically acceptable. The Fallout list, when supplied with counts from the same species from the NS islands and FL, produced highly significant (R² = 0.79–0.93) and statistically acceptable regressions that were not improved by robust versions. Overall, the results indicate that October counts, especially of generally larger, longer-distance migrants, best represented those reaching UK-IR. The effect of geographical remoteness was negative – vagrants in NS were less likely to appear in UK-IR. Population changes were important in predicting the 1956–2003 UK-IR counts from 1955–80 FL counts. The seasonal characteristics, high explanatory power of the Fallout list and over-representation of probable over-ocean migrants in the standard regressions all support suggestions by others that many Nearctic vagrants in UK-IR originate in flights off southeast USA and are displaced downwind across the North Atlantic. [source]

The influence of spatial errors in species occurrence data used in distribution models
JOURNAL OF APPLIED ECOLOGY, Issue 1 2008
Catherine H Graham

Summary
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements.
In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately.
2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions.
3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors.
4. Synthesis and applications.
To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.

Quick prediction of the retention of solutes in 13 thin layer chromatographic screening systems on silica gel by classification and regression trees
JOURNAL OF SEPARATION SCIENCE, JSS, Issue 15 2008
Łukasz Komsta
Abstract The use of classification and regression trees (CART) was studied in a quantitative structure–retention relationship (QSRR) context to predict the retention in 13 thin layer chromatographic screening systems on silica gel, for which large datasets of interlaboratory-determined retention are available. The response (dependent variable) was the rate mobility (RM) factor, while a set of atomic contributions and functional substituent counts was used as the explanatory dataset. The optimal complexity of the trees (number of leaves) was investigated by external validation and internal cross-validation. Their predictive performance is slightly lower than that of the full atomic contribution model, but their main advantage is simplicity. Retention can be predicted with the proposed trees without a computer or even a pocket calculator.

Clustering work and family trajectories by using a divisive algorithm
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 4 2007
Raffaella Piccarreta
Summary. We present an approach to the construction of clusters of life course trajectories and use it to obtain ideal types of trajectories that can be interpreted and analysed meaningfully.
We represent life courses as sequences on a monthly timescale and apply optimal matching analysis to compute dissimilarities between individuals. We introduce a new divisive clustering algorithm which has features that are in common with both Ward's agglomerative algorithm and classification and regression trees. We analyse British Household Panel Survey data on the employment and family trajectories of women. Our method produces clusters of sequences for which it is straightforward to determine who belongs to each cluster, making it easier to interpret the relative importance of life course factors in distinguishing subgroups of the population. Moreover, our method gives guidance on selecting the number of clusters.

Plant species and growth form richness along altitudinal gradients in the southwest Ethiopian highlands
JOURNAL OF VEGETATION SCIENCE, Issue 4 2010
Wana Desalegn
Abstract Questions: Do growth forms and vascular plant richness follow similar patterns along an altitudinal gradient? What are the driving mechanisms that structure richness patterns at the landscape scale? Location: Southwest Ethiopian highlands. Methods: Floristic and environmental data were collected from 74 plots, each covering 400 m². The plots were distributed along altitudinal gradients. Boosted regression trees were used to derive the patterns of richness distribution along altitudinal gradients. Results: Total vascular plant richness did not show any strong response to altitude. Contrasting patterns of richness were observed for several growth forms. Woody, graminoid and climber species richness showed a unimodal structure. However, each of these morphological groups had a peak of richness at different altitudes: graminoid species attained maximum importance at lower elevations, followed by climbers and finally woody species at higher elevations.
Fern species richness increased monotonically towards higher altitudes, but herbaceous richness had a dented structure at mid-altitudes. Soil sand fraction, silt, slope and organic matter were found to contribute a considerable amount of the predicted variance of richness for total vascular plants and growth forms. Main Conclusions: Hump-shaped species richness patterns were observed for several growth forms. A mid-altitudinal richness peak was the result of a combination of climate-related water–energy dynamics, species–area relationships and local environmental factors, which have direct effects on plant physiological performance. However, altitude represents the composite gradient of several environmental variables that were interrelated. Thus, considering multiple gradients would provide a better picture of richness and the potential mechanisms responsible for the distribution of biodiversity in high-mountain regions of the tropics.

Profiling MS proteomics data using smoothed non-linear energy operator and Bayesian additive regression trees
PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 17 2009
Shan He
Abstract This paper proposes a novel profiling method for SELDI-TOF and MALDI-TOF MS data that integrates a novel peak detection method based on a modified smoothed non-linear energy operator, correlation-based peak selection and Bayesian additive regression trees. The peak detection and classification performance of the proposed approach is validated on two publicly available MS data sets, namely MALDI-TOF simulation data and high-resolution SELDI-TOF ovarian cancer data. The results compared favorably with three state-of-the-art peak detection algorithms and four machine-learning algorithms.
For the high-resolution ovarian cancer data set, seven biomarkers (m/z windows) were found by our method, which achieved 97.30 and 99.10% accuracy at 25th and 75th percentiles, respectively, from 50 independent cross-validation samples, which is significantly better than other profiling and dimensional reduction methods. The results show that the method is capable of finding parsimonious sets of biologically meaningful biomarkers with better accuracy than existing methods. Supporting Information material and MATLAB/R scripts to implement the methods described in the article are available at: http://www.cs.bham.ac.uk/szh/SourceCode-for-Proteomics.zip

Evaluating the Ability of Tree-Based Methods and Logistic Regression for the Detection of SNP-SNP Interaction
ANNALS OF HUMAN GENETICS, Issue 3 2009
M. García-Magariños
Summary Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree-based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor-dimensionality reduction (MDR) was also used for comparison. Our approach involves first the simulation of datasets of autosomal biallelic unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two-loci interaction (causal SNPs) and 98 'noise' SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequencies (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects.
In total, we have simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detection of true association, CART and RF perform better than LR with respect to classification error. MAF, penetrance model, and sample size are greater determining factors than percentage of missing data in the ability of the different techniques to detect true association. In pure interaction models, only RF detects association. In conclusion, tree-based methods and LR are important statistical tools for the detection of unknown interactions among true risk-associated SNPs with marginal effects and in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well in the presence of large sample sizes and low percentages of missing data. However, when the study design is suboptimal (unfavourable to detect interaction in terms of e.g. sample size and MAF) there is a high chance of detecting false, spurious associations.

Quantitative computed tomography of the lumbar spine, not dual x-ray absorptiometry, is an independent predictor of prevalent vertebral fractures in postmenopausal women with osteopenia receiving long-term glucocorticoid and hormone-replacement therapy
ARTHRITIS & RHEUMATISM, Issue 5 2002
Q. Rehman
Objective To determine which measurement of bone mineral density (BMD) predicts vertebral fractures in a cohort of postmenopausal women with glucocorticoid-induced osteoporosis. Methods We recruited 114 subjects into the study. All had osteopenia of the lumbar spine or hip, as demonstrated by dual x-ray absorptiometry (DXA), and were receiving long-term glucocorticoids and hormone replacement therapy (HRT). Measurements of BMD by DXA of the lumbar spine, hip (and subregions), and forearm (and subregions), quantitative computed tomography (QCT) of the spine and hip (n = 59), and radiographs of the thoracolumbar spine were performed on all subjects to assess prevalent vertebral fractures.
Vertebral fracture prevalence, as determined by morphometry, required a ≥20% (or ≥4-mm) loss of vertebral body height. Demographic information was obtained by questionnaire. Multiple regression and classification and regression trees (CART) analyses were used to assess predictors of vertebral fracture. Results Twenty-six percent of the study subjects had prevalent fractures. BMD of the lumbar spine, total hip and hip subregions, as measured by QCT, but only the lumbar spine and total hip, as measured by DXA, were significantly associated with prevalent vertebral fractures. However, only lumbar spine BMD as measured by QCT was a significant predictor of vertebral fractures. CART analysis showed that a BMD value <0.065 gm/cm³ was associated with a 7-fold higher risk of fracture than a BMD value ≥0.065 gm/cm³. Conclusion In postmenopausal women with osteoporosis induced by long-term glucocorticoid treatment who are also receiving HRT, BMD of the lumbar spine as measured by QCT, but not DXA, is an independent predictor of vertebral fractures.

Spatial distribution and prediction of seed production by Eucalyptus microcarpa in a fragmented landscape
AUSTRAL ECOLOGY, Issue 1 2010
PETER A. VESK
Abstract Woodlands worldwide have been greatly modified by clearing for agriculture, and their conservation and restoration require an understanding of tree recruitment processes. Seed production is one possible point of recruitment failure, and one that the spatial arrangement of trees may affect. We sampled 118 Eucalyptus microcarpa (Myrtaceae) trees to compare and analyse the determinants of seed production in this dominant tree of modified, fragmented temperate grassy woodlands, which extend over much of southeastern Australia. Fecundity was estimated as the seed crop measured on leaf mass and whole tree bases and was compared between categories of tree configuration. We also modelled fecundity using boosted regression trees, a new and flexible tool.
Fecundity on a leaf mass basis was predominantly influenced by environmental factors (topographic 'wetness', slope, soil type), rather than by local tree density and configuration. Fewer seeds per unit leaf mass were produced on flat and topographically wet sites, reflecting poor tolerance of waterlogging by E. microcarpa. By contrast, whole tree fecundity was little influenced by environmental factors. Local tree density and configuration did influence whole tree fecundity, which was high in solitary and woodland-spaced trees and reduced under high local density. We found little evidence for reduced fecundity of E. microcarpa in solitary trees. This points to the importance of scattered trees as sources of seed for tree recruitment and for natural regeneration of landscape level tree cover. Considerable uncertainty remains in modelled seed supply, and may be reduced with sampling across multiple years and greater environmental and spatial domains.

Theory & Methods: Modified classification and regression tree splitting criteria for data with interactions
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2002
Alexandra P. Bremner
This paper proposes modified splitting criteria for classification and regression trees by modifying the definition of the deviance. The modified deviance is based on local averaging instead of global averaging and is more successful at modelling data with interactions. The paper shows that the modified criteria result in much simpler trees for pure interaction data (no main effects) and can produce trees with fewer errors and lower residual mean deviances than those produced by Clark & Pregibon's (1992) method when applied to real datasets with strong interaction effects.

Efron-Type Measures of Prediction Error for Survival Analysis
BIOMETRICS, Issue 4 2007
Thomas A. Gerds
Summary Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications.
We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association 92, 548–560) to survival analysis with right-censored event times. We find that flexible rules, like artificial neural nets, classification and regression trees, or regression splines can be assessed, and compared to less flexible rules in the same data where they are developed. The methods are illustrated with data from a breast cancer trial.

Multivariate Survival Trees: A Maximum Likelihood Approach Based on Frailty Models
BIOMETRICS, Issue 1 2004
Xiaogang Su
Summary. A method of constructing trees for correlated failure times is put forward. It adopts the backfitting idea of classification and regression trees (CART) (Breiman et al., 1984, in Classification and Regression Trees). The tree method is developed based on the maximized likelihoods associated with the gamma frailty model and standard likelihood-related techniques are incorporated. The proposed method is assessed through simulations conducted under a variety of model configurations and illustrated using the chronic granulomatous disease (CGD) study data.
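
For readers unfamiliar with the classification and regression trees (CART) that recur throughout these abstracts, the following is a minimal sketch of a single regression-tree split, assuming the ordinary least-squares deviance (sum of squared deviations from the node mean) as the splitting criterion. It is not the modified deviance, the frailty-based likelihood, or any other method proposed in the abstracts above, and the function names `sum_sq_dev` and `best_split` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def sum_sq_dev(values: List[float]) -> float:
    """Least-squares deviance: sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Find the binary split on one predictor that minimises total deviance.

    Returns (threshold, deviance_after_split), or None when x has fewer
    than two distinct values and so cannot be split.
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        # A split is only possible between two distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        deviance = sum_sq_dev(left) + sum_sq_dev(right)
        if best is None or deviance < best[1]:
            best = (threshold, deviance)
    return best


if __name__ == "__main__":
    # Illustrative data (made up): two well-separated groups of responses.
    xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    print(best_split(xs, ys))
```

A full tree would apply `best_split` recursively to each resulting subset and over every candidate predictor, stopping or pruning by a complexity rule such as cross-validation; that part is omitted here to keep the sketch short.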