Regression Splines (regression + spline)


Kinds of Regression Splines

  • adaptive regression spline
  • multivariate adaptive regression spline


Selected Abstracts


    Flexible and Robust Implementations of Multivariate Adaptive Regression Splines Within a Wastewater Treatment Stochastic Dynamic Program

    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 7 2005
    Julia C. C. Tsai
    Abstract This paper presents an automatic and more robust implementation of multivariate adaptive regression splines (MARS) within the orthogonal array (OA)/MARS continuous-state stochastic dynamic programming (SDP) method. MARS is used to estimate the future value functions in each SDP level. The default stopping rule of MARS employs the maximum number of basis functions Mmax, specified by the user. To reduce the computational effort and improve the MARS fit for the wastewater treatment SDP model, two automatic stopping rules, which automatically determine an appropriate value for Mmax, and a robust version of MARS that prefers lower-order terms over higher-order terms are developed. Computational results demonstrate the success of these approaches. Copyright © 2005 John Wiley & Sons, Ltd.
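
As a rough illustration of the role Mmax plays in the abstract above, the following Python sketch runs a simplified MARS-style forward pass that keeps adding mirrored hinge functions until a user-supplied cap on the number of basis functions is reached. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `hinge`, `forward_pass` and `m_max` are hypothetical, the cap is assumed to count the intercept, only additive terms are considered (no interactions, no backward pruning), and the paper's automatic stopping rules are not reproduced.

```python
import numpy as np


def hinge(x, knot, sign):
    """Hinge basis function max(0, sign * (x - knot))."""
    return np.maximum(0.0, sign * (x - knot))


def forward_pass(X, y, m_max):
    """Greedily add mirrored hinge pairs until m_max basis functions exist.

    Illustrative only: additive terms, no interactions, no backward pruning.
    m_max counts the intercept (an assumption of this sketch).
    """
    n, p = X.shape
    basis = [np.ones(n)]   # start from the intercept
    terms = []             # (variable index, knot) of each accepted pair
    while len(basis) + 2 <= m_max:
        best = None
        for j in range(p):
            # Interior data values serve as candidate knots.
            for knot in np.unique(X[:, j])[1:-1]:
                pair = [hinge(X[:, j], knot, +1.0), hinge(X[:, j], knot, -1.0)]
                B = np.column_stack(basis + pair)
                coef, *_ = np.linalg.lstsq(B, y, rcond=None)
                rss = float(np.sum((y - B @ coef) ** 2))
                if best is None or rss < best[0]:
                    best = (rss, j, float(knot), pair)
        if best is None:   # no candidate knots left
            break
        _, j, knot, pair = best
        basis += pair
        terms.append((j, knot))
    B = np.column_stack(basis)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return terms, coef, B


# Usage: a V-shaped response with a kink at x = 5 (synthetic data).
X = np.arange(11, dtype=float).reshape(-1, 1)
y = np.abs(X[:, 0] - 5.0)
terms, coef, B = forward_pass(X, y, m_max=3)
print(terms)
```

With `m_max=3` the loop admits exactly one hinge pair beyond the intercept; raising the cap lets the search continue, which is the cost that a data-driven stopping rule would aim to avoid.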


    Bayesian Adaptive Regression Splines for Hierarchical Data

    BIOMETRICS, Issue 3 2007
    Jamie L. Bigelow
    Summary This article considers methodology for hierarchical functional data analysis, motivated by studies of reproductive hormone profiles in the menstrual cycle. Current methods standardize the cycle lengths and ignore the timing of ovulation within the cycle, both of which are biologically informative. Methods are needed that avoid standardization, while flexibly incorporating information on covariates and the timing of reference events, such as ovulation and onset of menses. In addition, it is necessary to account for within-woman dependency when data are collected for multiple cycles. We propose an approach based on a hierarchical generalization of Bayesian multivariate adaptive regression splines. Our formulation allows for an unknown set of basis functions characterizing the population-averaged and woman-specific trajectories in relation to covariates. A reversible jump Markov chain Monte Carlo algorithm is developed for posterior computation. Applying the methods to data from the North Carolina Early Pregnancy Study, we investigate differences in urinary progesterone profiles between conception and nonconception cycles.


    Semiparametric Regression Splines in Matched Case-Control Studies

    BIOMETRICS, Issue 4 2003
    Inyoung Kim
    Summary. We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: 1) an approximate cross-validation scheme to estimate the smoothing parameter inherent in regression splines, as well as 2) Monte Carlo expectation maximization (MCEM) and 3) Bayesian methods to fit the regression spline model. We compare the approximate cross-validation approach, MCEM, and Bayesian approaches using simulation, showing that they appear approximately equally efficient; the approximate cross-validation method is computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.


    A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 1 2005
    Claudio J. Verzilli
    Summary. Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.


    Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines

    DIVERSITY AND DISTRIBUTIONS, Issue 3 2007
    Jane Elith
    ABSTRACT Current circumstances (that the majority of species distribution records exist as presence-only data, e.g. from museums and herbaria, and that there is an established need for predictions of species distributions) mean that scientists and conservation managers seek to develop robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transfer of analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence-only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence-absence data were recorded. Models developed with absences inferred from the total set of presence-only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo-absences were drawn randomly from the study area. The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.


    Nonparametric harmonic regression for estuarine water quality data

    ENVIRONMETRICS, Issue 6 2010
    Melanie A. Autin
    Abstract Periodicity is omnipresent in environmental time series data. For modeling estuarine water quality variables, harmonic regression analysis has long been the standard for dealing with periodicity. Generalized additive models (GAMs) allow more flexibility in the response function. They permit parametric, semiparametric, and nonparametric regression functions of the predictor variables. We compare harmonic regression, GAMs with cubic regression splines, and GAMs with cyclic regression splines in simulations and using water quality data collected from the National Estuarine Research Reserve System (NERRS). While the classical harmonic regression model works well for clean, near-sinusoidal data, the GAMs are competitive and are very promising for more complex data. The generalized additive models are also more adaptive and require less intervention. Copyright © 2009 John Wiley & Sons, Ltd.
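
To make the contrast in the abstract above concrete, the following Python sketch fits a first-order harmonic regression and, separately, an unpenalized cubic regression spline by ordinary least squares. It is a minimal sketch under stated assumptions and not the paper's analysis: the data are synthetic, the function names (`harmonic_design`, `cubic_spline_design`, `fit_rss`) are hypothetical, and a plain truncated-power cubic basis stands in for the penalized spline smooths a GAM would use.

```python
import numpy as np


def harmonic_design(t, period, n_harmonics=1):
    """Design matrix [1, cos(2*pi*k*t/period), sin(2*pi*k*t/period), ...]."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)


def cubic_spline_design(t, knots):
    """Truncated-power cubic basis: 1, t, t^2, t^3 and (t - knot)_+^3 per knot."""
    cols = [np.ones_like(t), t, t ** 2, t ** 3]
    cols += [np.maximum(0.0, t - k) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_rss(B, y):
    """Least-squares coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, float(np.sum((y - B @ coef) ** 2))


# Synthetic "clean, near-sinusoidal" series: 48 monthly values, period 12.
t = np.arange(48, dtype=float)
y = 3.0 + 2.0 * np.cos(2 * np.pi * t / 12) - np.sin(2 * np.pi * t / 12)

coef_h, rss_h = fit_rss(harmonic_design(t, period=12.0), y)
# Time is rescaled to [0, 1) to keep the cubic basis well conditioned.
coef_s, rss_s = fit_rss(cubic_spline_design(t / 48.0, knots=[0.25, 0.5, 0.75]), y)
print(coef_h, rss_h, rss_s)
```

On a series generated from a single sinusoid the harmonic model recovers the generating coefficients, whereas a spline with only three knots spanning four cycles cannot; on less regular data the comparison may go the other way, which is the trade-off the abstract describes.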


    Nonlinear multiple regression methods: a survey and extensions

    INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 1 2010
    Kenneth O. Cogger
    Abstract This paper reviews some nonlinear statistical procedures useful in function approximation, classification, regression and time-series analysis. Primary emphasis is on piecewise linear models such as multivariate adaptive regression splines, adaptive logic networks, hinging hyperplanes and their conceptual differences. Potential and actual applications of these methods are cited. Software for implementation is discussed, and practical suggestions are given for improvement. Examples show the relative capabilities of the various methods, including their ability for universal approximation. Copyright © 2010 John Wiley & Sons, Ltd.


    Efficient estimation of three-dimensional curves and their derivatives by free-knot regression splines, applied to the analysis of inner carotid artery centrelines

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 3 2009
    Laura M. Sangalli
    Summary. We deal with the problem of efficiently estimating a three-dimensional curve and its derivatives, starting from a discrete and noisy observation of the curve. This problem is now arising in many applicative contexts, thanks to the advent of devices that provide three-dimensional images and measures, such as three-dimensional scanners in medical diagnostics. Our research, in particular, stems from the need for accurate estimation of the curvature of an artery, from image reconstructions of three-dimensional angiographies. This need has emerged within the AneuRisk project, a scientific endeavour which aims to investigate the role of vessel morphology, blood fluid dynamics and biomechanical properties of the vascular wall, on the pathogenesis of cerebral aneurysms. We develop a regression technique that exploits free-knot splines in a novel setting, to estimate three-dimensional curves and their derivatives. We thoroughly compare this technique with a classical regression method, local polynomial smoothing, showing that three-dimensional free-knot regression splines yield more accurate and efficient estimates.


    Nonlinear modelling of periodic threshold autoregressions using TSMARS

    JOURNAL OF TIME SERIES ANALYSIS, Issue 4 2002
    Peter A. W. Lewis
    We present new methods for modelling nonlinear threshold-type autoregressive behaviour in periodically correlated time series. The methods are illustrated using a series of average monthly flows of the Fraser River in British Columbia. Commonly used nonlinearity tests of the river flow data in each month indicate nonlinear behaviour in certain months. The periodic nonlinear correlation structure is modelled nonparametrically using TSMARS, a time series version of Friedman's extended multivariate adaptive regression splines (MARS) algorithm, which allows for categorical predictor variables. We discuss two methods of using the computational algorithm in TSMARS for modelling and fitting periodically correlated data. The first method applies the algorithm to data from each period separately. The second method models data from all periods simultaneously by incorporating an additional predictor variable to distinguish different behaviour in different periods, and allows for coalescing of data from periods with similar behaviour. The models obtained using TSMARS provide better short-term forecasts for the Fraser River data than a corresponding linear periodic AR model.


    Efron-Type Measures of Prediction Error for Survival Analysis

    BIOMETRICS, Issue 4 2007
    Thomas A. Gerds
    Summary Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications. We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association 92, 548-560) to survival analysis with right-censored event times. We find that flexible rules, like artificial neural nets, classification and regression trees, or regression splines can be assessed, and compared to less flexible rules in the same data where they are developed. The methods are illustrated with data from a breast cancer trial.


    Generalized Additive Modeling with Implicit Variable Selection by Likelihood-Based Boosting

    BIOMETRICS, Issue 4 2006
    Gerhard Tutz
    Summary The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson, and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to be a strong competitor to common procedures for the fitting of generalized additive models. In particular, in high-dimensional settings with many nuisance predictor variables it performs very well.

