Ridge Regression

Selected Abstracts


Anomalies in the Foundations of Ridge Regression: Some Clarifications

INTERNATIONAL STATISTICAL REVIEW, Issue 2 2010
Prasenjit Kapat
Summary. Several anomalies in the foundations of ridge regression, viewed from the perspective of constrained least-squares (LS) problems, were pointed out in Jensen & Ramirez. Some of these so-called anomalies, attributed to the non-monotonic behaviour of the norm of unconstrained ridge estimators and the consequent lack of sufficiency of Lagrange's principle, are shown to be incorrect. It is noted in this paper that, for a fixed Y, the norms of unconstrained ridge estimators corresponding to a given basis are indeed strictly monotone. Furthermore, the conditions for sufficiency of Lagrange's principle are valid for a suitable range of the constraint parameter. The discrepancy arose, in the context of one data set, from confusion between estimates of the parameter vector corresponding to different parametrizations (choices of basis) and/or constraint norms. To avoid such confusion, it is suggested that the parameter corresponding to each basis be labelled appropriately. Résumé (translated): Several anomalies were recently identified by Jensen and Ramirez (2008) in the theoretical foundations of ridge regression considered from a constrained least-squares perspective. Some of these anomalies were attributed to the non-monotone behaviour of the norm of unconstrained ridge estimators, as well as to the insufficiency of Lagrange's principle. We show in this article that, for a fixed value of Y, the norms of ridge estimators corresponding to a given basis are strictly monotone. Moreover, the conditions ensuring the sufficiency of Lagrange's principle are satisfied for a suitable range of values of the constraint parameter. The origin of the reported anomalies therefore lies elsewhere: this apparent contradiction arises, in the context of a particular data set, from confusion between estimates of the parameter vector corresponding to different parametrizations (associated with different choices of basis) and/or different constraint norms. To avoid this type of confusion, it is suggested that the parameter be indexed appropriately by the chosen basis. [source]
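
The monotonicity claim at the heart of this clarification is easy to check numerically: for the standard ridge estimator (X'X + kI)^{-1}X'y with fixed X and y, the norm of the estimate decreases as the ridge parameter k increases. A minimal sketch (synthetic data; the design matrix and variable names are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge_estimate(X, y, k):
    """Unconstrained ridge estimator (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

ks = np.linspace(0.0, 100.0, 500)
norms = np.array([np.linalg.norm(ridge_estimate(X, y, k)) for k in ks])

# For a fixed y and a fixed basis, the norm is strictly decreasing in k.
assert np.all(np.diff(norms) < 0)
print(norms[:3], norms[-3:])
```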


Ridge regression in two-parameter solution

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 6 2005
Stan Lipovetsky
Abstract We consider simultaneous minimization of the model errors, of deviations from orthogonality between regressors and errors, and of deviations from other desired properties of the solution. This approach corresponds to a regularized objective that produces a consistent solution not prone to multicollinearity. We obtain a generalization of ridge regression to a two-parameter model that always outperforms the regular one-parameter ridge in approximation quality and has good orthogonality properties between the residuals and the predicted values of the dependent variable. The results are very convenient for the analysis and interpretation of the regression. Numerical runs show that this technique works very well. The examples considered come from marketing research problems. Copyright © 2005 John Wiley & Sons, Ltd. [source]
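
The abstract does not spell out the two-parameter form, but one way to see why a second parameter can only improve the approximation is to rescale the ordinary ridge fit: for any ridge solution b(k), the scale q minimizing the residual sum of squares is available in closed form and can do no worse than q = 1. A hedged sketch of that rescaling idea (an illustration only, not necessarily the authors' exact two-parameter estimator):

```python
import numpy as np

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def two_parameter_fit(X, y, k):
    """Ridge solution rescaled by the least-squares optimal factor q (illustrative)."""
    b = ridge(X, y, k)
    fitted = X @ b
    q = (y @ fitted) / (fitted @ fitted)   # argmin_q ||y - q * X b||^2
    return q * b

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(100)

k = 5.0
b1, b2 = ridge(X, y, k), two_parameter_fit(X, y, k)
print(np.sum((y - X @ b1) ** 2), np.sum((y - X @ b2) ** 2))  # second is never larger
```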


A covariance-adaptive approach for regularized inversion in linear models

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 2 2007
Christopher Kotsakis
SUMMARY The optimal inversion of a linear model in the presence of additive random noise in the input data is a typical problem in many geodetic and geophysical applications. Various methods have been developed and applied for the solution of this problem, ranging from the classic principle of least-squares (LS) estimation to more complex inversion techniques such as Tikhonov-Phillips regularization, truncated singular value decomposition, generalized ridge regression, numerical iterative methods (Landweber, conjugate gradient) and others. In this paper, a new type of optimal parameter estimator for the inversion of a linear model is presented. The proposed methodology is based on a linear transformation of the classic LS estimator and it satisfies two basic criteria. First, it provides a solution for the model parameters that is optimally fitted (in an average quadratic sense) to the classic LS parameter solution. Second, it complies with an external user-dependent constraint that specifies a priori the error covariance (CV) matrix of the estimated model parameters. The formulation of this constrained estimator offers a unified framework for the description of many regularization techniques that are systematically used in geodetic inverse problems, particularly those that correspond to an eigenvalue filtering of the ill-conditioned normal matrix in the underlying linear model. The value of this study lies in the fact that it adds an alternative perspective on the statistical properties and the regularization mechanism of many inversion techniques commonly used in geodesy and geophysics, by interpreting them as a family of 'CV-adaptive' parameter estimators that obey a common optimality criterion and differ only in the pre-selected form of their error CV matrix under a fixed model design. [source]
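
The eigenvalue-filtering view mentioned above can be made concrete with the singular value decomposition of the design matrix: plain LS, truncated SVD, and ridge regression differ only in the filter factors applied to each singular component. A brief sketch under that standard spectral formulation (a synthetic ill-conditioned system; this illustrates the filtering idea, not the paper's CV-adaptive estimator itself):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((80, 10)) @ np.diag(np.logspace(0, -4, 10))  # ill-conditioned design
x_true = rng.standard_normal(10)
y = A @ x_true + 1e-3 * rng.standard_normal(80)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
coef = U.T @ y  # projections of the data onto the left singular vectors

def filtered_solution(filters):
    """Generic spectral estimator: sum_i f_i * (u_i'y / s_i) * v_i."""
    return Vt.T @ (filters * coef / s)

x_ls    = filtered_solution(np.ones_like(s))            # least squares: f_i = 1
x_tsvd  = filtered_solution((s > 1e-2).astype(float))   # truncated SVD: f_i in {0, 1}
k = 1e-3
x_ridge = filtered_solution(s**2 / (s**2 + k))          # ridge: f_i = s_i^2 / (s_i^2 + k)

for name, x in [("LS", x_ls), ("TSVD", x_tsvd), ("ridge", x_ridge)]:
    print(name, np.linalg.norm(x - x_true))
```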


Least-square support vector machine applied to settlement of shallow foundations on cohesionless soils

INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 17 2008
Pijush Samui
Abstract This paper examines the potential of the least-squares support vector machine (LSSVM) in the prediction of settlement of shallow foundations on cohesionless soil. In LSSVM, Vapnik's ε-insensitive loss function has been replaced by a cost function that corresponds to a form of ridge regression. The LSSVM involves equality instead of inequality constraints and works with a least-squares cost function. The five input variables used in the LSSVM for the prediction of settlement are footing width (B), footing length (L), footing net applied pressure (P), average standard penetration test value (N) and footing embedment depth (d). Comparisons between LSSVM and some of the traditional interpretation methods are also presented. LSSVM has also been used to compute error bars. The results presented in this paper clearly highlight that the LSSVM is a robust tool for prediction of settlement of shallow foundations on cohesionless soil. Copyright © 2008 John Wiley & Sons, Ltd. [source]
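
The equality-constrained formulation described above reduces, in the standard LS-SVM dual, to a single linear system in the bias term and the Lagrange multipliers. A hedged sketch with a Gaussian kernel and synthetic inputs (the five footing variables B, L, P, N, d are replaced by random features, and the regularization constant gamma and kernel width are illustrative choices):

```python
import numpy as np

def rbf_kernel(A, B, width=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def lssvm_fit(X, y, gamma=10.0, width=1.0):
    """Solve the LS-SVM dual system [[0, 1'], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    K = rbf_kernel(X, X, width)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, multipliers alpha

def lssvm_predict(X_train, alpha, b, X_new, width=1.0):
    return rbf_kernel(X_new, X_train, width) @ alpha + b

rng = np.random.default_rng(3)
X = rng.uniform(size=(60, 5))        # stand-ins for the inputs B, L, P, N, d
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 2] + 0.05 * rng.standard_normal(60)

b, alpha = lssvm_fit(X, y)
print(np.round(lssvm_predict(X, alpha, b, X[:5]), 3))
```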


Regression by L1 regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model

JOURNAL OF CHEMOMETRICS, Issue 5 2009
Cajo J. F. ter Braak
Abstract This paper proposes a regression method, ROSCAS, which regularizes smart contrasts and sums of regression coefficients by an L1 penalty. The contrasts and sums are based on the sample correlation matrix of the predictors and are suggested by a latent variable regression model. The contrasts express the idea that a priori correlated predictors should have similar coefficients. The method has excellent predictive performance in situations where there are groups of predictors, with each group representing an independent feature that influences the response. In particular, when the groups differ in size, ROSCAS can outperform LASSO, elastic net, partial least squares (PLS) and ridge regression by a factor of two or three in terms of mean squared error. In other simulation setups and on real data, ROSCAS performs competitively. Copyright © 2009 John Wiley & Sons, Ltd. [source]


Impartial graphical comparison of multivariate calibration methods and the harmony/parsimony tradeoff

JOURNAL OF CHEMOMETRICS, Issue 11-12 2006
Forrest Stout
Abstract For multivariate calibration with the relationship y = Xb, it is often necessary to determine the degrees of freedom for parsimony considerations and for the error measure root mean square error of calibration (RMSEC). This paper shows that the model fitting degrees of freedom can be estimated by an effective rank (ER) measure, with the more parsimonious model having the smaller ER. This paper also shows that when such a measure is used on the X-axis, simultaneous graphing of model errors and other regression diagnostics is possible for ridge regression (RR), partial least squares (PLS) and principal component regression (PCR); thus, a fair comparison between all potential models can be accomplished. The ER approach is general and applicable to other multivariate calibration methods. It is often noted that more parsimonious models are obtained by selecting variables, typically by multiple linear regression (MLR). Using the ER, the more parsimonious model is shown graphically not always to be the MLR model. Additionally, a harmony measure is proposed that expresses the bias/variance tradeoff for a particular model. By plotting this new measure against the ER, the proper harmony/parsimony tradeoff can be graphically assessed for RR, PCR and PLS. Essentially, pluralistic criteria for fairly evaluating and characterizing models are better than the dualistic or single-criterion approach that is the usual tactic. Results are presented using spectral, industrial and quantitative structure-activity relationship (QSAR) data. Copyright © 2007 John Wiley & Sons, Ltd. [source]
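
For ridge regression, one common way to quantify the model-fitting degrees of freedom discussed here is the trace of the hat matrix, which reduces to a sum of filter factors over the singular values of X; for PCR it is simply the number of retained components. A short sketch under that standard definition (the paper's ER measure may be defined differently):

```python
import numpy as np

def ridge_effective_df(X, k):
    """Effective degrees of freedom of ridge: trace of X(X'X + kI)^{-1}X' = sum s_i^2/(s_i^2 + k)."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s**2 / (s**2 + k))

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 15))
for k in [0.0, 0.1, 1.0, 10.0, 100.0]:
    print(k, ridge_effective_df(X, k))   # shrinks from min(n, p) toward 0 as k grows
```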


Tikhonov regularization in standardized and general form for multivariate calibration with application towards removing unwanted spectral artifacts

JOURNAL OF CHEMOMETRICS, Issue 1-2 2006
Forrest Stout
Abstract Tikhonov regularization (TR) is an approach to form a multivariate calibration model for y = Xb. It includes a regularization operator matrix L that is usually set to the identity matrix I; in this situation, TR is said to operate in standard form and is the same as ridge regression (RR). Alternatively, TR can function in general form with L ≠ I, where L is used to remove unwanted spectral artifacts. To simplify the computations for TR in general form, a standardization process can be used on X and y to transform the problem into TR in standard form, and an RR algorithm can then be used. The calculated regression vector in standardized space must be back-transformed to the general form, which can then be applied to spectra that have not been standardized. The calibration model building methods of principal component regression (PCR), partial least squares (PLS) and others can also be implemented with the standardized X and y. Regardless of the calibration method, armed with y, X and L, a regression vector is sought that can correct for irrelevant spectral variation in predicting y. In this study, L is set to various derivative operators to obtain smoothed TR, PCR and PLS regression vectors in order to generate models robust to noise and/or temperature effects. Results of this smoothing process are examined for spectral data without excessive noise or other artifacts, spectral data with additional noise added and spectral data exhibiting temperature-induced peak shifts. When the noise level is small, derivative operator smoothing was found to slightly degrade the root mean square error of validation (RMSEV) as well as the prediction variance indicator represented by the regression vector 2-norm, thereby deteriorating the model harmony (bias/variance tradeoff). The effective rank (ER) (parsimony) was found to decrease with smoothing, and in doing so a harmony/parsimony tradeoff is formed. For the temperature-affected data and some of the noisy data, derivative operator smoothing decreases the RMSEV, but at a cost of greater values of the regression vector 2-norm; the ER was found to increase and hence the parsimony degraded. A simulated data set from a previous study that used TR in general form was reexamined. In the present study, the standardization process is used with L set to the spectral noise structure to eliminate undesirable spectral regions (wavelength selection), and TR, PCR and PLS are evaluated. There was a significant decrease in bias at a sacrifice to variance with wavelength selection, and the parsimony essentially remains the same. This paper includes discussion on the utility of using TR to remove other undesired spectral patterns resulting from chemical, environmental and/or instrumental influences. The discussion also covers using TR as a method for calibration transfer. Copyright © 2006 John Wiley & Sons, Ltd. [source]
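
When L is square and invertible, the standardization mentioned above is a simple change of variables: the general-form problem min ||y - Xb||^2 + k||Lb||^2 becomes standard-form ridge in the transformed design X L^{-1}, and the standardized regression vector is back-transformed afterwards. A minimal sketch of that transformation, assuming an invertible L (derivative operators are rectangular in practice, which requires the more general standard-form transformation not shown here):

```python
import numpy as np

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def tikhonov_general(X, y, L, lam):
    """General-form Tikhonov via standardization, assuming L is square and invertible."""
    L_inv = np.linalg.inv(L)
    X_std = X @ L_inv          # transform to standard form
    b_std = ridge(X_std, y, lam)
    return L_inv @ b_std       # back-transform to the original parameterization

rng = np.random.default_rng(5)
X = rng.standard_normal((30, 8))
y = X @ np.linspace(1, 2, 8) + 0.1 * rng.standard_normal(30)

# A smoothing-type operator: identity plus a first-difference term (upper triangular, invertible).
L = np.eye(8) + 0.5 * (np.eye(8) - np.eye(8, k=1))
print(np.round(tikhonov_general(X, y, L, lam=1.0), 3))
```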


Variable selection in random calibration of near-infrared instruments: ridge regression and partial least squares regression settings

JOURNAL OF CHEMOMETRICS, Issue 3 2003
Arief Gusnanto
Abstract Standard methods for calibration of near-infrared instruments, such as partial least-squares (PLS) and ridge regression (RR), typically use the full set of wavelengths in the model. In this paper we investigate the effect of variable (wavelength) selection for these two methods on the model prediction. For RR the selection is optimized with respect to the ridge parameter, the number of variables and the configuration of the variables in the model. A fast iterative computational algorithm is developed for the purpose of this optimization. For PLS the selection is optimized with respect to the number of components, the number of variables and the configuration of the variables. We use three real data sets in this study: processed milk from the market, milk from a dairy farm and milk from the production line of a milk processing factory. The quantity of interest is the concentration of fat in the milk. The observations are randomly split into estimation and validation sets. Optimization is based on the mean square prediction error computed on the validation set. The results indicate that the wavelength selection will not always give better prediction than using all of the available wavelengths. Investigation of the information in the spectra is necessary to determine whether all of them are relevant to the objective of the model. Copyright © 2003 John Wiley & Sons, Ltd. [source]
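
The evaluation protocol described here, a random split into estimation and validation sets with candidate wavelength subsets scored by validation-set mean square prediction error against the full-spectrum model, is straightforward to sketch. The data, subset configuration and ridge parameter below are synthetic placeholders, not the milk spectra:

```python
import numpy as np

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def validation_mse(X_est, y_est, X_val, y_val, lam):
    b = ridge(X_est, y_est, lam)
    return np.mean((y_val - X_val @ b) ** 2)

rng = np.random.default_rng(6)
n, p = 120, 50                                    # 50 "wavelengths"
X = rng.standard_normal((n, p))
y = X[:, :10] @ rng.standard_normal(10) + 0.2 * rng.standard_normal(n)  # only 10 informative

idx = rng.permutation(n)
est, val = idx[:80], idx[80:]                     # random estimation/validation split

lam = 1.0
full = validation_mse(X[est], y[est], X[val], y[val], lam)
subset = np.arange(0, p, 2)                       # one candidate wavelength configuration
sub = validation_mse(X[est][:, subset], y[est], X[val][:, subset], y[val], lam)
print(full, sub)   # selection does not always beat the full-wavelength model
```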


Statistical analysis of catalyst degradation in a semi-continuous chemical production process

JOURNAL OF CHEMOMETRICS, Issue 8 2001
Eleftherios Kaskavelis
Abstract The effect of decaying catalyst efficacy in a commercial-scale, semi-continuous petrochemical process was investigated. The objective was to gain a better understanding of process behaviour and its effect on production rate. The process includes a three-stage reaction performed in fixed bed reactors. Each of the three reaction stages consists of a number of catalyst beds that are changed periodically to regenerate the catalyst. Product separation and reactant recycling are then performed in a series of distillation columns. In the absence of specific measurements of the catalyst properties, process operational data are used to assess catalyst decay. A number of statistical techniques were used to model production rate as a function of process operation, including information on short- and long-term catalyst decay. It was found that ridge regression, partial least squares and stepwise selection multiple linear regression yielded similar predictive models. No additional benefit was found from the application of non-linear partial least squares or Curds and Whey. Finally, through time series profiles of total daily production volume, corresponding to individual in-service cycles of the different reaction stages, short-term catalyst degradation was assessed. It was shown that by successively modelling the process as a sequence of batches corresponding to cycles of each reaction stage, considerable economic benefit could be realized by reducing the maximum cycle length in the third reaction stage. Copyright © 2001 John Wiley & Sons, Ltd. [source]


On the non-negative garrotte estimator

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2007
Ming Yuan
Summary. We study the non-negative garrotte estimator from three different aspects: consistency, computation and flexibility. We argue that the non-negative garrotte is a general procedure that can be used in combination with estimators other than the least squares estimator used in its original form. In particular, we consider using the lasso, the elastic net and ridge regression, along with ordinary least squares, as the initial estimate in the non-negative garrotte. We prove that the non-negative garrotte has the nice property that, with probability tending to 1, the solution path contains an estimate that correctly identifies the set of important variables and is consistent for the coefficients of the important variables, whereas such a property may not be valid for the initial estimators. In general, we show that the non-negative garrotte can turn a consistent estimate into an estimate that is consistent not only in terms of estimation but also in terms of variable selection. We also show that the non-negative garrotte has a piecewise linear solution path. Using this fact, we propose an efficient algorithm for computing the whole solution path for the non-negative garrotte. Simulations and a real example demonstrate that the non-negative garrotte is very effective in improving on the initial estimator in terms of variable selection and estimation accuracy. [source]
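
The non-negative garrotte scales each coefficient of an initial estimate by a non-negative factor d_j, chosen to minimize the residual sum of squares subject to a bound (or penalty) on the sum of the d_j; the final coefficients are d_j times the initial ones. A small sketch of the Lagrangian form with an OLS initial estimate (the penalty value and the generic optimizer are illustrative choices, not the path algorithm proposed in the paper):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, p = 100, 6
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 1.5, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)    # initial OLS estimate
Z = X * beta_init                                     # column j is x_j * beta_init_j

def garrotte_objective(d, lam):
    resid = y - Z @ d
    return 0.5 * resid @ resid + lam * d.sum()        # Lagrangian form of the garrotte

lam = 5.0
res = minimize(garrotte_objective, x0=np.ones(p), args=(lam,),
               bounds=[(0.0, None)] * p, method="L-BFGS-B")
beta_garrotte = res.x * beta_init                     # shrunk (possibly zeroed) coefficients
print(np.round(beta_garrotte, 3))
```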


Dual- and triple-mode matrix approximation and regression modelling

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 4 2003
Stan Lipovetsky
Abstract We propose a dual- and triple-mode least squares for matrix approximation. This technique applied to the singular value decomposition produces the classical solution with a new interpretation. Applied to regression modelling, this approach corresponds to a regularized objective and yields a new solution with properties of a ridge regression. The results for regression are robust and suggest a convenient tool for the analysis and interpretation of the model coefficients. Numerical results are given for a marketing research data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]


High-Dimensional Cox Models: The Choice of Penalty as Part of the Model Building Process

BIOMETRICAL JOURNAL, Issue 1 2010
Axel Benner
Abstract The Cox proportional hazards regression model is the most popular approach to model covariate information for survival times. In this context, the development of high-dimensional models, where the number of covariates is much larger than the number of observations, is an ongoing challenge. A practicable approach is to use ridge-penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important ones for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385-395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348-1360; Fan and Li (2002). The Annals of Statistics 30, 74-99). The purpose of this article is to implement them practically into the model building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418-1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist. Since there is a high risk of missing relevant covariates when using SCAD or the adaptive lasso after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. But with respect to the promising results for truly sparse models, we see some advantage of SCAD and the adaptive lasso if better preselection procedures were available. This requires further methodological research. [source]
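
To show the shape of the ridge-penalized Cox fit discussed above, the Breslow partial likelihood plus an L2 penalty can be optimized directly. The sketch below uses synthetic data, ignores tie corrections, and is not the software used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.concatenate((np.array([1.0, -1.0, 0.5]), np.zeros(p - 3)))
time = rng.exponential(scale=np.exp(-X @ beta_true))
event = rng.uniform(size=n) < 0.8                    # roughly 20% random censoring

def neg_penalized_partial_loglik(beta, lam):
    eta = X @ beta
    order = np.argsort(-time)                        # descending survival times
    eta_ord, event_ord = eta[order], event[order]
    log_risk = np.logaddexp.accumulate(eta_ord)      # log sum of exp(eta) over each risk set
    loglik = np.sum((eta_ord - log_risk)[event_ord])
    return -loglik + lam * np.sum(beta ** 2)         # ridge (L2) penalty

res = minimize(neg_penalized_partial_loglik, x0=np.zeros(p), args=(1.0,), method="BFGS")
print(np.round(res.x, 3))
```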


Bayesian Semiparametric Multiple Shrinkage

BIOMETRICS, Issue 2 2010
Richard F. MacLehose
Summary High-dimensional and highly correlated data leading to non- or weakly identified effects are commonplace. Maximum likelihood will typically fail in such situations and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge regression or the lasso, shrink estimates toward zero, with some approaches allowing coefficients to be selected out of the model by achieving a value of zero. When substantive information is available, estimates can be shrunk to nonnull values; however, such information may not be available. We propose a Bayesian semiparametric approach that allows shrinkage to multiple locations. Coefficients are given a mixture of heavy-tailed double exponential priors, with location and scale parameters assigned Dirichlet process hyperpriors to allow groups of coefficients to be shrunk toward the same, possibly nonzero, mean. Our approach favors sparse, but flexible, structure by shrinking toward a small number of random locations. The methods are illustrated using a study of genetic polymorphisms and Parkinson's disease. [source]
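
The prior structure described above, double-exponential (Laplace) components whose location and scale parameters are drawn from a Dirichlet process so that groups of coefficients share a common, possibly nonzero, shrinkage target, can be illustrated by simulating from it with truncated stick-breaking. The concentration parameter and base measures below are illustrative choices, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(9)

def draw_coefficients(p, alpha=1.0, n_atoms=50):
    """Simulate p coefficients from a DP mixture of double-exponential priors (truncated stick-breaking)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))   # stick-breaking weights
    locations = rng.normal(0.0, 2.0, size=n_atoms)               # base measure for locations
    scales = rng.gamma(2.0, 1.0, size=n_atoms)                   # base measure for scales
    cluster = rng.choice(n_atoms, size=p, p=w / w.sum())
    return rng.laplace(loc=locations[cluster], scale=scales[cluster])

beta = draw_coefficients(p=20)
print(np.round(beta, 2))   # groups of coefficients cluster around shared, possibly nonzero, locations
```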