High-dimensional Data
Selected Abstracts

Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data
BIOMETRICS, Issue 4 2009
Herbert Pang

Summary: High-dimensional data such as microarrays have brought us new statistical challenges. For example, using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Diagonal discriminant analysis, support vector machines, and k-nearest neighbor have been suggested as among the best methods for small-sample-size situations, but none has been found to be superior to the others. In this article, we propose an improved diagonal discriminant approach through shrinkage and regularization of the variances. The performance of our new approach, along with that of existing methods, is studied through simulations and applications to real data. These studies show that the proposed shrinkage-based and regularized diagonal discriminant methods have lower misclassification rates than existing methods in many cases. [source]

The geometry of ASCA
JOURNAL OF CHEMOMETRICS, Issue 8 2008
Age K. Smilde

Abstract: For analyzing designed high-dimensional data, no standard methods are currently available. A method that is becoming increasingly popular for analyzing such data is ASCA. The mathematics of ASCA have already been described elsewhere, but a geometrical interpretation is still lacking. The geometry can help practitioners understand what ASCA does, and the more advanced user can gain insight into the properties of the method. This paper shows the geometry of ASCA in both the row- and column-space of the matrices involved. Copyright © 2008 John Wiley & Sons, Ltd. [source]

Robust methods for partial least squares regression
JOURNAL OF CHEMOMETRICS, Issue 10 2003
M. Hubert

Abstract: Partial least squares regression (PLSR) is a linear regression technique developed to deal with high-dimensional regressors and one or several response variables. In this paper we introduce robustified versions of the SIMPLS algorithm, the leading PLSR algorithm because of its speed and efficiency. Because SIMPLS is based on the empirical cross-covariance matrix between the response variables and the regressors and on linear least squares regression, its results are affected by abnormal observations in the data set. Two robust methods, RSIMCD and RSIMPLS, are constructed from a robust covariance matrix for high-dimensional data and robust linear regression. We introduce robust RMSECV and RMSEP values for model calibration and model validation. Diagnostic plots are constructed to visualize and classify the outliers. Several simulation results and the analysis of real data sets show the effectiveness and robustness of the new approaches. Because RSIMPLS is roughly twice as fast as RSIMCD, it stands out as the overall best method. Copyright © 2003 John Wiley & Sons, Ltd. [source]
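To see where the robust variants intervene, the sketch below implements classical SIMPLS (de Jong, 1993) in NumPy. It is an illustrative reconstruction under the standard formulation, not the authors' code; function and variable names are our own. RSIMCD and RSIMPLS replace the empirical cross-covariance matrix S = X'Y and the least squares regression step with robust counterparts.

```python
import numpy as np

def simpls(X, Y, n_components):
    """Classical SIMPLS for X (n x p) and two-dimensional Y (n x q).

    Returns the p x q coefficient matrix B so that Y_hat = Xc @ B for
    column-centered data. Robust variants (RSIMCD, RSIMPLS) swap the
    empirical cross-covariance and the regression step for robust ones."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S = Xc.T @ Yc                        # empirical cross-covariance matrix
    R, V, Q = [], [], []
    for _ in range(n_components):
        u, _, _ = np.linalg.svd(S, full_matrices=False)
        r = u[:, 0]                      # weight: dominant left singular vector
        t = Xc @ r                       # score vector
        norm_t = np.linalg.norm(t)
        r, t = r / norm_t, t / norm_t    # scale so scores have unit norm
        pl = Xc.T @ t                    # X-loading
        q = Yc.T @ t                     # Y-loading
        v = pl.copy()                    # orthogonalize loading against previous ones
        for v_old in V:
            v -= (v_old @ pl) * v_old
        v /= np.linalg.norm(v)
        S -= np.outer(v, v @ S)          # deflate the cross-covariance
        R.append(r); V.append(v); Q.append(q)
    R, Q = np.column_stack(R), np.column_stack(Q)
    return R @ Q.T                       # valid since T = Xc @ R is orthonormal
```

Calling simpls(X, Y, 3), for instance, returns the coefficient matrix of a three-component fit on centered data; the abstract's robustness enters exactly at the two commented estimation steps.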
A robust PCR method for high-dimensional regressors
JOURNAL OF CHEMOMETRICS, Issue 8-9 2003
Mia Hubert

Abstract: We consider the multivariate calibration model, which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data to the regressors; then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R² value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]

High-Dimensional Cox Models: The Choice of Penalty as Part of the Model Building Process
BIOMETRICAL JOURNAL, Issue 1 2010
Axel Benner

Abstract: The Cox proportional hazards regression model is the most popular approach to modeling covariate information for survival times. In this context, the development of high-dimensional models, where the number of covariates is much larger than the number of observations (p >> n), is an ongoing challenge. A practicable approach is to use ridge-penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385–395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. This includes modifications of the lasso as well as nonconvex variants such as the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li (2001). Journal of the American Statistical Association 96, 1348–1360; Fan and Li (2002). The Annals of Statistics 30, 74–99). The purpose of this article is to implement them practically into the model building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418–1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist. Since there is a high risk of missing relevant covariates when SCAD or the adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. But given the promising results for truly sparse models, we see some advantage in SCAD and the adaptive lasso if better preselection procedures were available. This requires further methodological research. [source]
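To make the bias contrast between these penalties concrete, the sketch below evaluates the lasso and SCAD penalty functions using the formulas from Fan and Li (2001), with their suggested default a = 3.7. The code is illustrative only; the function names are our own.

```python
import numpy as np

def lasso_penalty(beta, lam):
    """L1 (lasso) penalty: grows linearly in |beta| everywhere,
    so even large, clearly relevant coefficients keep being shrunk."""
    return lam * np.abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) with their default a = 3.7.
    It matches the lasso near zero but flattens out beyond a*lam,
    reducing the estimation bias on large coefficients."""
    b = np.abs(beta)
    return np.where(
        b <= lam,
        lam * b,                                                # lasso-like near zero
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),  # quadratic transition
            (a + 1) * lam**2 / 2,                               # constant for large |beta|
        ),
    )
```

Evaluating both at, say, beta = 5 with lam = 1 shows the contrast: the lasso penalty keeps growing linearly, while SCAD has leveled off at (a + 1) * lam**2 / 2.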
L1 Penalized Estimation in the Cox Proportional Hazards Model
BIOMETRICAL JOURNAL, Issue 1 2010
Jelle J. Goeman

Abstract: This article presents a novel algorithm that efficiently computes L1-penalized (lasso) estimates of parameters in high-dimensional models. The lasso has the property that it simultaneously performs variable selection and shrinkage, which makes it very useful for finding interpretable prediction rules in high-dimensional data. The new algorithm is based on a combination of gradient ascent optimization with the Newton–Raphson algorithm. It is described for a general likelihood function and can be applied in generalized linear models and other models with an L1 penalty. The algorithm is demonstrated in the Cox proportional hazards model, predicting survival of breast cancer patients using gene expression data, and its performance is compared with competing approaches. An R package, penalized, that implements the method is available on CRAN. [source]
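To illustrate the objective such an algorithm optimizes, here is a minimal NumPy sketch that maximizes the Breslow partial log-likelihood of the Cox model under an L1 penalty using plain proximal-gradient (soft-thresholding) steps. It is deliberately naive: ties are ignored, the step size is fixed, and it does not reproduce the paper's hybrid gradient-ascent/Newton–Raphson scheme or the penalized R package; all names are our own.

```python
import numpy as np

def cox_loglik_and_grad(beta, X, time, event):
    """Breslow partial log-likelihood of the Cox model and its gradient.
    Ties in the event times are ignored for simplicity."""
    order = np.argsort(-time)          # decreasing time: risk sets become prefixes
    X, event = X[order], event[order]
    eta = X @ beta
    eta = eta - eta.max()              # numerical stabilization (result is invariant)
    w = np.exp(eta)
    cum_w = np.cumsum(w)               # sum of exp(eta_j) over {j : t_j >= t_i}
    cum_wx = np.cumsum(w[:, None] * X, axis=0)
    d = event.astype(bool)
    loglik = np.sum(eta[d] - np.log(cum_w[d]))
    grad = np.sum(X[d] - cum_wx[d] / cum_w[d, None], axis=0)
    return loglik, grad

def l1_cox(X, time, event, lam, step=0.01, n_iter=2000):
    """Lasso-penalized Cox regression by proximal gradient (ISTA-style):
    a gradient ascent step on the log-likelihood followed by
    soft-thresholding, which produces exact zeros in beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = cox_loglik_and_grad(beta, X, time, event)
        beta = beta + step * grad                                          # ascent step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)  # soft-threshold
    return beta
```

The soft-thresholding step is what yields exact zeros, i.e. simultaneous selection and shrinkage; the paper's combination of gradient ascent with Newton–Raphson steps reaches the same kind of solution far more efficiently on high-dimensional likelihoods.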