Dimension Reduction (dimension + reduction)

Selected Abstracts


Dimension reduction for the conditional kth moment in regression

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2002
Xiangrong Yin
The idea of dimension reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Central subspaces are designed to capture all the information for the regression and to provide a population structure for dimension reduction. Here, we introduce the central kth-moment subspace to capture information from the mean, variance and so on up to the kth conditional moment of the regression. New methods are studied for estimating these subspaces. Connections with sliced inverse regression are established, and examples illustrating the theory are presented.


Interactive Visualization of Function Fields by Range-Space Segmentation

COMPUTER GRAPHICS FORUM, Issue 3 2009
John C. Anderson
Abstract We present a dimension reduction and feature extraction method for the visualization and analysis of function field data. Function fields are a class of high-dimensional, multi-variate data in which data samples are one-dimensional scalar functions. Our approach focuses upon the creation of high-dimensional range-space segmentations, from which we can generate meaningful visualizations and extract separating surfaces between features. We demonstrate our approach on high-dimensional spectral imagery, and particulate pollution data from air quality simulations.


A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotyped

GENETIC EPIDEMIOLOGY, Issue 1 2009
Tao Wang
Abstract Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense single nucleotide polymorphisms (SNPs) in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle the large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey's one-degree-of-freedom model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women's Health Initiative, this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with body mass index. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc.


Classification of GC-MS measurements of wines by combining data dimension reduction and variable selection techniques

JOURNAL OF CHEMOMETRICS, Issue 8 2008
Davide Ballabio
Abstract Different classification methods (Partial Least Squares Discriminant Analysis, Extended Canonical Variates Analysis and Linear Discriminant Analysis), in combination with variable selection approaches (Forward Selection and Genetic Algorithms), were compared, evaluating their capabilities in the geographical discrimination of wine samples. Sixty-two samples were analysed by means of dynamic headspace gas chromatography mass spectrometry (HS-GC-MS) and the entire chromatographic profile was considered to build the dataset. Since variable selection techniques pose a risk of overfitting when a large number of variables is used, a method for coupling data dimension reduction and variable selection was proposed. This approach compresses windows of the original data by retaining only significant components of local Principal Component Analysis models. The subsequent variable selection is then performed on these locally derived score variables. The results confirmed that the classification models achieved on the reduced data were better than those obtained on the entire chromatographic profile, with the exception of Extended Canonical Variates Analysis, which gave acceptable models in both cases. Copyright © 2008 John Wiley & Sons, Ltd.
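The window-compression step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `compress_windows`, the fixed `window_size`, and the cumulative-variance rule (`var_threshold`) used to decide which local components count as "significant" are all assumptions made for the example.

```python
import numpy as np

def compress_windows(X, window_size, var_threshold=0.95):
    """Compress consecutive column windows of X with local PCA models.

    Hypothetical helper: keeps, per window, the leading principal
    components needed to reach `var_threshold` of the window's variance
    (an assumed criterion) and returns the concatenated score variables.
    """
    n, p = X.shape
    scores = []
    for start in range(0, p, window_size):
        block = X[:, start:start + window_size]
        block = block - block.mean(axis=0)       # centre the local window
        U, s, _ = np.linalg.svd(block, full_matrices=False)
        var = s ** 2
        total = var.sum()
        if total == 0:
            continue                              # constant window: nothing to keep
        ratio = np.cumsum(var) / total
        # smallest k whose cumulative variance ratio reaches the threshold
        k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
        scores.append(U[:, :k] * s[:k])           # PCA scores of this window
    if not scores:
        return np.empty((n, 0))
    return np.hstack(scores)

# Toy data: 62 samples, one rank-1 window followed by one noisy window.
rng = np.random.default_rng(0)
t = rng.normal(size=(62, 1))
X = np.hstack([t * np.arange(1, 11), rng.normal(size=(62, 10))])
Z = compress_windows(X, window_size=10)
print(X.shape, "->", Z.shape)
```

A variable selection method (forward selection, a genetic algorithm) would then operate on the columns of `Z` instead of the raw profile.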


Partial least squares for discrimination

JOURNAL OF CHEMOMETRICS, Issue 3 2003
Matthew Barker
Abstract Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over PCA when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd.


Sparse partial least squares regression for simultaneous dimension reduction and variable selection

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2010
Hyonho Chun
Summary. Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data.


Hybrid Dirichlet mixture models for functional data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2009
Sonia Petrone
Summary. In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. 'damaged' areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to infinity, and their connection with Dirichlet process mixtures.


Eigen-frequencies in thin elastic 3-D domains and Reissner–Mindlin plate models

MATHEMATICAL METHODS IN THE APPLIED SCIENCES, Issue 1 2002
Monique Dauge
Abstract The eigen-frequencies of elastic three-dimensional thin plates are addressed and compared to the eigen-frequencies of two-dimensional Reissner–Mindlin plate models obtained by dimension reduction. The qualitative mathematical analysis is supported by quantitative numerical data obtained by the p-version finite element method. The mathematical analysis establishes an asymptotic expansion for the eigen-frequencies as a power series in the thickness parameter. Such results are new for orthotropic materials and for the Reissner–Mindlin model. The 3-D and R–M asymptotics have a common first term but differ in their second terms. Numerical experiments for clamped plates show that for isotropic materials and relatively thin plates the Reissner–Mindlin eigen-frequencies provide a good approximation to the three-dimensional eigen-frequencies. However, for some anisotropic materials this is no longer the case, and relative errors of the order of 30 per cent are obtained even for relatively thin plates. Moreover, we show that no shear correction factor is known to be optimal in the sense that it provides the best approximation of the R–M eigen-frequencies to their 3-D counterparts uniformly (over the whole range of relevant thicknesses). Copyright © 2002 John Wiley & Sons, Ltd.


An adaptive dimension reduction scheme for monitoring feedback-controlled processes

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 3 2009
Kaibo Wang
Abstract Detecting dynamic mean shifts is particularly important in monitoring feedback-controlled processes in which time-varying shifts are usually observed. When multivariate control charts are being utilized, one way to improve performance is to reduce dimensions. However, it is difficult to identify and remove non-informative variables statically in a process with dynamic shifts, as the contribution of each variable changes continuously over time. In this paper, we propose an adaptive dimension reduction scheme that aims to reduce dimensions of multivariate control charts through online variable evaluation and selection. The resulting chart is expected to keep only informative variables and hence maximize the sensitivity of control charts. Specifically, two sets of projection matrices are presented and dimension reduction is achieved via projecting process vectors into a low-dimensional space. Although developed based on feedback-controlled processes, the proposed scheme can be easily extended to monitor general multivariate applications. Copyright © 2008 John Wiley & Sons, Ltd.


Dynamic Process Modelling using a PCA-based Output Integrated Recurrent Neural Network

THE CANADIAN JOURNAL OF CHEMICAL ENGINEERING, Issue 4 2002
Yu Qian
Abstract A new methodology for modelling of dynamic process systems, the output integrated recurrent neural network (OIRNN), is presented in this paper. OIRNN can be regarded as a modified Jordan recurrent neural network, in which the past values for certain steps of the output variables are integrated with the input variables, and the original input variables are pre-processed using principal component analysis (PCA) for the purpose of dimension reduction. The main advantage of the PCA-based OIRNN is that the input dimension is reduced, so that the network can be used to model the dynamic behavior of multiple input multiple output (MIMO) systems effectively. The new method is illustrated with reference to the Tennessee-Eastman process and compared with principal component regression and feedforward neural networks.


DETECTING INFLUENTIAL OBSERVATIONS IN SLICED INVERSE REGRESSION ANALYSIS

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 3 2006
Luke A. Prendergast
Summary The detection of influential observations on the estimation of the dimension reduction subspace returned by Sliced Inverse Regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.
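For readers unfamiliar with SIR, the subspace estimate that such influence measures are applied to can be sketched as below. This is the textbook form of sliced inverse regression, not the specific version or influence measure studied in the paper; the function name `sir_directions`, the equal-count slicing, and the default `n_slices` are assumptions made for illustration, and the predictor covariance is assumed to be non-singular.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Estimate dimension reduction directions by sliced inverse regression.

    Hypothetical helper: standardises X, slices the sorted response into
    roughly equal-sized groups, and eigen-decomposes the weighted
    covariance of the slice means of the standardised predictors.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the covariance (assumes cov is positive definite).
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice observations by the order of the response.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # eigh returns ascending eigenvalues, so reverse to take the largest.
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_dirs]
    B = inv_sqrt @ top                       # back to the original X scale
    return B / np.linalg.norm(B, axis=0)

# Toy single-index model: y depends on X only through X @ beta.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
index = X @ beta
y = index + 0.5 * index ** 3 + 0.1 * rng.normal(size=2000)
B = sir_directions(X, y)
print("estimated direction:", np.round(B[:, 0], 2))
```

A simple, generic way to probe influence (again an illustration only) would be to refit with one observation deleted and compare the resulting direction with `B`.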