Correlated Variables (correlated + variable)


Selected Abstracts


Statistical process monitoring based on dissimilarity of process data

AICHE JOURNAL, Issue 6 2002
Manabu Kano
Multivariate statistical process control (MSPC) has been widely used for monitoring chemical processes with highly correlated variables. In this work, a novel statistical process monitoring method is proposed based on the idea that a change of operating condition can be detected by monitoring the distribution of process data, which reflects the corresponding operating conditions. To quantitatively evaluate the difference between two data sets, a dissimilarity index is introduced. The monitoring performance of the proposed method, referred to as DISSIM, and that of the conventional MSPC method are compared by applying both to simulated data collected from a simple 2 × 2 process and from the Tennessee Eastman process. The results clearly show that the monitoring performance of DISSIM, especially dynamic DISSIM, is considerably better than that of the conventional MSPC method when the time-window size is appropriately selected.


Particle Markov chain Monte Carlo methods

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2010
Christophe Andrieu
Summary. Markov chain Monte Carlo and sequential Monte Carlo methods have emerged as the two main tools to sample from high dimensional probability distributions. Although asymptotic convergence of Markov chain Monte Carlo algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions that are used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high dimensional proposal distributions by using sequential Monte Carlo methods. This allows us not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously so. We demonstrate these algorithms on a non-linear state space model and a Lévy-driven stochastic volatility model.
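The abstract's remark about updating highly correlated variables independently can be illustrated with a small sketch. It is not the particle MCMC method of the paper; it is a plain Gibbs sampler on a standard bivariate normal target, chosen here as an assumption because its behaviour is easy to check: when each coordinate is updated on its own, the lag-1 autocorrelation of the chain grows with the correlation rho of the target. The function names and the values of rho are illustrative only.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each coordinate is updated on its own from its full conditional,
    x | y ~ N(rho * y, 1 - rho^2), and likewise for y | x.
    Returns the sampled x values.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    xs = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        xs.append(x)
    return xs


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in xs)
    return num / den


weak = lag1_autocorrelation(gibbs_bivariate_normal(0.10, 5000))
strong = lag1_autocorrelation(gibbs_bivariate_normal(0.99, 5000))
print(f"rho=0.10: lag-1 autocorrelation {weak:.2f}")
print(f"rho=0.99: lag-1 autocorrelation {strong:.2f}")
```

A chain whose successive draws are almost identical explores the target slowly, which is the kind of unreliability the abstract attributes to independent updates of correlated variables.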


Identification of individual tigers (Panthera tigris) from their pugmarks

JOURNAL OF ZOOLOGY, Issue 1 2005
Sandeep Sharma
Abstract An objective multivariate technique is described for identification of individual tigers (Panthera tigris) from their pugmarks. Tracings and photographs of hind pugmarks were obtained from 23 pugmark-sets of 19 individually known tigers (17 wild and two captive tigers). These 23 pugmark-sets were then divided into two groups, one of 15 pugmark-sets for model building and another of eight pugmark-sets for model testing and validation. A total of 93 measurements were taken from each pugmark along with three gait measurements. We used the CV ratio and the F-ratio and removed highly correlated variables, finally selecting 11 of these 93 variables. These 11 variables did not differ between left and right pugmarks. Stepwise discriminant function analysis (DFA) based on these 11 variables correctly classified pugmark-sets to individual tigers. A realistic population estimation exercise was simulated using the validation dataset. The algorithms developed here allocated each pugmark-set to the correct individual tiger. The effect of extraneous factors, i.e. soil depth and multiple tracers, was also tested, and pugmark tracings were compared with pugmark photographs. We recommend collecting pugmarks from soil depths ranging between 0.5 and 1.0 cm, and advocate the use of pugmark photographs rather than pugmark tracings to eliminate the chance of obtaining substandard data from untrained tracers. Our study suggests that tigers can be individually identified from their pugmarks with a high level of accuracy and that pugmark-sets could be used for population estimation of tigers within a statistically designed mark-recapture framework.
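The step of removing highly correlated variables can be sketched as a simple greedy filter. This is a minimal illustration, not the authors' procedure: the abstract does not state the correlation threshold or the order in which variables were examined, so the cutoff of 0.9, the variable names, and the measurement values below are all hypothetical. The CV-ratio and F-ratio screening steps are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, cutoff=0.9):
    """Keep a variable only if |r| with every already-kept variable is below cutoff."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical measurements, one value per pugmark (not data from the study).
measurements = {
    "pad_length": [7.1, 7.9, 8.4, 9.0, 9.6],
    "pad_length_scaled": [14.2, 15.8, 16.8, 18.0, 19.2],  # 2 x pad_length, so redundant
    "toe_angle": [31.0, 28.5, 33.2, 27.9, 30.4],
}

selected = drop_correlated(measurements, cutoff=0.9)
print(selected)
```

Because the filter is greedy, the result depends on the order of the variables: the first member of a correlated group is the one that survives.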


A Combinatorial Approach to the Variable Selection in Multiple Linear Regression: Analysis of Selwood et al. Data Set, A Case Study

MOLECULAR INFORMATICS, Issue 6 2003
Abstract A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR rests primarily on restricting the entry of correlated variables into the model development stage. It has been used for the analysis of the Selwood et al. data set [16], and the models obtained are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q² values in the range 0.632-0.518. These models are also divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. A simulation is also carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
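The idea of keeping correlated variables out of the model development stage can be sketched as a filter over descriptor combinations. This is only an illustration under stated assumptions, not the CP-MLR protocol itself: the abstract does not give the cutoff values or the full set of filters, so the function `admissible_subsets`, the cutoff of 0.5, and the descriptor values are hypothetical. Each surviving subset would then be passed to an ordinary MLR fit, which is not shown here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def admissible_subsets(descriptors, size, cutoff):
    """Yield descriptor subsets whose pairwise |r| all stay below cutoff."""
    for subset in combinations(sorted(descriptors), size):
        if all(
            abs(pearson(descriptors[a], descriptors[b])) < cutoff
            for a, b in combinations(subset, 2)
        ):
            yield subset


# Hypothetical descriptor values for five compounds (not the Selwood data).
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly collinear with d1
    "d3": [2.0, 5.0, 3.0, 1.0, 4.0],
    "d4": [3.0, 1.0, 5.0, 2.0, 4.0],
}

subsets = list(admissible_subsets(descriptors, size=2, cutoff=0.5))
for subset in subsets:
    print(subset)
```

Lowering the cutoff shrinks the number of candidate models that reach the regression stage, which is consistent with the abstract's point that a careful choice of inter-parameter cutoff keeps computing time manageable.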