Data Compression (data + compression)

Selected Abstracts

Application of latent variable methods to process control and multivariate statistical process control in industry

Theodora Kourti
Abstract Multivariate monitoring and control schemes based on latent variable methods have received increasing attention from industrial practitioners over the last 15 years. Several companies have enthusiastically adopted the methods and have reported many success stories. Applications have been reported in which multivariate statistical process control, fault detection and diagnosis are achieved by utilizing the latent variable space, for continuous and batch processes as well as for process transitions such as start-ups and restarts. This paper gives an overview of the latest developments in multivariate statistical process control (MSPC) and its application for fault detection and isolation (FDI) in industrial processes. It provides a critical review of the methodology and describes how it is transferred to the industrial environment. Recent applications of latent variable methods to process control as well as to image analysis for monitoring and feedback control are discussed. Finally, it is emphasized that the multivariate nature of the data should be preserved when data compression and data preprocessing are applied. It is shown that univariate data compression and reconstruction may undermine the validity of multivariate analysis by introducing spurious correlations. Copyright © 2005 John Wiley & Sons, Ltd.

Fuzzy data compression based on data dependencies

Z. M. Ma
In this article, we focus on the issues of fuzzy data dependencies. After introducing the notion of semantic equivalence degree, fuzzy functional and multivalued dependencies are defined. A set of sound and complete inference rules for fuzzy functional dependencies (FFDs) and fuzzy multivalued dependencies (FMVDs), similar to Armstrong's axioms for the classical case, is proposed. The strategies and approaches for compressing fuzzy values by FFDs and FMVDs are investigated. Such processing eliminates unnecessary elements from a fuzzy value and compresses its range. © 2002 Wiley Periodicals, Inc.

Accelerating the analyses of 3-way and 4-way PARAFAC models utilizing multi-dimensional wavelet compression

Jeff Cramer
Abstract Parallel factor analysis (PARAFAC) is one of the most popular methods for evaluating multi-way data sets, such as those typically acquired by hyphenated measurement techniques. One of the reasons for PARAFAC's popularity is its ability to extract directly interpretable chemometric models with little a priori information, together with its capability to handle unknown interferents and missing values. However, PARAFAC requires long computation times that often prohibit sufficiently fast analyses for applications such as online sensing. An additional challenge faced by PARAFAC users is the handling and storage of very large, high-dimensional data sets. Accelerating computations and reducing storage requirements in multi-way analyses are the topics of this manuscript. This study introduces a data pre-processing method based on multi-dimensional wavelet transforms (WTs), which enables highly efficient data compression applied prior to data evaluation. Because multi-dimensional WTs are linear, the intrinsic underlying linear data construction is preserved in the wavelet domain. In almost all studied examples, computation times for analyzing the much smaller, compressed data sets could be reduced so much that the additional effort for wavelet compression was more than compensated. For 3-way and 4-way synthetic and experimental data sets, acceleration factors up to 50 have been achieved; these data sets could be compressed down to a few per cent of the original size. Despite the high compression, accurate and interpretable models were derived, which are in good agreement with conventionally determined PARAFAC models. This study also found that the wavelet type used for compression is an important factor determining acceleration factors, data compression ratios and model quality. Copyright © 2006 John Wiley & Sons, Ltd.
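
The linearity argument in the abstract above can be illustrated with a minimal one-dimensional sketch. It assumes a Haar wavelet (the abstract does not say which wavelet types were studied), and the profile values and function names are hypothetical. It shows only that compressing a mixture gives the same result as mixing the compressed pure profiles, which is why a linear model structure carries over to the wavelet domain; it is not the authors' multi-dimensional implementation.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def compress(x, levels):
    """Keep only the approximation coefficients after `levels` Haar steps."""
    for _ in range(levels):
        x, _detail = haar_step(x)
    return x


# Two hypothetical pure-component profiles (illustrative values only).
profile_a = [0.0, 1.0, 4.0, 9.0, 9.0, 4.0, 1.0, 0.0]
profile_b = [5.0, 5.0, 3.0, 3.0, 1.0, 1.0, 0.0, 0.0]

# A measured signal modelled as a linear combination of the two profiles.
mixture = [2.0 * a + 0.5 * b for a, b in zip(profile_a, profile_b)]

# Linearity: compressing the mixture equals mixing the compressed profiles.
lhs = compress(mixture, levels=2)
rhs = [2.0 * a + 0.5 * b
       for a, b in zip(compress(profile_a, 2), compress(profile_b, 2))]
max_diff = max(abs(p - q) for p, q in zip(lhs, rhs))
print(len(mixture), "->", len(lhs), "coefficients; max difference:", max_diff)
```

Here two Haar levels reduce eight points to two coefficients. How many levels can be discarded without losing model quality depends on the data and, as the abstract notes, on the wavelet type.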

Fast principal component analysis of large data sets based on information extraction

F. Vogt
Abstract Principal component analysis (PCA) and principal component regression (PCR) are routinely used for calibration of measurement devices and for data evaluation. However, their use is hindered in some applications, e.g. hyperspectral imaging, by excessively large data sets that entail unacceptable calculation times. This paper discusses a fast PCA achieved by a combination of data compression based on a wavelet transformation and a spectrum selection method prior to the PCA itself. The spectrum selection step can also be applied without previous data compression. The calculation speed increase is investigated based on original and compressed data sets, both simulated and measured. Two different data sets are used for assessment of the new approach. One set contains 65,536 synthetically generated spectra at four different noise levels with 256 measurement points each. Compared with the conventional PCA approach, these examples can be accelerated 20 times. Evaluation errors of the fast method were calculated and found to be comparable with those of the conventional approach. Four experimental spectra sets of similar size are also investigated. The novel method outperforms PCA in speed by factors of up to 12, depending on the data set. The principal components obtained by the novel algorithm show the same ability to model the measured spectra as the conventional time-consuming method. The acceleration factors also depend on the possible compression; in particular, if only slight compression is feasible, the acceleration stems purely from the novel spectrum selection step proposed in this paper. Copyright © 2002 John Wiley & Sons, Ltd.
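
The compress-then-analyze idea in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' algorithm: it uses one level of Haar approximation coefficients as the compression, a simple power iteration in place of a full PCA, noise-free single-component spectra, and it omits the spectrum selection step entirely. All names and values are hypothetical.

```python
import math


def haar_approx(x):
    """Halve the length of x by keeping the Haar approximation coefficients."""
    s = 1.0 / math.sqrt(2.0)
    return [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]


def first_pc_scores(rows, iters=50):
    """Scores on the first principal component, found by power iteration."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    centered = [[r[j] - means[j] for j in range(n_cols)] for r in rows]
    v = [1.0] * n_cols  # starting guess for the loading vector
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
        v = [sum(scores[i] * centered[i][j] for i in range(len(rows)))
             for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


def unit(vec):
    """Scale a vector to unit length so score patterns can be compared."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


# Hypothetical data: one smooth peak measured at four concentrations.
peak = [math.exp(-((k - 7.5) / 3.0) ** 2) for k in range(16)]
concentrations = [1.0, 2.0, 3.5, 5.0]
spectra = [[c * p for p in peak] for c in concentrations]

# PCA on the full spectra versus PCA on spectra compressed to half length.
scores_full = unit(first_pc_scores(spectra))
scores_small = unit(first_pc_scores([haar_approx(s) for s in spectra]))
max_diff = max(abs(a - b) for a, b in zip(scores_full, scores_small))
print("largest difference between score patterns:", max_diff)
```

In this idealized case the normalized scores from the compressed spectra match those from the full spectra, while each pass of the iteration touches half as many columns. With noisy, multi-component data the agreement would be approximate, and the achievable speed-up would depend on how much compression the data tolerate.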

Rapid Determination of Invert Cane Sugar Adulteration in Honey Using FTIR Spectroscopy and Multivariate Analysis

J. Irudayaraj
ABSTRACT: Fourier transform infrared spectroscopy with an attenuated total reflection sampling accessory was combined with multivariate analysis to determine the level (1% to 25%, wt/wt) of invert cane sugar adulteration in honey. On the basis of spectral data compression by principal component analysis and partial least squares, linear discriminant analysis (LDA) and canonical variate analysis (CVA) models were developed and validated. Two types of artificial neural networks were also applied: a quick back propagation network (BPN) and a radial basis function network (RBFN). The prediction success rates were better with LDA (93.75% for the validation set) and BPN (93.75%) than with CVA (87.50%) and RBFN (81.25%).

Near-field data compression for the far-field computation in FDTD

Romain Pascaud
Abstract This paper presents a technique to compress the near-field data required to compute the radiated fields using FDTD. This technique is applied to the study of a UWB planar diamond antenna. The results show a 99.8% gain in memory storage, while maintaining good accuracy: less than 1% error on the far-field radiation patterns. © 2006 Wiley Periodicals, Inc. Microwave Opt Technol Lett 48: 1155–1157, 2006; Published online in Wiley InterScience. DOI 10.1002/mop.21553

High-dimensional data analysis: Selection of variables, data compression and graphics – Application to gene expression

Jürgen Läuter
Abstract The paper presents effective and mathematically exact procedures for the selection of variables that are applicable in cases of very high dimension, for example in gene expression analysis. Choosing sets of variables is an important method to increase the power of the statistical conclusions and to facilitate the biological interpretation. For the construction of sets, each single variable is considered as the centre of potential sets of variables. Testing for significance is carried out by means of the Westfall-Young principle based on resampling or by the parametric method of spherical tests. The particular requirements for statistical stability are taken into account; each kind of overfitting is avoided. Thus, high power is attained and the familywise type I error can be controlled in spite of the large dimension. To obtain graphical representations by heat maps and curves, a specific data compression technique is applied. Gene expression data from B-cell lymphoma patients serve to demonstrate the procedures.