Data Consisting (data + consisting)
Selected Abstracts

Multivariate calibration of hyperspectral γ-ray energy spectra for proximal soil sensing
EUROPEAN JOURNAL OF SOIL SCIENCE, Issue 1 2007
R. A. Viscarra Rossel
Summary
The development of proximal soil sensors to collect fine-scale soil information for environmental monitoring, modelling and precision agriculture is vital. Conventional soil sampling and laboratory analyses are time-consuming and expensive. In this paper we look at the possibility of calibrating hyperspectral γ-ray energy spectra to predict various surface and subsurface soil properties. The spectra were collected with a proximal, on-the-go γ-ray spectrometer. We surveyed two geographically and physiographically different fields in New South Wales, Australia, and collected hyperspectral γ-ray data consisting of 256 energy bands at more than 20 000 sites in each field. Bootstrap aggregation with partial least squares regression (or bagging-PLSR) was used to calibrate the γ-ray spectra of each field for predictions of selected soil properties. However, significant amounts of pre-processing were necessary to expose the correlations between the γ-ray spectra and the soil data. We first filtered the spectra spatially using local kriging, then further de-noised, normalized and detrended them. The resulting bagging-PLSR models of each field were tested using leave-one-out cross-validation. Bagging-PLSR provided robust predictions of clay, coarse sand and Fe contents in the 0–15 cm soil layer and pH and coarse sand contents in the 15–50 cm soil layer. Furthermore, bagging-PLSR provided us with a measure of the uncertainty of predictions. This study is apparently the first to use a multivariate calibration technique with on-the-go proximal γ-ray spectrometry. Proximally sensed γ-ray spectrometry proved to be a useful tool for predicting soil properties in different soil landscapes.
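Bagging-PLSR with leave-one-out validation can be sketched as follows. This is a minimal illustration only, not the author's implementation. It assumes the following:

* A single soil property is predicted at a time (PLS1, fitted with the NIPALS algorithm).
* Bagging means averaging the predictions of PLSR models fitted to bootstrap resamples.
* The "measure of uncertainty" is taken here as the standard deviation of the bootstrap predictions.

The function names `pls1_fit`, `bagging_pls_predict` and `loo_rmse`, the number of components and the number of bootstrap models are all hypothetical. The kriging, de-noising, normalization and detrending steps described in the abstract are not reproduced.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS regression (PLS1) with the NIPALS algorithm.

    Returns (coef, intercept) such that predictions are X_new @ coef + intercept.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xk = X - x_mean  # centred predictors, deflated after each component
    yk = y - y_mean  # centred response, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance left to extract
            break
        w = w / norm
        t = Xk @ w  # scores of this component
        tt = t @ t
        p = Xk.T @ t / tt  # X loadings
        q = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef


def bagging_pls_predict(X, y, X_new, n_components=2, n_bags=50, seed=0):
    """Average PLS1 models fitted to bootstrap resamples (bagging).

    Returns (mean prediction, spread across bootstrap models); the spread is
    used here as an assumed stand-in for the prediction uncertainty.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_bags, len(X_new)))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, with replacement
        coef, intercept = pls1_fit(X[idx], y[idx], n_components)
        preds[b] = X_new @ coef + intercept
    return preds.mean(axis=0), preds.std(axis=0)


def loo_rmse(X, y, **kwargs):
    """Leave-one-out cross-validated RMSE of the bagged PLS1 predictor."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop sample i, predict it from the rest
        pred, _ = bagging_pls_predict(X[keep], y[keep], X[i:i + 1], **kwargs)
        errors.append(pred[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 256))  # synthetic stand-in for 256-band spectra
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=40)
    mean, sd = bagging_pls_predict(X[:30], y[:30], X[30:], n_components=3)
    print("predictions:", np.round(mean, 2))
    print("bootstrap spread:", np.round(sd, 2))
    print("LOO RMSE:", round(loo_rmse(X, y, n_components=3, n_bags=20), 3))
```

With as many components as the rank of `X`, PLS1 reproduces ordinary least squares. Using fewer components is what makes it usable on 256 strongly correlated energy bands.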
Metabolomics-based systematic prediction of yeast lifespan and its application for semi-rational screening of ageing-related mutants
AGING CELL, Issue 4 2010
Ryo Yoshida
Summary
Metabolomics (the comprehensive analysis of metabolites) was recently used to classify yeast mutants with no overt phenotype, using raw data as metabolic fingerprints or footprints. In this study, we demonstrate the estimation of a complicated phenotype, longevity, and semi-rational screening for relevant mutants using metabolic profiles as strain-specific fingerprints. The fingerprints used in our experiments are profiled data consisting of individually identified and quantified metabolites rather than raw spectrum data. We chose yeast replicative lifespan as a model phenotype. Several yeast mutants that affect lifespan were selected for analysis, and they were subjected to metabolic profiling using mass spectrometry. Fingerprinting based on the profiles revealed a correlation between lifespan and metabolic profile. Amino acids and nucleotide derivatives were the main contributors to this correlation. Furthermore, we established a multivariate model to predict lifespan from a metabolic profile. The model facilitated the identification of putative longevity mutants. This work represents a novel approach to evaluating and screening complicated, quantitative phenotypes by means of metabolomics.

Biplots of compositional data
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 4 2002
John Aitchison
Summary
The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors, each of which is constrained to have unit sum.
These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.

Reliable prediction of T-cell epitopes using neural networks with novel sequence representations
PROTEIN SCIENCE, Issue 5 2003
Morten Nielsen
Abstract
In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that the combination of several neural networks derived using different sequence-encoding schemes has a performance superior to neural networks derived using a single sequence-encoding scheme. The new method is shown to have a performance that is substantially higher than that of other methods. By use of mutual information calculations we show that peptides that bind to the HLA A*0204 complex display a signal of higher-order sequence correlations. Neural networks are ideally suited to integrate such higher-order correlations when predicting the binding affinity. It is this feature, combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of the neural network to be trained on data consisting of continuous binding affinities, that gives the new method an improved performance.
The difference in predictive performance between the neural network methods and the matrix-driven methods is found to be most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher-order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.

Using Multinomial Mixture Models to Cluster Internet Traffic
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2004
Murray Jorgensen
Summary
The paper considers the clustering of two large sets of Internet traffic data consisting of information measured from headers of transmission control protocol packets collected on a busy arc of a university network connecting with the Internet. Packets are grouped into 'flows' thought to correspond to particular movements of information between one computer and another. The clustering is based on representing the flows as each sampled from one of a finite number of multinomial distributions and seeks to identify clusters of flows containing similar packet-length distributions. The clustering uses the EM algorithm, and the data-analytic and computational details are given.
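The multinomial-mixture clustering described in the last abstract can be sketched with a small EM loop. This is a hypothetical illustration, not the paper's code. It assumes the following:

* Each flow has already been reduced to a vector of counts over a fixed set of packet-length bins.
* EM starts from random soft cluster assignments.
* A small smoothing constant keeps bin probabilities away from zero.
* EM runs for a fixed number of iterations rather than to a convergence test.

The function name `multinomial_mixture_em`, the binning, `smoothing` and `n_iter` are all assumptions.

```python
import numpy as np


def multinomial_mixture_em(counts, n_clusters, n_iter=100, smoothing=1e-3, seed=0):
    """Fit a mixture of multinomials to count vectors with EM.

    counts: (n_flows, n_bins) array; row i gives how many packets of flow i
            fall into each packet-length bin (the binning is hypothetical).
    Returns (weights, probs, resp): mixing weights (k,), per-cluster bin
    probabilities (k, n_bins) and cluster responsibilities (n_flows, k).
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    # Random soft assignment of each flow to the clusters as a starting point.
    resp = rng.dirichlet(np.ones(n_clusters), size=counts.shape[0])
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed bin probabilities per cluster.
        weights = resp.mean(axis=0)
        expected = resp.T @ counts + smoothing
        probs = expected / expected.sum(axis=1, keepdims=True)
        # E-step: log pi_k + sum_b x_ib log theta_kb. The multinomial
        # coefficient is identical for every cluster, so it cancels.
        log_joint = np.log(np.clip(weights, 1e-300, None)) + counts @ np.log(probs).T
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
    return weights, probs, resp


if __name__ == "__main__":
    # Two synthetic flow types: mostly short packets vs mostly long packets.
    flows = np.array([[20, 18, 1, 1]] * 10 + [[1, 1, 19, 21]] * 10)
    weights, probs, resp = multinomial_mixture_em(flows, n_clusters=2)
    print("cluster of each flow:", resp.argmax(axis=1))
    print("bin probabilities:\n", np.round(probs, 3))
```

The E-step works with log-probabilities and normalizes them with `logaddexp`. This avoids the underflow that multiplying many per-packet probabilities would cause for long flows.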