Data Space (data + space)


Selected Abstracts


Wild mouse open field behavior is embedded within the multidimensional data space spanned by laboratory inbred strains

GENES, BRAIN AND BEHAVIOR, Issue 5 2006
E. Fonio
The vast majority of studies on mouse behavior are performed on laboratory mouse strains (Mus laboratorius), while studies of wild-mouse behavior are relatively rare. An interesting question is the relationship between the phenotypes of M. laboratorius and the phenotypes of their wild ancestors. It is commonly believed, often in the absence of hard evidence, that the behavior of wild mice exceeds by far, in terms of repertoire richness, magnitude of variables and variability of behavioral measures, the behavior of the classical inbred strains. Having phenotyped the open field behavior (OF) of eight of the commonly used laboratory inbred strains, two wild-derived strains and a group of first-generation-in-captivity local wild mice (Mus musculus domesticus), we show that contrary to common belief, wild-mouse OF behavior is moderate, both in terms of end-point values and in terms of their variability, being embedded within the multidimensional data space spanned by laboratory inbred strains. The implication could be that whereas natural selection favors moderate locomotor behavior in wild mice, the inbreeding process tends to generate in mice, in some of the features, extreme and more variable behavior.


Feature-space clustering for fMRI meta-analysis

HUMAN BRAIN MAPPING, Issue 3 2001
Cyril Goutte
Abstract Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields fMRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiments, where features are obtained from a possibly large number of single-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on an fMRI data set involving visual stimulation, and we show that the feature-space clustering approach yields nontrivial results and, in particular, shows interesting differences between individual voxel analyses performed with traditional methods. Hum. Brain Mapping 13:165-183, 2001. © 2001 Wiley-Liss, Inc.
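
The general idea of clustering extracted features rather than raw time series can be sketched as follows. This is only an illustration, not the method of the paper: the two features (on/off response difference and standard deviation), the plain k-means routine, the stimulus pattern and the toy voxel series are all assumptions introduced here.

```python
import random
from statistics import mean, pstdev

def extract_features(series, stimulus):
    """Reduce one time series to two hypothetical features:
    mean signal during stimulation minus mean signal at rest, and overall SD."""
    on = [x for x, s in zip(series, stimulus) if s == 1]
    off = [x for x, s in zip(series, stimulus) if s == 0]
    return (mean(on) - mean(off), pstdev(series))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on feature vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                  for pt in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

# Toy data (made up): two responsive and two flat "voxels", six images each.
stimulus = [0, 1, 0, 1, 0, 1]
voxels = [
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 1.2, 0.0, 1.2, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.4, 0.6, 0.5, 0.5, 0.4, 0.6],
]
features = [extract_features(v, stimulus) for v in voxels]
labels = kmeans(features, k=2)
```

Each series of six values is reduced to a two-dimensional feature vector before clustering, which is the dimensionality reduction the abstract refers to; real fMRI sequences would be far longer.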


Relationship between host diversity and parasite diversity: flea assemblages on small mammals

JOURNAL OF BIOGEOGRAPHY, Issue 11 2004
Boris R. Krasnov
Abstract Aim: We examined the relationship between host species richness and parasite species richness using simultaneously collected data on small mammals (Insectivora, Rodentia and Lagomorpha) and their flea parasites. Location: The study used previously published data on small mammals and their fleas from 37 different regions. All the world's main geographical regions other than Australasia and Wallacea were represented in the study, i.e. neotropical, nearctic, palaearctic, oriental and afrotropical realms. Methods: We controlled the data for the area sampled and sampling effort and then tested this relationship using both cross-region conventional analysis and the independent contrasts method (to control for the effects of biogeographic historical relationships among different regions). Brooks parsimony analysis was used to construct a region cladogram based on the presence/absence of a host species and host phylogeny. Results: Both cross-region and independent contrasts analyses showed a positive correlation between host species richness and flea species richness. Conventional cross-region regression under- or overestimated flea species richness in the majority of regions. Main conclusions: When the regression derived by the independent contrasts method was mapped onto the original tip data space, points that deviated significantly from the regression originated from Kenya, Mississippi and southern California (lower than expected flea richness) and Chile, Idaho, south-western California and Kyrgyzstan (higher than expected flea richness). These deviations can be explained by the environmental mediation of host-flea relationships and by a degree of environmental variety in sampled areas.


New and old trends in chemometrics. How to deal with the increasing data volumes in R&D&P (research, development, process modeling, production), with examples from pharmaceutical research

JOURNAL OF CHEMOMETRICS, Issue 8-10 2002
Abstract Chemometrics was started around 30 years ago to cope with and utilize the rapidly increasing volumes of data produced in chemical laboratories. The methods of early chemometrics were mainly focused on the analysis of data, but slowly we came to realize that it is equally important to make the data contain reliable information, and methods for design of experiments (DOE) were added to the chemometrics toolbox. This toolbox is now fairly adequate for solving most R&D problems of today in both academia and industry, as will be illustrated with a few examples. However, with the further increase in the size of our data sets, we start to see inadequacies in our multivariate methods, both in their efficiency and interpretability. Drift and non-linearities occur with time or in other directions in data space, and models with masses of coefficients become increasingly difficult to interpret and use. Starting from a few examples of some very complicated problems confronting chemical researchers today, possible extensions and generalizations of the existing chemometrics methods, as well as more appropriate preprocessing of the data before the analysis, will be discussed. Criteria such as scalability of methods to increasing size of problems and data, increasing sophistication in the handling of noise and non-linearities, interpretability of results, and relative simplicity of use will be held as important. The discussion will be made from a perspective of the evolution of the scientific methodology as driven by new technology, e.g. computers, and constrained by the limitations of the human brain, i.e. our ability to understand and interpret scientific and data-analytical results. Quilt-PCA and Quilt-PLS presented here address and offer a possible solution to these problems. Copyright © 2002 John Wiley & Sons, Ltd.


NEURAL NETWORK PREDICTION OF PERMEABILITY IN THE EL GARIA FORMATION, ASHTART OILFIELD, OFFSHORE TUNISIA

JOURNAL OF PETROLEUM GEOLOGY, Issue 4 2001
J.H. Ligtenberg
The Lower Eocene El Garia Formation forms the reservoir rock at the Ashtart oilfield, offshore Tunisia. It comprises a thick package of mainly nummulitic packstones and grainstones with variable reservoir quality. Although porosity is moderate to high, permeability is often poor to fair with some high permeability streaks. The aim of this study was to establish relationships between log-derived data and core data, and to apply these relationships in a predictive sense to uncored intervals. An initial objective was to predict from measured logs and core data the limestone depositional texture (as indicated by the Dunham classification), as well as porosity and permeability. A total of nine wells with complete logging suites, multiple cored intervals with core plug measurements together with detailed core interpretations were available. We used a fully-connected Multi-Layer-Perceptron network (a type of neural network) to establish possible non-linear relationships. Detailed analyses revealed that no relationship exists between log response and limestone texture (Dunham class). The initial idea to predict Dunham class, and subsequently to use the classification results to predict permeability, could not therefore be pursued. However, further analyses revealed that it was feasible to predict permeability without using the depositional fabric, but using a combination of wireline logs and measured core porosity. Careful preparation of the training set for the neural network proved to be very important. Early experiments showed that low to fair permeability (1-35 mD) could be predicted with confidence, but that the network failed to predict the high permeability streaks. "Balancing" the data set solved this problem. Balancing is a technique in which the training set is increased by adding more examples to the under-sampled part of the data space. Examples are created by random selection from the training set and white noise is added. After balancing, the neural network's performance improved significantly. Testing the neural network on two wells indicated that this method is capable of predicting the entire range of permeability with confidence.
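
The balancing step described in this abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `balance`, the choice to add Gaussian noise to the input features only, the noise level, the 35 mD cut-off used to mark the under-sampled region, and the toy samples are all hypothetical.

```python
import random

def balance(samples, is_rare, target_rare, noise_sd, seed=0):
    """Return the training set plus noisy copies of under-sampled examples.

    samples     : list of (features, target) pairs
    is_rare     : predicate marking the under-sampled part of the data space
    target_rare : desired number of rare examples after balancing
    noise_sd    : standard deviation of the white (Gaussian) noise added
                  to each input feature (an assumption of this sketch)
    """
    rng = random.Random(seed)
    rare = [s for s in samples if is_rare(s)]
    if not rare:
        raise ValueError("no under-sampled examples to draw from")
    augmented = list(samples)
    for _ in range(max(0, target_rare - len(rare))):
        # Randomly select an existing rare example and perturb its inputs.
        features, target = rng.choice(rare)
        noisy = tuple(x + rng.gauss(0.0, noise_sd) for x in features)
        augmented.append((noisy, target))
    return augmented

# Toy training set (made up): ((porosity, log reading), permeability in mD).
train = [
    ((0.10, 2.1), 5.0),
    ((0.12, 2.0), 8.0),
    ((0.15, 2.2), 12.0),
    ((0.11, 2.3), 20.0),
    ((0.25, 1.8), 400.0),  # the only high-permeability example
]
balanced = balance(train, is_rare=lambda s: s[1] > 35.0,
                   target_rare=4, noise_sd=0.01)
```

The original examples are kept unchanged and only the sparse high-permeability region receives extra, slightly perturbed copies, so a network trained on `balanced` sees that region more often.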


MapQuant: Open-source software for large-scale protein quantification

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 6 2006
Kyriacos C. Leptos
Abstract Whole-cell protein quantification using MS has proven to be a challenging task. Detection efficiency varies significantly from peptide to peptide, molecular identities are not evident a priori, and peptides are dispersed unevenly throughout the multidimensional data space. To overcome these challenges we developed an open-source software package, MapQuant, to quantify comprehensively organic species detected in large MS datasets. MapQuant treats an LC/MS experiment as an image and utilizes standard image processing techniques to perform noise filtering, watershed segmentation, peak finding, peak fitting, peak clustering, charge-state determination and carbon-content estimation. MapQuant reports abundance values that respond linearly with the amount of sample analyzed on both low- and high-resolution instruments (over a 1000-fold dynamic range). Background noise added to a sample, either as a medium-complexity peptide mixture or as a high-complexity trypsinized proteome, exerts negligible effects on the abundance values reported by MapQuant and with coefficients of variance comparable to other methods. Finally, MapQuant's ability to define accurate mass and retention time features of isotopic clusters on a high-resolution mass spectrometer can increase protein sequence coverage by assigning sequence identities to observed isotopic clusters without corresponding MS/MS data.