Data Reduction (data + reduction)

Selected Abstracts


Discovering Maximal Generalized Decision Rules Through Horizontal and Vertical Data Reduction

COMPUTATIONAL INTELLIGENCE, Issue 4 2001
Xiaohua Hu
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high-level concepts and some obviously superfluous or irrelevant symbolic attributes are eliminated. Horizontal reduction is accomplished by merging identical tuples after the substitution of an attribute value by its higher-level value in a pre-defined concept hierarchy for symbolic attributes, or after the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes that are not in the relevant attribute subset, and the data set is thus further reduced vertically without destroying the interdependence relationships between the classes and the attributes. Rough set-based value reduction is then performed on the reduced table and all redundant condition values are dropped. Finally, tuples in the reduced table are transformed into a set of maximal generalized decision rules. The experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy. [source]
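The two-step horizontal reduction described above (generalize symbolic values through a concept hierarchy, discretize numeric attributes, then merge tuples that have become identical) can be sketched on a small hypothetical table. The hierarchy, cut points and column names below are invented for illustration and are not taken from the paper.

```python
import pandas as pd

# Hypothetical concept hierarchy and discretization cut points (illustration only).
hierarchy = {"London": "UK", "Leeds": "UK", "Lyon": "France", "Paris": "France"}
age_bins = [0, 30, 60, 120]
age_labels = ["young", "middle", "senior"]

df = pd.DataFrame({
    "city": ["London", "Leeds", "Paris", "Lyon", "London"],
    "age":  [25, 28, 45, 47, 27],
    "buys": ["yes", "yes", "no", "no", "yes"],   # decision attribute
})

# Generalization and discretization: symbolic values are raised to higher-level
# concepts and the numeric attribute is cut into a few intervals.
df["city"] = df["city"].map(hierarchy)
df["age"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Horizontal reduction: tuples that have become identical collapse into one row.
reduced = df.groupby(list(df.columns), observed=True).size().reset_index(name="count")
print(reduced)   # 5 original tuples reduce to 2 generalized tuples
```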


The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database

FUNDAMENTAL & CLINICAL PHARMACOLOGY, Issue 2 2008
A. Bate
Abstract After market launch, new information on adverse effects of medicinal products is almost exclusively first highlighted by spontaneous reporting. As data sets of spontaneous reports have become larger, and computational capability has increased, quantitative methods have been increasingly applied to such data sets. The screening of such data sets is an application of knowledge discovery in databases (KDD). Effective KDD is an iterative and interactive process made up of the following steps: developing an understanding of an application domain, creating a target data set, data cleaning and pre-processing, data reduction and projection, choosing the data mining task, choosing the data mining algorithm, data mining, interpretation of results and consolidating and using acquired knowledge. The process of KDD as it applies to the analysis of spontaneous reports can be exemplified by its routine use on the 3.5 million suspected adverse drug reaction (ADR) reports in the WHO ADR database. Examples of new adverse effects first highlighted by the KDD process on WHO data include topiramate glaucoma, infliximab vasculitis and the association of selective serotonin reuptake inhibitors (SSRIs) and neonatal convulsions. The KDD process has already improved our ability to highlight previously unsuspected ADRs for clinical review in spontaneous reporting, and we anticipate that such techniques will be increasingly used in the successful screening of other healthcare data sets such as patient records in the future. [source]
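Purely as an illustration of the quantitative screening step in such a KDD pipeline, the sketch below computes two common disproportionality statistics from a toy 2x2 contingency table of spontaneous reports. The counts are invented, and the WHO programme itself uses a Bayesian, shrunk estimate of the observed-to-expected ratio rather than these raw point values.

```python
import math

# Toy contingency counts (hypothetical):
#                   reaction R   all other reactions
# drug D                a                b
# all other drugs       c                d
a, b, c, d = 12, 988, 240, 98760

# Proportional reporting ratio: how over-represented is R among reports for drug D?
prr = (a / (a + b)) / (c / (c + d))

# Unshrunk information component: log2 of observed over expected co-reporting counts.
N = a + b + c + d
expected = (a + b) * (a + c) / N
ic = math.log2(a / expected)

print(f"PRR = {prr:.2f}, IC = {ic:.2f}")  # values well above 1 and 0 would be flagged for clinical review
```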


Multiple testing in the genomics era: Findings from Genetic Analysis Workshop 15, Group 15

GENETIC EPIDEMIOLOGY, Issue S1 2007
Lisa J. Martin
Abstract Recent advances in molecular technologies have resulted in the ability to screen hundreds of thousands of single nucleotide polymorphisms and tens of thousands of gene expression profiles. While these data have the potential to inform investigations into disease etiologies and advance medicine, the question of how to adequately control both type I and type II error rates remains. Genetic Analysis Workshop 15 datasets provided a unique opportunity for participants to evaluate multiple testing strategies applicable to microarray and single nucleotide polymorphism data. The Genetic Analysis Workshop 15 multiple testing and false discovery rate group (Group 15) investigated three general categories for multiple testing corrections, which are summarized in this review: statistical independence, error rate adjustment, and data reduction. We show that while each approach may have certain advantages, adequate error control is largely dependent upon the question under consideration and often requires the use of multiple analytic strategies. Genet. Epidemiol. 31(Suppl. 1):S124–S131, 2007. © 2007 Wiley-Liss, Inc. [source]
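As one concrete instance of the "error rate adjustment" category named above, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a toy list of p-values; it illustrates the general idea only and does not reproduce any specific Group 15 analysis.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = p.size
    thresholds = q * np.arange(1, m + 1) / m          # q * k / m for the k-th smallest p
    passed = p[order] <= thresholds
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                          # reject all p-values up to the largest passing rank
    return reject

pvals = [0.0001, 0.004, 0.019, 0.03, 0.20, 0.70]      # toy p-values
print(benjamini_hochberg(pvals))                      # the four smallest survive correction here
```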


New multivariate test for linkage, with application to pleiotropy: Fuzzy Haseman-Elston

GENETIC EPIDEMIOLOGY, Issue 4 2003
Belhassen Kaabi
Abstract We propose a new method of linkage analysis based on using the grade of membership scores resulting from fuzzy clustering procedures to define new dependent variables for the various Haseman-Elston approaches. For a single continuous trait with low heritability, the aim was to identify subgroups such that the grade of membership scores to these subgroups would provide more information for linkage than the original trait. For a multivariate trait, the goal was to provide a means of data reduction and data mining. Simulation studies using continuous traits with relatively low heritability (H=0.1, 0.2, and 0.3) showed that the new approach does not enhance power for a single trait. However, for a multivariate continuous trait (with three components), it is more powerful than the principal component method and more powerful than the joint linkage test proposed by Mangin et al. ([1998] Biometrics 54:88–99) when there is pleiotropy. Genet Epidemiol 24:253–264, 2003. © 2003 Wiley-Liss, Inc. [source]
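A minimal sketch of the idea under strong simplifying assumptions: grade-of-membership scores are obtained from a small fuzzy c-means routine and their squared sib-pair differences serve as the dependent variable of a classic Haseman-Elston regression on marker IBD sharing. All data below are simulated, and the trait dimension, cluster count and IBD values are hypothetical.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    """Return an (n_samples, c) grade-of-membership matrix from fuzzy c-means."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])            # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / dist ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return U

# Simulated sib pairs: a three-component trait for each sib and IBD sharing per pair.
rng = np.random.default_rng(1)
n_pairs = 200
trait_sib1 = rng.normal(size=(n_pairs, 3))
trait_sib2 = rng.normal(size=(n_pairs, 3))
pihat = rng.uniform(0.0, 1.0, size=n_pairs)                   # estimated IBD proportion at the marker

# Grade-of-membership scores for all individuals, then a classic Haseman-Elston step.
U = fuzzy_cmeans(np.vstack([trait_sib1, trait_sib2]), c=2)
g1, g2 = U[:n_pairs, 0], U[n_pairs:, 0]                       # membership in subgroup 1
y = (g1 - g2) ** 2                                            # squared sib-pair difference
slope, intercept = np.polyfit(pihat, y, 1)
print(f"HE slope = {slope:.3f} (a significantly negative slope would suggest linkage)")
```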


Source Zone Natural Attenuation at Petroleum Hydrocarbon Spill Sites - I: Site-Specific Assessment Approach

GROUND WATER MONITORING & REMEDIATION, Issue 4 2006
Paul Johnson
This work focuses on the site-specific assessment of source zone natural attenuation (SZNA) at petroleum spill sites, including the confirmation that SZNA is occurring, estimation of current SZNA rates, and anticipation of SZNA impact on future ground water quality. The approach anticipates that decision makers will be interested in answers to the following questions: (1) Is SZNA occurring and what processes are contributing to SZNA? (2) What are the current rates of mass removal associated with SZNA? (3) What are the longer-term implications of SZNA for ground water impacts? and (4) Are the SZNA processes and rates sustainable? This approach is a data-driven, macroscopic, multiple-lines-of-evidence approach and is therefore consistent with the 2000 National Research Council's recommendations and complementary to existing dissolved plume natural attenuation protocols and recent modeling work published by others. While this work is easily generalized, the discussion emphasizes SZNA assessment at petroleum hydrocarbon spill sites. The approach includes three basic levels of data collection and data reduction (Group I, Group II, and Group III). Group I measurements provide evidence that SZNA is occurring. Group II measurements include additional information necessary to estimate current SZNA rates, and Group III measurements are focused on evaluating the long-term implications of SZNA for source zone characteristics and ground water quality. This paper presents the generalized site-specific SZNA assessment approach and then focuses on the interpretation of Group II data. Companion papers illustrate its application to source zones at a former oil field in California. [source]


Feature-space clustering for fMRI meta-analysis

HUMAN BRAIN MAPPING, Issue 3 2001
Cyril Goutte
Abstract Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields fMRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiments, where features are obtained from a possibly large number of single-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on an fMRI data set involving visual stimulation, and we show that the feature-space clustering approach yields nontrivial results and, in particular, reveals interesting differences from individual-voxel analyses performed with traditional methods. Hum. Brain Mapping 13:165–183, 2001. © 2001 Wiley-Liss, Inc. [source]
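A minimal sketch of clustering in feature space rather than on raw time series, assuming two hypothetical per-voxel features (say, an activation strength and a delay estimate); the features, simulated values and cluster count are illustrative and not those of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Simulated per-voxel features standing in for single-voxel analysis results.
rng = np.random.default_rng(0)
strength = np.concatenate([rng.normal(0.0, 1.0, 4000), rng.normal(4.0, 1.0, 1000)])
delay = np.concatenate([rng.normal(0.0, 1.0, 4000), rng.normal(2.0, 0.5, 1000)])
features = np.column_stack([strength, delay])

# Cluster the voxels in the two-dimensional feature space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for k in range(2):
    members = features[labels == k]
    print(f"cluster {k}: {len(members)} voxels, mean feature vector {members.mean(axis=0).round(2)}")
```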


Applications of patient-specific CFD in medicine and life sciences

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, Issue 6-7 2003
Rainald Löhner
Abstract Recent advances in medical image segmentation, grid generation, flow solvers, realistic boundary conditions, fluid-structure interaction, data reduction and visualization are reviewed with special emphasis on patient-specific flow prediction. At the same time, present shortcomings in each one of these areas are identified. Several examples are given that show that this methodology is maturing rapidly, and may soon find widespread use in medicine. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Ultra-small-angle X-ray scattering at the Advanced Photon Source

JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 3 2009
Jan Ilavsky
The design and operation of a versatile ultra-small-angle X-ray scattering (USAXS) instrument at the Advanced Photon Source (APS) at Argonne National Laboratory are presented. The instrument is optimized for the high brilliance and low emittance of an APS undulator source. It has angular and energy resolutions of the order of 10⁻⁴, accurate and repeatable X-ray energy tunability over its operational energy range from 8 to 18 keV, and a dynamic intensity range of 10⁸ to 10⁹, depending on the configuration. It further offers quantitative primary calibration of X-ray scattering cross sections, a scattering vector range from 0.0001 to 1 Å⁻¹, and stability and reliability over extended running periods. Its operational configurations include one-dimensional collimated (slit-smeared) USAXS, two-dimensional collimated USAXS and USAXS imaging. A robust data reduction and data analysis package, which was developed in parallel with the instrument, is available and supported at the APS. [source]


The SAXS/WAXS software system of the DUBBLE CRG beamline at the ESRF

JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 4 2001
E. Homan
The small/wide-angle X-ray scattering (SAXS/WAXS) system on the DUBBLE CRG beamline at the ESRF is used for both static and time-resolved measurements. The integrated system developed for control and data reduction deals effectively with the high rates of incoming data from the different detector systems, as well as the presentation of results for the user. To ensure that the data may be used directly by a wide range of packages, they may be recorded in a number of output formats, thus serving as a practical test bed where developing standards may be compared and contrasted. The software system implements proposals raised at the canSAS meetings to promote a limited set of standard data formats for small-angle scattering studies. The system presented can cope with a volume of results in excess of 10 Gbytes of data per experiment and shows the advantages achieved by minimizing the dependence on raw-data formats. [source]


A deep i-selected multiwaveband galaxy catalogue in the COSMOS field

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, Issue 4 2008
A. Gabasch
ABSTRACT In this paper we present a deep and homogeneous i-band-selected multiwaveband catalogue in the COSMOS field covering an area of about 0.7 deg². Our catalogue, with a formal 50 per cent completeness limit for point sources of i ≈ 26.7, comprises about 290 000 galaxies with information in 8 passbands. We combine publicly available u, B, V, r, i, z and K data with proprietary imaging in H band. We discuss in detail the observations, the data reduction, and the photometric properties of the H-band data. We estimate photometric redshifts for all the galaxies in the catalogue. A comparison with 162 spectroscopic redshifts in the redshift range 0 ≤ z ≤ 3 shows that the achieved accuracy of the photometric redshifts is Δz/(zspec + 1) ≲ 0.035 with only ~2 per cent outliers. We derive absolute UV magnitudes and investigate the evolution of the luminosity function evaluated in the rest-frame UV (1500 Å). There is good agreement between the luminosity functions derived here and the luminosity functions derived in the FORS Deep Field. We see a similar brightening of M* and a decrease of φ* with redshift. The catalogue, including the photometric redshift information, is made publicly available. [source]
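The quoted accuracy figures can be reproduced conceptually in a few lines: match photometric to spectroscopic redshifts, compute Δz/(1 + zspec), flag outliers and take the scatter of the rest. The redshift values and the 0.15 outlier threshold below are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

# Hypothetical matched spectroscopic and photometric redshifts.
z_spec = np.array([0.31, 0.45, 1.20, 2.05, 0.88, 1.50, 0.67, 2.60, 0.40])
z_phot = np.array([0.29, 0.48, 1.15, 2.30, 0.90, 1.46, 0.70, 2.55, 1.10])

dz = (z_phot - z_spec) / (1.0 + z_spec)
outliers = np.abs(dz) > 0.15                 # assumed outlier cut
scatter = np.std(dz[~outliers])              # accuracy of the non-outlier sample

print(f"sigma of dz/(1+z_spec) = {scatter:.3f}, outlier fraction = {outliers.mean():.1%}")
```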


A Subaru/Suprime-Cam wide-field survey of globular cluster populations around M87 - I. Observation, data analysis and luminosity function

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, Issue 2 2006
Naoyuki Tamura
ABSTRACT In this paper and a companion paper, we report on a wide-field imaging survey of the globular cluster (GC) populations around M87 carried out with Suprime-Cam on the 8.2-m Subaru telescope. Here, we describe the observations, data reduction and data analysis, and present luminosity functions of GC populations around M87 and NGC 4552, another luminous Virgo elliptical in our survey field. The imaging data were taken in the B, V and I bands with a sky coverage extending from the M87 centre out to ~0.5 Mpc. GC candidates were selected by applying a colour criterion on the B-V versus V-I diagram to unresolved objects, which greatly reduces contamination. The data from control fields taken with Subaru/Suprime-Cam were also analysed for subtraction of contamination in the GC sample. These control field data are compatible with those in the M87 field in terms of the filter set (BVI), limiting magnitudes and image quality, which minimizes the possibility of introducing any systematic errors into the subtractive correction. We investigate GC luminosity functions (GCLFs) at distances ≤ 10 arcmin (~45 kpc) from the host galaxy centre in detail. By fitting Gaussians to the GCLFs, the V-band turnover magnitude (VTO) is estimated to be 23.62 ± 0.06 and 23.56 ± 0.20 mag for the GC populations in M87 and NGC 4552, respectively. The GCLF is found to be a function of GC colour; VTO of the red GC subpopulation (V-I > 1.1) is fainter than that of the blue GC subpopulation (V-I ≤ 1.1) in both M87 and NGC 4552, as expected if the colour differences are primarily due to a metallicity effect and the mass functions of the two subpopulations are similar. The radial dependence of the GCLF is also investigated for the GC population in M87. The GCLF of each subpopulation at 1 ≤ R ≤ 5 arcmin is compared to that at 5 ≤ R ≤ 10 arcmin, but no significant trend with distance is found in the shape of the GCLF. We also estimate GC specific frequencies (SN) for M87 and NGC 4552. The SN of the M87 GC population is estimated to be 12.5 ± 0.8 within 25 arcmin. The SN value of the NGC 4552 GC population is estimated to be 5.0 ± 0.6 within 10 arcmin. [source]
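The turnover estimate amounts to fitting a Gaussian to binned, background-corrected GC number counts; the sketch below shows that step on invented counts (bin centres, counts and starting values are illustrative, not the survey's data).

```python
import numpy as np
from scipy.optimize import curve_fit

def gclf(mag, amplitude, m_turnover, sigma):
    """Gaussian globular cluster luminosity function in magnitude space."""
    return amplitude * np.exp(-0.5 * ((mag - m_turnover) / sigma) ** 2)

# Hypothetical background-subtracted GC counts per V-magnitude bin.
mag_bins = np.arange(20.25, 26.25, 0.5)
counts = np.array([5, 12, 30, 62, 110, 150, 160, 130, 85, 45, 20, 8], dtype=float)

popt, pcov = curve_fit(gclf, mag_bins, counts, p0=[150.0, 23.5, 1.4])
v_to, v_to_err = popt[1], np.sqrt(pcov[1, 1])
print(f"V-band turnover magnitude VTO = {v_to:.2f} +/- {v_to_err:.2f}")
```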


Correction of mass spectrometric isotope ratio measurements for isobaric isotopologues of O2, CO, CO2, N2O and SO2

RAPID COMMUNICATIONS IN MASS SPECTROMETRY, Issue 24 2008
Jan Kaiser
Gas isotope ratio mass spectrometers usually measure ion current ratios of molecules, not atoms. Often several isotopologues contribute to an ion current at a particular mass-to-charge ratio (m/z). Therefore, corrections have to be applied to derive the desired isotope ratios. These corrections are usually formulated in terms of isotope ratios (R), but this does not reflect the practice of measuring the ion current ratios of the sample relative to those of a reference material. Correspondingly, the relative ion current ratio differences (expressed as δ values) are first converted into isotopologue ratios, then into isotope ratios and finally back into elemental δ values. Here, we present a reformulation of this data reduction procedure entirely in terms of δ values and the 'absolute' isotope ratios of the reference material. This also shows that not the absolute isotope ratios of the reference material themselves, but only product and ratio combinations of them, are required for the data reduction. These combinations can be and, for carbon and oxygen, have been measured by conventional isotope ratio mass spectrometers. The frequently implied use of absolute isotope ratios measured by specially calibrated instruments is actually unnecessary. Following related work on CO2, we here derive data reduction equations for the species O2, CO, N2O and SO2. We also suggest experiments to measure the required absolute ratio combinations for N2O, SO2 and O2. As a prelude, we summarise historic and recent measurements of absolute isotope ratios in international isotope reference materials. Copyright © 2008 John Wiley & Sons, Ltd. [source]
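For readers unfamiliar with the notation, a minimal sketch of the standard definitions behind the abstract (using CO2 at m/z 45 as the example); the paper's full correction scheme is more involved than this.

```latex
% An elemental delta value is the relative deviation of a sample isotope ratio
% from that of the reference material:
\delta = \frac{R_{\mathrm{sa}}}{R_{\mathrm{ref}}} - 1 .
% The instrument, however, measures isotopologue (ion current) ratios, e.g. for CO2:
{}^{45}R = \frac{I_{45}}{I_{44}} = {}^{13}R + 2\,{}^{17}R ,
\qquad
\delta^{45} = \frac{{}^{45}R_{\mathrm{sa}}}{{}^{45}R_{\mathrm{ref}}} - 1 ,
% so the measured \delta^{45} must be converted into elemental values such as
% \delta^{13}\mathrm{C} using ratio combinations of the reference material.
```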


Effects of data selection and error specification on the assimilation of AIRS data

THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 622 2007
J. Joiner
Abstract The Atmospheric InfraRed Sounder (AIRS), flying aboard NASA's Aqua satellite with the Advanced Microwave Sounding Unit-A (AMSU-A) and four other instruments, has been providing data for use in numerical weather prediction and data assimilation systems for over three years. The full AIRS data set is currently not transmitted in near-real-time to the prediction/assimilation centres. Instead, data sets with reduced spatial and spectral information are produced and made available within three hours of the observation time. In this paper, we evaluate the use of different channel selections and error specifications. We achieve significant positive impact from the Aqua AIRS/AMSU-A combination during our experimental time period of January 2003. The best results are obtained using a set of 156 channels that do not include any in the H2O band between 1080 and 2100 cm⁻¹. The H2O band channels have a large influence on both temperature and humidity analyses. If observation and background errors are not properly specified, the partitioning of temperature and humidity information from these channels will not be correct, and this can lead to a degradation in forecast skill. Therefore, we suggest that it is important to focus on background error specification in order to maximize the impact from AIRS and similar instruments. In addition, we find that changing the specified channel errors has a significant effect on the amount of data that enters the analysis as a result of quality control thresholds that are related to the errors. However, moderate changes to the channel errors do not significantly impact forecast skill with the 156 channel set. We also examine the effects of different types of spatial data reduction on assimilated data sets and NWP forecast skill. Whether we pick the centre or the warmest AIRS pixel in a 3 × 3 array affects the amount of data ingested by the analysis but does not have a statistically significant impact on the forecast skill. Copyright © Published in 2007 by John Wiley & Sons, Ltd. [source]


Quantifying instrument errors in macromolecular X-ray data sets

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 6 2010
Kay Diederichs
An indicator which is calculated after the data reduction of a test data set may be used to estimate the (systematic) instrument error at a macromolecular X-ray source. The numerical value of the indicator is the highest signal-to-noise [I/σ(I)] value that the experimental setup can produce and its reciprocal is related to the lower limit of the merging R factor. In the context of this study, the stability of the experimental setup is influenced and characterized by the properties of the X-ray beam, shutter, goniometer, cryostream and detector, and also by the exposure time and spindle speed. Typical values of the indicator are given for data sets from the JCSG archive. Some sources of error are explored with the help of test calculations using SIM_MX [Diederichs (2009), Acta Cryst. D65, 535–542]. One conclusion is that the accuracy of data at low resolution is usually limited by the experimental setup rather than by the crystal. It is also shown that the influence of vibrations and fluctuations may be mitigated by a reduction in spindle speed accompanied by stronger attenuation. [source]
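A rough illustration (an assumption consistent with the abstract's statement, not the paper's derivation) of why the reciprocal of the best attainable signal-to-noise value bounds the merging R factor: for Gaussian errors of standard deviation σ around a reflection of intensity I and high multiplicity,

```latex
R_{\mathrm{merge}}
  \;\approx\; \frac{\langle\,|I_i - \langle I\rangle|\,\rangle}{\langle I\rangle}
  \;\approx\; \sqrt{\frac{2}{\pi}}\,\frac{\sigma}{I}
  \;\approx\; \frac{0.8}{I/\sigma(I)} .
```

So an experimental setup whose best I/σ(I) is, say, 40 cannot merge data to better than roughly 2 per cent at low resolution, whatever the crystal quality.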


HKL-3000: the integration of data reduction and structure solution - from diffraction images to an initial model in minutes

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 8 2006
Marcin Cymborowski
A new approach that integrates data collection, data reduction, phasing and model building significantly accelerates the process of structure determination and on average minimizes the number of data sets and synchrotron time required for structure solution. Initial testing of the HKL-3000 system (the beta version was named HKL-2000_ph) with more than 140 novel structure determinations has proven its high value for MAD/SAD experiments. The heuristics for choosing the best computational strategy at different data resolution limits of phasing signal and crystal diffraction are being optimized. The typical end result is an interpretable electron-density map with a partially built structure and, in some cases, an almost complete refined model. The current development is oriented towards very fast structure solution in order to provide feedback during the diffraction experiment. Work is also proceeding towards improving the quality of phasing calculation and model building. [source]


New constraints from the Hα line for the temperature of the transiting planet host star OGLE-TR-10

ASTRONOMISCHE NACHRICHTEN, Issue 6 2008
M. Ammler-von Eiff
Abstract The spectroscopic analysis of systems with transiting planets gives strong constraints on planetary masses and radii as well as the chemical composition of the systems. The properties of the system OGLE-TR-10 are not well constrained, partly due to the discrepancy of previous measurements of the effective temperature of the host star. This work, which is fully independent from previous works in terms of data reduction and analysis, uses the Hα profile in order to get an additional constraint on the effective temperature. We take previously published UVES observations which have the highest available signal-to-noise ratio for OGLE-TR-10. A proper normalization to the relative continuum is done using intermediate data products of the reduction pipeline of the UVES spectrograph. The effective temperature is then determined by fitting synthetic Hα profiles to the observed spectrum. With a result of Teff = 6020 ± 140 K, the Hα profile clearly favours one of the previous measurements. The Hα line is further consistent with dwarf-like surface gravities as well as solar and super-solar metallicities previously derived for OGLE-TR-10. The Hα line could not be used to its full potential, partly because of the varying shape of the UVES échelle orders after flat-field correction. We suggest improving this feature when constructing future spectrographs. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


The WASP project in the era of robotic telescope networks

ASTRONOMISCHE NACHRICHTEN, Issue 8 2006
D. J. Christian
Abstract We present the current status of the WASP project, a pair of wide-angle photometric telescopes, individually called SuperWASP. SuperWASP-I is located in La Palma, and SuperWASP-II at Sutherland in South Africa. SW-I began operations in April 2004. SW-II is expected to be operational in early 2006. Each SuperWASP instrument consists of up to 8 individual cameras using ultra-wide-field lenses backed by high-quality passively cooled CCDs. Each camera covers 7.8 × 7.8 sq degrees of sky, for nearly 500 sq degrees of total sky coverage. One of the current aims of the WASP project is the search for extra-solar planet transits with a focus on brighter stars in the magnitude range ~8 to 13. Additionally, WASP will search for optical transients, track Near-Earth Objects, and study many types of variable stars and extragalactic objects. The collaboration has developed a custom-built reduction pipeline that achieves better than 1 percent photometric precision. We discuss future goals, which include: nightly on-mountain reductions that could be used to automatically drive alerts via a small robotic telescope network, and possible roles of the WASP telescopes as providers in such a network. Additional technical details of the telescopes, data reduction, and consortium members and institutions can be found on the web site at: http://www.superwasp.org/. (© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


Automatic data reduction and archiving for STELLA

ASTRONOMISCHE NACHRICHTEN, Issue 6-8 2004
M. Weber
Abstract The data are collected at the observatory; each data product is registered and a queue is set up to transfer the highest-priority observations first. Optionally, lossy compression (e.g. White 1994) can be used to boost transfer speeds for time-critical observations. Once the data have been transferred, the reduction process is started at the control center in Potsdam. The type of reduction steps required can be specified by the user, or a default pipeline setup can be used. The users can be notified about the status of their observations in any desired detail. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
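A minimal sketch (assumed behaviour, not the actual STELLA software) of the register-then-transfer scheme described above, using an in-memory priority queue; product names, priorities and the lossy-compression flag are hypothetical.

```python
import heapq

transfer_queue = []   # heapq is a min-heap, so priorities are stored negated

def register(product_id, priority, lossy=False):
    """Register a data product for transfer; lossy compression may be requested for time-critical data."""
    heapq.heappush(transfer_queue, (-priority, product_id, lossy))

def transfer_next():
    """Transfer the highest-priority product, or return None when the queue is empty."""
    if not transfer_queue:
        return None
    neg_priority, product_id, lossy = heapq.heappop(transfer_queue)
    mode = "lossy-compressed" if lossy else "lossless"
    print(f"transferring {product_id} ({mode}, priority {-neg_priority})")
    return product_id

register("science_frame_0001", priority=5)
register("flatfield_0003", priority=1)
register("transient_alert_0002", priority=9, lossy=True)   # time-critical observation
while transfer_next():
    pass   # transfers run in priority order: alert, science frame, flat field
```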


Observing the high redshift Universe using the VIMOS-IFU

ASTRONOMISCHE NACHRICHTEN, Issue 2 2004
S. Foucaud
Abstract We describe the advantages of using Integral Field Spectroscopy to observe deep fields of galaxies. The VIMOS Integral Field Unit is particularly suitable for this kind of study thanks to its large field of view (~1 arcmin²). After a short description of the VIMOS-IFU data reduction, we detail the main scientific issues which can be addressed using observations of the Hubble Deep Field South with a combination of Integral Field Spectroscopy and broad-band optical and near-infrared imaging. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


Zero-dose extrapolation as part of macromolecular synchrotron data reduction

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 5 2003
Kay Diederichs
Radiation damage to macromolecular crystals at third-generation synchrotron sites constitutes a major source of systematic error in X-ray data collection. Here, a computational method to partially correct the observed intensities during data reduction is described and investigated. The method consists of a redundancy-based zero-dose extrapolation of a decay function that is fitted to the intensities of all observations of a unique reflection as a function of dose. It is shown in a test case with weak anomalous signal that this conceptually simple correction, when applied to each unique reflection, can significantly improve the accuracy of averaged intensities and single-wavelength anomalous dispersion phases and leads to enhanced experimental electron-density maps. Limitations of and possible improvements to the method are discussed. [source]
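A minimal sketch of the redundancy-based extrapolation for a single unique reflection, assuming a linear decay with dose and invented dose/intensity values; the actual correction fits a decay function to every unique reflection, and the functional form used in the paper is not reproduced here.

```python
import numpy as np

# Hypothetical repeated observations of one unique reflection at increasing dose.
dose = np.array([1.0, 2.5, 4.0, 5.5, 7.0])                  # arbitrary dose units
intensity = np.array([1050.0, 1010.0, 955.0, 930.0, 880.0])

# Fit I(d) ~ I0 + slope * d across the redundant observations and extrapolate to d = 0.
slope, i_zero_dose = np.polyfit(dose, intensity, 1)
print(f"mean observed intensity = {intensity.mean():.0f}, zero-dose estimate I0 = {i_zero_dose:.0f}")
```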


Comparison between Principal Component Analysis and Independent Component Analysis in Electroencephalograms Modelling

BIOMETRICAL JOURNAL, Issue 2 2007
C. Bugli
Abstract Principal Component Analysis (PCA) is a classical technique in statistical data analysis, feature extraction and data reduction, aiming at explaining observed signals as a linear combination of orthogonal principal components. Independent Component Analysis (ICA) is a technique of array processing and data analysis, aiming at recovering unobserved signals or 'sources' from observed mixtures, exploiting only the assumption of mutual independence between the signals. The separation of the sources by ICA has great potential in applications such as the separation of sound signals (for example, voices mixed in simultaneous multiple recordings), in telecommunication or in the treatment of medical signals. However, ICA is not yet often used by statisticians. In this paper, we shall present ICA in a statistical framework and compare this method with PCA for electroencephalogram (EEG) analysis. We shall see that ICA provides a more useful data representation than PCA, for instance, for the representation of a particular characteristic of the EEG named event-related potential (ERP). (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
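A minimal sketch of the PCA-versus-ICA contrast on simulated mixtures rather than real EEG: two independent sources are mixed linearly and each method attempts to recover them. The waveforms and mixing matrix are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Two independent toy sources (a sinusoid and a square wave) observed as linear mixtures.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 8.0, 2000)
sources = np.column_stack([np.sin(7.0 * t), np.sign(np.sin(3.0 * t))])
sources += 0.05 * rng.normal(size=sources.shape)
mixing = np.array([[1.0, 0.5], [0.6, 1.0]])
observed = sources @ mixing.T                       # what the "electrodes" record

pca_components = PCA(n_components=2).fit_transform(observed)                      # orthogonal, variance-ordered
ica_components = FastICA(n_components=2, random_state=0).fit_transform(observed)  # statistically independent

# ICA typically recovers the original waveforms (up to sign, scale and order); PCA does not.
for name, comps in (("PCA", pca_components), ("ICA", ica_components)):
    corr = np.abs(np.corrcoef(comps.T, sources.T))[:2, 2:]
    print(name, "best |correlation| with each true source:", corr.max(axis=0).round(2))
```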