Mining Techniques (mining + techniques)


Kinds of Mining Techniques

  • data mining techniques


Selected Abstracts


    BUILDING A DATA-MINING GRID FOR MULTIPLE HUMAN BRAIN DATA ANALYSIS

    COMPUTATIONAL INTELLIGENCE, Issue 2 2005
    Ning Zhong
    E-science is about global collaboration in key areas of science such as cognitive science and brain science, and the next generation of infrastructure such as the Wisdom Web and Knowledge Grids. As a case study, we investigate the human multi-perception mechanism by cooperatively using various psychological experiments, physiological measurements, and data mining techniques for developing artificial systems that match human ability in specific aspects. In particular, we observe fMRI (functional magnetic resonance imaging) and EEG (electroencephalogram) brain activations from the viewpoint of peculiarity-oriented mining and propose a way of peculiarity-oriented mining for knowledge discovery in multiple human brain data. Based on such experience and needs, we concentrate on the architectural aspect of a brain-informatics portal from the perspective of the Wisdom Web and Knowledge Grids. We describe how to build a data-mining grid on the Wisdom Web for multiaspect human brain data analysis. The proposed methodology attempts to change the perspective of cognitive scientists from a single type of experimental data analysis toward a holistic view over a long-term, global field of vision. [source]
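
    As an illustration of the peculiarity-oriented mining idea described above, the sketch below scores each value of a single attribute with one common formulation of the peculiarity factor, PF(x_i) = sum_j N(x_i, x_j)^alpha, and flags values whose score exceeds the mean by beta standard deviations. The alpha and beta settings and the toy activation values are illustrative assumptions, not the authors' implementation.

      # A minimal sketch of peculiarity-oriented mining on one attribute (illustrative only).
      import numpy as np

      def peculiarity_factors(values, alpha=0.5):
          diffs = np.abs(values[:, None] - values[None, :])     # pairwise attribute distances N(x_i, x_j)
          return np.power(diffs, alpha).sum(axis=1)             # PF(x_i) = sum_j N(x_i, x_j)^alpha

      def peculiar_indices(values, alpha=0.5, beta=1.0):
          pf = peculiarity_factors(np.asarray(values, dtype=float), alpha)
          return np.where(pf > pf.mean() + beta * pf.std())[0]  # unusually high PF = peculiar data

      # toy voxel activation values; the last one stands out and is flagged
      print(peculiar_indices([0.9, 1.0, 1.1, 0.95, 4.2]))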


    A data mining approach to financial time series modelling and forecasting

    INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 4 2001
    Zoran Vojinovic
    This paper describes one of the relatively new data mining techniques that can be used to forecast the foreign exchange time series process. The research aims to contribute to the development and application of such techniques by exposing them to difficult real-world (non-toy) data sets. The results reveal that the prediction of a Radial Basis Function Neural Network model for forecasting the daily $US/$NZ closing exchange rates is significantly better than the prediction of a traditional linear autoregressive model in both directional change and prediction of the exchange rate itself. We have also investigated the impact of the number of model inputs (model order), the number of hidden layer neurons and the size of training data set on prediction accuracy. In addition, we have explored how the three different methods for placement of Gaussian radial basis functions affect its predictive quality and singled out the best one. Copyright © 2001 John Wiley & Sons, Ltd. [source]
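
    The sketch below illustrates the kind of model the abstract evaluates: lagged closing rates feed a Gaussian radial basis function network whose output weights are fitted by regularized least squares, with centres placed by simple random selection (one of several possible placement strategies). The lag order, number of centres, width, and simulated series are placeholders, not the values tuned in the paper.

      # A minimal RBF-network forecaster for a daily exchange-rate series (illustrative only).
      import numpy as np

      def make_lagged(series, order):
          X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
          return X, series[order:]                              # inputs = past `order` rates, target = next rate

      def fit_rbf(X, y, n_centres=10, width=0.1, ridge=1e-6):
          rng = np.random.default_rng(0)
          centres = X[rng.choice(len(X), n_centres, replace=False)]   # one simple centre-placement method
          phi = lambda Z: np.exp(-((Z[:, None, :] - centres[None, :, :]) ** 2).sum(-1) / (2 * width ** 2))
          w = np.linalg.solve(phi(X).T @ phi(X) + ridge * np.eye(n_centres), phi(X).T @ y)
          return lambda Z: phi(Z) @ w

      rates = np.cumsum(np.random.default_rng(1).normal(0, 0.01, 500)) + 0.5   # toy exchange-rate series
      X, y = make_lagged(rates, order=5)
      model = fit_rbf(X[:-50], y[:-50])
      hits = np.sign(model(X[-50:]) - X[-50:, -1]) == np.sign(y[-50:] - X[-50:, -1])
      print(hits.mean())                                        # directional-change accuracy on the held-out tail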


    A data warehouse/online analytic processing framework for web usage mining and business intelligence reporting

    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7 2004
    Xiaohua Hu
    Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information, etc.) in order to understand and serve e-commerce customers better and improve the online business. In this article, we present a general data warehouse/online analytic processing (OLAP) framework for web usage mining and business intelligence reporting. When we integrate the web data warehouse construction, data mining, and OLAP into the e-commerce system, this tight integration dramatically reduces the time and effort for web usage mining, business intelligence reporting, and mining deployment. Our data warehouse/OLAP framework consists of four phases: data capture, webhouse construction (clickstream marts), pattern discovery and cube construction, and pattern evaluation and deployment. We discuss data transformation operations for web usage mining and business reporting in clickstream, session, and customer levels; describe the problems and challenging issues in each phase in detail; provide plausible solutions to the issues; and demonstrate the framework with some examples from some real web sites. Our data warehouse/OLAP framework has been integrated into some commercial e-commerce systems. We believe this data warehouse/OLAP framework would be very useful for developing any real-world web usage mining and business intelligence reporting systems. © 2004 Wiley Periodicals, Inc. [source]
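
    One concrete data-transformation step in such a framework is rolling raw clickstream records up to the session level before they are loaded into the webhouse. The sketch below groups clicks per user with a 30-minute inactivity timeout; the field names and timeout are illustrative assumptions rather than the framework's actual schema.

      # A minimal clickstream sessionization step for a session-level mart (illustrative only).
      from datetime import datetime, timedelta
      from itertools import groupby

      TIMEOUT = timedelta(minutes=30)

      def sessionize(clicks):
          """clicks: iterable of dicts with 'user', 'ts' (datetime), 'url'."""
          sessions = []
          for user, events in groupby(sorted(clicks, key=lambda c: (c["user"], c["ts"])),
                                      key=lambda c: c["user"]):
              current = None
              for e in events:
                  if current is None or e["ts"] - current["end"] > TIMEOUT:   # gap too long: new session
                      current = {"user": user, "start": e["ts"], "end": e["ts"], "pages": 0}
                      sessions.append(current)
                  current["end"] = e["ts"]
                  current["pages"] += 1
          return sessions

      clicks = [{"user": "u1", "ts": datetime(2004, 1, 1, 9, m), "url": "/p"} for m in (0, 5, 50)]
      print(sessionize(clicks))   # two sessions: the 45-minute gap starts a new one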


    Mass defect filter technique and its applications to drug metabolite identification by high-resolution mass spectrometry

    JOURNAL OF MASS SPECTROMETRY (INCORP BIOLOGICAL MASS SPECTROMETRY), Issue 7 2009
    Haiying Zhang
    Abstract Identification of drug metabolites by liquid chromatography/mass spectrometry (LC/MS) involves metabolite detection in biological matrixes and structural characterization based on product ion spectra. Traditionally, metabolite detection is accomplished primarily on the basis of predicted molecular masses or fragmentation patterns of metabolites using triple-quadrupole and ion trap mass spectrometers. Recently, a novel mass defect filter (MDF) technique has been developed, which enables high-resolution mass spectrometers to be utilized for detecting both predicted and unexpected drug metabolites based on narrow, well-defined mass defect ranges for these metabolites. This is a new approach that is completely different from, but complementary to, traditional molecular mass- or MS/MS fragmentation-based LC/MS approaches. This article reviews the mass defect patterns of various classes of drug metabolites and the basic principles of the MDF approach. Examples are given on the applications of the MDF technique to the detection of stable and chemically reactive metabolites in vitro and in vivo. Advantages, limitations, and future applications are also discussed on MDF and its combinations with other data mining techniques for the detection and identification of drug metabolites. Copyright © 2009 John Wiley & Sons, Ltd. [source]
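
    The core of the MDF idea can be stated in a few lines: among high-resolution peaks, keep only those whose mass defect (the fractional part of the measured mass) falls within a narrow window around the defect of the parent drug, optionally restricted to a plausible metabolite mass range. The ±0.040 Da window, the mass range, and the masses below are illustrative assumptions, not parameters from the paper.

      # A minimal mass defect filter sketch (illustrative parameters).
      def mass_defect(m):
          return m - round(m)                              # fractional part relative to the nominal mass

      def mdf(peaks, parent_mass, window=0.040, mass_range=(0.5, 1.5)):
          lo, hi = (parent_mass * r for r in mass_range)   # plausible metabolite mass range
          target = mass_defect(parent_mass)
          return [m for m in peaks
                  if lo <= m <= hi and abs(mass_defect(m) - target) <= window]

      # toy spectrum: parent drug, a hydroxylated metabolite, and two matrix-related peaks
      print(mdf([411.2325, 427.2274, 389.0901, 512.9987], parent_mass=411.2325))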


    Information and communication technology for process management in healthcare: a contribution to change the culture of blame

    JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE, Issue 6-7 2010
    Silvana Quaglini
    Abstract Statistics on medical errors and their consequences have astonished both healthcare professionals and ordinary people in recent years. The mass media are becoming more and more sensitive to medical malpractice. This paper elaborates on the well-known resistance of the medical world to disclosing actions and processes that could have caused harm; it illustrates the possible causes of medical errors and, for some of them, suggests solutions based on information and communication technology. In particular, careflow management systems and process mining techniques are proposed as a means to improve the healthcare delivery process: the former by facilitating task assignments and resource management, the latter by discovering not only individuals' errors, but also the chains of responsibility that combine to produce errors in a complex patient pathway. Both supervised and unsupervised process mining will be addressed. The former compares real processes with a known process model (e.g., a clinical practice guideline or a medical protocol), whereas the latter mines processes from raw data, without imposing any model. The potential of these techniques is illustrated by means of examples from stroke patient management. Copyright © 2010 John Wiley & Sons, Ltd. [source]
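
    The unsupervised case can be made concrete with a few lines that mine a directly-follows graph from raw event-log traces; orderings that violate the expected pathway then stand out by their counts, which is a starting point for tracing chains of responsibility. The event names below are illustrative for a stroke pathway and are not taken from the paper.

      # A minimal unsupervised process-mining sketch: a directly-follows graph with counts.
      from collections import Counter

      def directly_follows(traces):
          dfg = Counter()
          for trace in traces:
              for a, b in zip(trace, trace[1:]):
                  dfg[(a, b)] += 1                # count each directly-follows pair across traces
          return dfg

      traces = [
          ["admission", "CT scan", "thrombolysis", "monitoring"],
          ["admission", "CT scan", "monitoring"],
          ["admission", "thrombolysis", "CT scan", "monitoring"],   # an ordering a guideline would forbid
      ]
      for (a, b), n in directly_follows(traces).most_common():
          print(f"{a} -> {b}: {n}")

    In the supervised case, the same traces would instead be replayed against the transitions allowed by a guideline or protocol model, and deviations logged per step.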


    Evaluating uses of data mining techniques in propensity score estimation: a simulation study

    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, Issue 6 2008
    DrPH, Soko Setoguchi MD
    Abstract Background In propensity score modeling, it is standard practice to optimize the prediction of exposure status based on the covariate information. In a simulation study, we examined in what situations analyses based on various types of exposure propensity score (EPS) models using data mining techniques such as recursive partitioning (RP) and neural networks (NN) produce unbiased and/or efficient results. Method We simulated data for a hypothetical cohort study (n = 2000) with a binary exposure/outcome and 10 binary/continuous covariates, under seven scenarios differing by non-linear and/or non-additive associations between exposure and covariates. EPS models used logistic regression (LR) (all possible main effects), RP1 (without pruning), RP2 (with pruning), and NN. We calculated c-statistics (C), standard errors (SE), and bias of exposure-effect estimates from outcome models for the PS-matched dataset. Results Data mining techniques yielded higher C than LR (mean: NN, 0.86; RP1, 0.79; RP2, 0.72; and LR, 0.76). SE tended to be greater in models with higher C. Overall bias was small for each strategy, although NN estimates tended to be the least biased. C was not correlated with the magnitude of bias (correlation coefficient [COR] = −0.3, p = 0.1) but was correlated with increased SE (COR = 0.7, p < 0.001). Conclusions Effect estimates from EPS models by simple LR were generally robust. NN models generally provided the least numerically biased estimates. C was not associated with the magnitude of bias but was associated with increased SE. Copyright © 2008 John Wiley & Sons, Ltd. [source]
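
    The workflow being compared can be sketched as follows: fit an EPS model on the covariates, match exposed to unexposed subjects 1:1 on the estimated score, and estimate the exposure effect in the matched cohort. Logistic regression stands in for the LR arm; swapping in a tree-based or neural-network classifier gives the RP and NN variants. The simulated data below are purely illustrative and do not reproduce the paper's scenarios.

      # A minimal propensity-score matching sketch (illustrative simulation).
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n = 2000
      X = rng.normal(size=(n, 10))                                         # 10 covariates
      exposure = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1]))))
      outcome = rng.binomial(1, 1 / (1 + np.exp(-(0.4 * exposure + X[:, 0]))))

      ps = LogisticRegression(max_iter=1000).fit(X, exposure).predict_proba(X)[:, 1]

      treated = np.where(exposure == 1)[0]
      controls = np.where(exposure == 0)[0]
      matches = controls[np.abs(ps[treated][:, None] - ps[controls][None, :]).argmin(axis=1)]
      effect = outcome[treated].mean() - outcome[matches].mean()           # crude risk difference in the matched set
      print(round(effect, 3))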


    Comparing data mining methods on the VAERS database

    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, Issue 9 2005
    David Banks PhD
    Abstract Purpose Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after administering one vaccine than other vaccines. Data mining methods find signals as the proportion of times a condition or group of conditions is reported soon after the administration of a vaccine; thus it is a relative proportion compared across vaccines, and not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150,000 reports of adverse events that are possibly associated with vaccine administration. Methods We studied four data mining techniques: empirical Bayes geometric mean (EBGM), lower-bound of the EBGM's 90% confidence interval (EB05), proportional reporting ratio (PRR), and screened PRR (SPRR). We applied these to the VAERS database and compared the agreement among methods and other performance properties, particularly focusing on the vaccine-event combinations with the highest numerical scores in the various methods. Results The vaccine-event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine-event pairs for all methods. Conclusions The four methods differ in their ranking of vaccine-COSTART pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis that focuses on the true alarm and false alarm rates using known vaccine-event associations. Evaluating the properties of these data mining methods will help determine the value of such methods in vaccine safety surveillance. Copyright © 2005 John Wiley & Sons, Ltd. [source]
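
    Of the four scores, the PRR is the simplest to state: for a given vaccine-event pair it is the proportion of that vaccine's reports mentioning the event, divided by the corresponding proportion for all other vaccines. The sketch below computes it from a 2x2 table of toy counts (not VAERS data); the EBGM-based methods additionally shrink such ratios toward 1 with an empirical Bayes model.

      # A minimal proportional reporting ratio (PRR) sketch with toy counts.
      def prr(a, b, c, d):
          """a: reports of the event with the vaccine of interest
             b: reports of other events with that vaccine
             c: reports of the event with all other vaccines
             d: reports of other events with all other vaccines"""
          return (a / (a + b)) / (c / (c + d))

      print(round(prr(a=25, b=975, c=200, d=49800), 2))   # 6.25: reported ~6x more often after this vaccine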


    The role of bioinformatics in two-dimensional gel electrophoresis

    PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 8 2003
    Andrew W. Dowsey
    Abstract Over the last two decades, two-dimensional gel electrophoresis (2-DE) has established itself as the de facto approach to separating proteins from cell and tissue samples. Due to the sheer volume of data and its experimental geometric and expression uncertainties, quantitative analysis of these data with image processing and modelling has become an actively pursued research topic. The results of these analyses include accurate protein quantification, isoelectric point and relative molecular mass estimation, and the detection of differential expression between samples run on different gels. Systematic errors such as current leakage and regional expression inhomogeneities are corrected for, followed by each protein spot in the gel being segmented and modelled for quantification. To assess differential expression of protein spots in different samples run on a series of two-dimensional gels, a number of image registration techniques for correcting geometric distortion have been proposed. This paper provides a comprehensive review of the computational techniques used in the analysis of 2-DE gels, together with a discussion of current and future trends in large-scale analysis. We examine the pitfalls of existing techniques and highlight some of the key areas that need to be developed in the coming years, especially those related to statistical approaches based on multiple gel runs and image mining techniques through the use of parallel processing based on cluster computing and grid technology. [source]
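
    The spot-modelling step mentioned above is often illustrated with a Gaussian spot model: after segmentation, a 2D Gaussian is fitted to the spot's intensities and its integrated volume is taken as the quantification. The sketch below uses a toy synthetic spot; real pipelines correct for background first and use richer spot models, so this is an assumption-laden simplification rather than any specific package's method.

      # A minimal 2D Gaussian spot-fitting sketch for quantification (illustrative only).
      import numpy as np
      from scipy.optimize import curve_fit

      def gauss2d(coords, amp, x0, y0, sx, sy):
          x, y = coords
          return amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2) + (y - y0) ** 2 / (2 * sy ** 2)))

      yy, xx = np.mgrid[0:21, 0:21]                                 # 21x21 pixel patch around one segmented spot
      true = gauss2d((xx, yy), 100.0, 10.0, 9.0, 2.0, 3.0)
      noisy = true + np.random.default_rng(0).normal(0, 1.0, true.shape)

      p, _ = curve_fit(gauss2d, (xx.ravel(), yy.ravel()), noisy.ravel(),
                       p0=(noisy.max(), 10, 10, 2, 2))
      amp, x0, y0, sx, sy = p
      print(round(2 * np.pi * amp * sx * sy, 1))                    # fitted Gaussian volume = spot quantity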


    Probing genetic algorithms for feature selection in comprehensive metabolic profiling approach

    RAPID COMMUNICATIONS IN MASS SPECTROMETRY, Issue 8 2008
    Wei Zou
    Six different clones of 1-year-old loblolly pine (Pinus taeda L.) seedlings grown under standardized conditions in a greenhouse were used for sample preparation and further analysis. Three independent and complementary analytical techniques for metabolic profiling were applied in the present study: hydrophilic interaction chromatography (HILIC-LC/ESI-MS), reversed-phase liquid chromatography (RP-LC/ESI-MS), and gas chromatography (GC/TOF-MS), all coupled to mass spectrometry. Unsupervised methods, such as principal component analysis (PCA) and clustering, and supervised methods, such as classification, were used for data mining. Genetic algorithms (GA), a multivariate approach, were probed for the selection of the smallest subsets of potentially discriminative classifiers. From more than 2000 peaks found in total, small subsets were selected by GA as highly potent classifiers allowing discrimination among the six investigated genotypes. Annotated GC/TOF-MS data allowed the generation of a small subset of identified metabolites. LC/ESI-MS data and small subsets require further annotation. The present study demonstrated that the combination of comprehensive metabolic profiling and advanced data mining techniques provides a powerful metabolomic approach for biomarker discovery among small molecules. Utilizing GA for feature selection allowed the generation of small subsets of potent classifiers. Copyright © 2008 John Wiley & Sons, Ltd. [source]
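
    The GA step can be sketched as follows: binary chromosomes encode peak subsets, fitness is cross-validated classification accuracy across the genotypes minus a small penalty on subset size, and selection, one-point crossover, and mutation evolve small, discriminative subsets. The classifier, rates, population size, and simulated peak table below are illustrative choices, not the settings used in the study.

      # A minimal genetic-algorithm feature-selection sketch (illustrative settings).
      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 200))               # 60 samples x 200 aligned peaks (toy data)
      y = np.repeat(np.arange(6), 10)              # six genotype classes
      X[:, 5] += y; X[:, 17] -= y                  # two genuinely informative peaks

      def fitness(mask):
          if mask.sum() == 0:
              return 0.0
          acc = cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, mask.astype(bool)], y, cv=3).mean()
          return acc - 0.01 * mask.sum()           # reward accuracy, penalize large subsets

      pop = (rng.random((30, X.shape[1])) < 0.1).astype(int)        # start with sparse chromosomes
      for _ in range(20):
          scores = np.array([fitness(ind) for ind in pop])
          parents = pop[np.argsort(scores)[-10:]]                   # keep the fittest ten
          children = parents[rng.integers(0, 10, 30)].copy()
          mates = parents[rng.integers(0, 10, 30)]
          for i, cut in enumerate(rng.integers(1, X.shape[1], 30)):
              children[i, cut:] = mates[i, cut:]                    # one-point crossover
          children ^= (rng.random(children.shape) < 0.01).astype(children.dtype)   # bit-flip mutation
          pop = children
      best = pop[np.argmax([fitness(ind) for ind in pop])]
      print(np.flatnonzero(best))                  # indices of the selected peaks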


    An integrated approach to optimization of Escherichia coli fermentations using historical data

    BIOTECHNOLOGY & BIOENGINEERING, Issue 3 2003
    Matthew C. Coleman
    Abstract Using a fermentation database for Escherichia coli producing green fluorescent protein (GFP), we have implemented a novel three-step optimization method to identify the process input variables most important in modeling the fermentation, as well as the values of those critical input variables that result in an increase in the desired output. In the first step of this algorithm, we use either decision-tree analysis (DTA) or information theoretic subset selection (ITSS) as a database mining technique to identify which process input variables best classify each of the process outputs (maximum cell concentration, maximum product concentration, and productivity) monitored in the experimental fermentations. The second step of the optimization method is to train an artificial neural network (ANN) model of the process input,output data, using the critical inputs identified in the first step. Finally, a hybrid genetic algorithm (hybrid GA), which includes both gradient and stochastic search methods, is used to identify the maximum output modeled by the ANN and the values of the input conditions that result in that maximum. The results of the database mining techniques are compared, both in terms of the inputs selected and the subsequent ANN performance. For the E. coli process used in this study, we identified 6 inputs from the original 13 that resulted in an ANN that best modeled the GFP fluorescence outputs of an independent test set. Values of the six inputs that resulted in a modeled maximum fluorescence were identified by applying a hybrid GA to the ANN model developed. When these conditions were tested in laboratory fermentors, an actual maximum fluorescence of 2.16E6 AU was obtained. The previous high value of fluorescence that was observed was 1.51E6 AU. Thus, this input condition set that was suggested by implementing the proposed optimization scheme on the available historical database increased the maximum fluorescence by 55%. © 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 84: 274,285, 2003. [source]