Vector Machines (vector + machines)

Kinds of Vector Machines

  • support vector machines


  • Selected Abstracts


    FLOOD STAGE FORECASTING WITH SUPPORT VECTOR MACHINES

    JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION, Issue 1 2002
    Shie-Yui Liong
    ABSTRACT: Machine learning techniques are finding more and more applications in the field of forecasting. A novel regression technique called the Support Vector Machine (SVM), based on statistical learning theory, is explored in this study. SVM is based on the principle of Structural Risk Minimization, as opposed to the principle of Empirical Risk Minimization espoused by conventional regression techniques. Flood data at Dhaka, Bangladesh, are used to demonstrate the forecasting capabilities of SVM. The results are compared with those of an Artificial Neural Network (ANN) based model for lead times of one to seven days. For lead times of four to seven days, SVM reduces the maximum error in predicted water level relative to ANN by 9.6 cm, 22.6 cm, 4.9 cm and 15.7 cm, respectively. The prediction accuracy of SVM is at least as good as that of ANN, and in some cases (particularly at longer lead times) better, while SVM avoids several limitations of ANN, for example arriving at the optimal network architecture and choosing a useful training set. Thus, SVM appears to be a very promising prediction tool.
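    The contrast between structural and empirical risk minimization can be illustrated with the objective that a linear support vector regression typically minimizes. The sketch below is not the study's flood model: it assumes the common epsilon-insensitive loss, and the values of C, epsilon and the toy data are hypothetical. The 0.5*||w||^2 term limits model capacity, which is how SVMs are usually said to implement structural risk minimization, while the summed loss is the empirical error that conventional regression minimizes on its own.

```python
def eps_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Regularized risk of a linear SVR: 0.5*||w||^2 + C * sum of eps-insensitive losses.

    The 0.5*||w||^2 term penalizes model capacity (the structural part);
    the loss sum is the empirical error on the training data.
    """
    capacity = 0.5 * sum(wi * wi for wi in w)
    empirical = 0.0
    for xi, yi in zip(X, y):
        pred = sum(wj * xj for wj, xj in zip(w, xi)) + b
        empirical += eps_insensitive_loss(yi, pred, epsilon)
    return capacity + C * empirical


# Hypothetical toy data: one input feature, three training points.
X = [[0.0], [1.0], [2.0]]
y = [0.05, 1.0, 2.5]
print(svr_objective(w=[1.0], b=0.0, X=X, y=y, C=1.0, epsilon=0.1))
```

    Raising C weights the training error more heavily relative to the capacity term, while epsilon sets the width of the error band that is ignored.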


    SENTIMENT CLASSIFICATION OF MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS

    COMPUTATIONAL INTELLIGENCE, Issue 2 2006
    Alistair Kennedy
    We present two methods for determining the sentiment expressed by a movie review. The semantic orientation of a review can be positive, negative, or neutral. We examine the effect of valence shifters on classifying the reviews. We examine three types of valence shifters: negations, intensifiers, and diminishers. Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative. The first method classifies reviews based on the number of positive and negative terms they contain. We use the General Inquirer to identify positive and negative terms, as well as negation terms, intensifiers, and diminishers. We also use positive and negative terms from other sources, including a dictionary of synonym differences and a very large Web corpus. To compute corpus-based semantic orientation values of terms, we use their association scores with a small group of positive and negative terms. We show that extending the term-counting method with contextual valence shifters improves the accuracy of the classification. The second method uses a Machine Learning algorithm, Support Vector Machines. We start with unigram features and then add bigrams that consist of a valence shifter and another word. The accuracy of classification is very high, and the valence shifter bigrams slightly improve it. The features that contribute to the high accuracy are the words in the lists of positive and negative terms. Previous work focused on either the term-counting method or the Machine Learning method. We show that combining the two methods achieves better results than either method alone.
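    The first method, term counting extended with contextual valence shifters, can be sketched as follows. The word lists, the weights of 2.0 and 0.5, and the rule that a shifter affects only the sentiment term immediately after it are all illustrative assumptions; the study's lexicons came from the General Inquirer and other sources, and its exact weighting is not given in the abstract.

```python
# Hypothetical mini-lexicons; the study used the General Inquirer and other sources.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "extremely"}
DIMINISHERS = {"slightly", "somewhat"}

# Assumed weights, for illustration only.
INTENSIFY = 2.0
DIMINISH = 0.5


def review_score(text: str) -> float:
    """Sum term polarities, letting the preceding word shift each sentiment term."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            value = 1.0
        elif tok in NEGATIVE:
            value = -1.0
        else:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        if prev in NEGATIONS:
            value = -value          # negation reverses polarity
        elif prev in INTENSIFIERS:
            value *= INTENSIFY      # intensifier increases the degree
        elif prev in DIMINISHERS:
            value *= DIMINISH       # diminisher decreases the degree
        score += value
    return score


def classify(text: str) -> str:
    """Map the summed score to a positive / negative / neutral label."""
    s = review_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(classify("the plot was not good"))
print(review_score("very good"), review_score("slightly good"))
```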


    Glass analysis for forensic purposes: a comparison of classification methods

    JOURNAL OF CHEMOMETRICS, Issue 5-6 2007
    Grzegorz Zadora
    Abstract One of the purposes of the chemical analysis of glass fragments (pieces of glass of linear dimension ca. 0.5 mm) for forensic purposes is to classify those fragments into use categories, for example windows, car headlights and containers. The object of this research was to check the efficiency of Naïve Bayes Classifiers (NBCs) and Support Vector Machines (SVMs) in classifying glass objects described by their major and minor elemental concentrations, obtained by Scanning Electron Microscopy coupled with an Energy Dispersive X-ray spectrometer, a technique routinely used in many forensic institutes. Copyright © 2007 John Wiley & Sons, Ltd.


    Insight into the Bioactivity and Metabolism of Human Glucagon Receptor Antagonists from 3D-QSAR Analyses

    MOLECULAR INFORMATICS, Issue 8 2004
    HaiFeng Chen
    Abstract Descriptors such as logP, the number of hydrogen bond donors, the number of hydrogen bond acceptors, and the energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) were combined with CoMFA and CoMSIA fields to construct models of the hyperglycemia-lowering activity and the metabolism of human glucagon receptor antagonists. The results reveal that including logP and the HOMO and LUMO energies is meaningful for the QSAR/QSMR models. The models were validated using a test set of structurally diverse compounds that had not been included in the CoMFA and CoMSIA models. Support Vector Machines (SVM) were used to select suitable additional descriptors for constructing the 3D-QSAR/QSMR models. A key point is that activity and metabolism were modeled simultaneously. These in silico ADME models are helpful for making quantitative predictions of inhibitory activities and rates of metabolism before resorting to in vitro and in vivo experimentation.


    Epothilones: Quantitative Structure Activity Relations Studied by Support Vector Machines and Artificial Neural Networks

    MOLECULAR INFORMATICS, Issue 7 2003
    Annalen Bleckmann
    Abstract In this paper the relation between the structure of epothilones (a new class of anti-tumour agents) and their potential to influence the tubulin-microtubule equilibrium is investigated. Insights into the character of the tubulin-epothilone interactions are derived by comparing the accuracy and reliability with which support vector machines and artificial neural networks model such relations quantitatively. Both methods are well qualified to model relationships between the structure of epothilone derivatives and their anti-tumour activities. Artificial neural networks achieve lower residual standard deviations (22%) than support vector machines (25%) and better classification results (89% compared to 75%). However, the results of support vector machines are more reproducible, which suggests stronger convergence. Mapping the influence of individual structural descriptors onto the three-dimensional epothilone structure suggests that one side of the rather flat molecule is more important for its activity. The "LIBSVM" software used for simulating the support vector machines is freely available from www.csie.ntu.edu.tw/~cjlin/libsvm. The program "Smart", used for simulating artificial neural networks, is free for academic use and can be obtained together with the database of epothilones and their activities from www.jens-meiler.de.


    Using support vector machines for automatic new topic identification

    PROCEEDINGS OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2007
    Seda Ozmutlu
    Recent studies on automatic new topic identification in Web search engine user sessions demonstrated that learning algorithms such as neural networks and regression have been fairly successful in this task. In this study, we investigate whether another learning algorithm, Support Vector Machines (SVM), is successful at identifying topic shifts and continuations. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings suggest that the performance of support vector machines depends on the characteristics of the dataset to which they are applied.


    Insight into the Metabolism Rate of Quinone Analogues from Molecular Dynamics Simulation and 3D-QSMR Methods

    CHEMICAL BIOLOGY & DRUG DESIGN, Issue 4 2007
    Hai-Feng Chen
    Molecular dynamics simulation was applied to investigate the metabolism mechanism of quinone analogues. Favourable hydrogen bonds between the ligand and NQO1, and a parallel orientation between the ligand and flavin adenine dinucleotide, could explain the differences in metabolism rate (in µmol/min/mg) among quinone analogues. This is consistent with the experimental observation (Structure 2001;9:659-667). Support Vector Machines were then used to construct a quantitative structure-metabolism rate model, which was evaluated on 14 test set compounds. Some descriptors selected by the Support Vector Machine were introduced into the standard fields of three-dimensional quantitative structure-metabolism relationship models to improve their statistical parameters. The results show that the inclusion of the highest occupied molecular orbital and lowest unoccupied molecular orbital is meaningful for three-dimensional quantitative structure-metabolism relationship models. These in silico absorption, distribution, metabolism and excretion models are helpful for making quantitative predictions of the metabolic rates of new lead compounds before resorting to in vitro and in vivo experimentation.


    An automated, sheathless capillary electrophoresis-mass spectrometry platform for discovery of biomarkers in human serum

    ELECTROPHORESIS, Issue 7-8 2005
    Alexander P. Sassi
    Abstract A capillary electrophoresis-mass spectrometry (CE-MS) method has been developed to perform routine, automated analysis of low-molecular-weight peptides in human serum. The method incorporates transient isotachophoresis for in-line preconcentration and a sheathless electrospray interface. To evaluate the performance of the method and demonstrate the utility of the approach, an experiment was designed in which peptides were added to sera from individuals at each of two different concentrations, artificially creating two groups of samples. The CE-MS data from the serum samples were divided into separate training and test sets. A pattern-recognition/feature-selection algorithm based on support vector machines was used to select the mass-to-charge (m/z) values from the training set data that distinguished the two groups of samples from each other. The added peptides were identified correctly as the distinguishing features, and pattern recognition based on these peptides was used to assign each sample in the independent test set to its respective group. A twofold difference in peptide concentration could be detected with statistical significance (p-value < 0.0001). The accuracy of the assignment was 95%, demonstrating the utility of this technique for the discovery of patterns of biomarkers in serum.


    Expansion of cumulant-based classifier to frequency shift keying modulations and to the use of support vector machines

    EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 1 2008
    H. Mustafa
    This paper proposes an expansion of the cumulant-based classifier of digital modulations to frequency shift keying (FSK) modulations. Cumulant estimates are calculated when an FSK modulation is present. The features obtained from the cumulant estimators are used in a support vector machine (SVM) classifier, whose performance is compared with that of other classifiers, among them the cumulant-based tree classifier, which uses thresholds defined by the asymptotic values of the cumulant estimators. The simulation results show that using the SVM classifier improves performance. Copyright © 2007 John Wiley & Sons, Ltd.
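    Cumulant features of this kind are usually estimated from sample moments of the received baseband signal. The sketch below computes one widely used feature, the normalized fourth-order cumulant C42/C21^2, using the standard moment-to-cumulant relation for zero-mean complex signals. Which cumulants the paper actually uses, and how it treats FSK signals, is not stated in the abstract; the function names and the noise-free test symbols are hypothetical.

```python
import cmath


def moment(x, p, q):
    """Sample estimate of the mixed moment M_pq = E[x^(p-q) * conj(x)^q]."""
    return sum(s ** (p - q) * s.conjugate() ** q for s in x) / len(x)


def normalized_c42(x):
    """Fourth-order cumulant C42 divided by C21^2, for a zero-mean complex sequence.

    Uses the moment-to-cumulant relation C42 = M42 - |M20|^2 - 2*M21^2,
    where M21 = E[|x|^2] equals the signal power C21.
    """
    m20 = moment(x, 2, 0)
    m21 = moment(x, 2, 1)
    m42 = moment(x, 4, 2)
    c21 = m21.real  # E[|x|^2] is real
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    return (c42 / c21 ** 2).real


# Noise-free symbol streams for two constellations (illustrative only).
bpsk = [complex(1, 0), complex(-1, 0)] * 50
qpsk = [cmath.exp(1j * (cmath.pi / 4 + k * cmath.pi / 2)) for k in range(4)] * 25

print(round(normalized_c42(bpsk), 3), round(normalized_c42(qpsk), 3))
```

    For these ideal streams the feature is about -2 for BPSK and about -1 for QPSK, so it separates the two; in a classifier of the kind described, a vector of such estimates would be passed to the SVM instead of being compared against fixed thresholds.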


    Analysis of electrocardiographic changes in partial epileptic patients by combining eigenvector methods and support vector machines

    EXPERT SYSTEMS, Issue 3 2009
    Elif Derya Übeyli
    Abstract: In the present study, the diagnostic accuracy of support vector machines (SVMs) on electrocardiogram (ECG) signals is evaluated. Two types of ECG beats (normal and partial epilepsy) were obtained from the Physiobank database. Decision making was performed in two stages: feature extraction by eigenvector methods and classification using the SVM trained on the extracted features. The present research demonstrates that the power levels of the power spectral densities obtained by eigenvector methods are features which represent the ECG signals well and SVMs trained on these features achieve high classification accuracies.


    Financial decision support using neural networks and support vector machines

    EXPERT SYSTEMS, Issue 4 2008
    Chih-Fong Tsai
    Abstract: Bankruptcy prediction and credit scoring are two important problems in financial decision support. The multilayer perceptron (MLP) network has shown its applicability to these problems, and its performance is usually superior to that of other traditional statistical models. Support vector machines (SVMs) are a core machine learning technique and have been compared with MLP as the benchmark. However, the performance of SVMs is not fully understood in the literature, because too few data sets have been considered and different kernel functions have been used to train the SVMs. In this paper, four public data sets are used. In particular, three different ratios of training to testing data (3:7, 1:1 and 7:3) are considered in each of the four data sets in order to examine and fully understand the performance of SVMs. The linear, radial basis function and polynomial kernel functions are used to construct the SVMs. Using MLP as the benchmark, the SVM classifier performs better in only one of the four data sets. On the other hand, the prediction results of the MLP and SVM classifiers are not significantly different for the three different training/testing splits.
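    The three training-to-testing ratios can be produced with a simple shuffled split such as the one below. Whether the study shuffled randomly, stratified by class or repeated the split is not stated in the abstract; the function name, the stand-in data and the fixed seed are hypothetical.

```python
import random


def split_by_ratio(records, train_parts, test_parts, seed=0):
    """Shuffle a copy of records and split it in the ratio train_parts:test_parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]


data = list(range(100))  # stand-in for 100 labelled cases
for ratio in [(3, 7), (1, 1), (7, 3)]:
    train, test = split_by_ratio(data, *ratio)
    print(ratio, len(train), len(test))
```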


    Combining wavelet-based feature extractions with relevance vector machines for stock index forecasting

    EXPERT SYSTEMS, Issue 2 2008
    Shian-Chang Huang
    Abstract: The relevance vector machine (RVM) is a Bayesian version of the support vector machine which, with its sparse model representation, has emerged as a powerful tool for time-series forecasting. The RVM has demonstrated better performance than other methods such as neural networks or autoregressive integrated moving average based models. This study proposes a hybrid model that combines wavelet-based feature extraction with RVM models to forecast stock indices. The time series of explanatory variables are decomposed using wavelet bases, and the extracted time-scale features serve as inputs to an RVM that performs the non-parametric regression and forecasting. Compared with traditional forecasting models, the proposed method performs best, and the root-mean-squared forecasting errors are significantly reduced.
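    The study decomposes the explanatory series with wavelet bases before feeding the features to an RVM. The wavelet family and the number of decomposition levels are not given in the abstract, so the sketch below assumes the simplest choice, the orthonormal Haar wavelet, applied to a series whose length is divisible by 2^levels; the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail): pairwise scaled sums capture the
    slow trend, pairwise scaled differences capture fast fluctuations.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Collect the detail coefficients of each level plus the final approximation."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(detail)
    features.append(current)
    return features


print(haar_features([1.0, 1.0, 2.0, 2.0], levels=2))
```

    The detail lists hold progressively coarser fluctuations and the last entry is the remaining trend; flattening them (or summary statistics of them) gives a feature vector that a regression model such as an RVM could take as input.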


    Understanding intention of movement from electroencephalograms

    EXPERT SYSTEMS, Issue 5 2007
    Heba Lakany
    Abstract: In this paper, we propose a new framework for understanding intention of movement that can be used in developing non-invasive brain-computer interfaces. The proposed method is based on extracting salient features from brain signals recorded whilst the subject is performing (or imagining) a wrist movement in different directions. Our method focuses on analysing the brain signals in the period preceding the wrist movement, i.e. while the subject is preparing (or intending) to perform the movement. Feature selection and classification of the direction are done using a wrapper method based on support vector machines (SVMs). The classification results show that we are able to discriminate the directions using features extracted from brain signals prior to movement. We then extract rules from the SVM classifiers to compare the features extracted for real and imaginary movements, in an attempt to understand the mechanisms of intention of movement. Our new approach could potentially be useful in building brain-computer interfaces in which a paralysed person could communicate with a wheelchair and steer it in the desired direction, using a rule-based knowledge system built on understanding the subject's intention to move from his/her brain signals.


    Using species distribution models to identify suitable areas for biofuel feedstock production

    GCB BIOENERGY, Issue 2 2010
    JASON M. EVANS
    Abstract The 2007 Energy Independence and Security Act mandates a five-fold increase in US biofuel production by 2022. Given this ambitious policy target, there is a need for spatially explicit estimates of landscape suitability for growing biofuel feedstocks. We developed a suitability modeling approach for two major US biofuel crops, corn (Zea mays) and switchgrass (Panicum virgatum), based upon the use of two presence-only species distribution models (SDMs): maximum entropy (Maxent) and support vector machines (SVM). SDMs are commonly used for modeling animal and plant distributions in natural environments, but have rarely been used to develop landscape models for cultivated crops. AUC, Kappa, and correlation measures derived from test data indicate that SVM slightly outperformed Maxent in modeling US corn production, although both models produced significantly accurate results. When compared with results from a mechanistic switchgrass model recently developed by Oak Ridge National Laboratory (ORNL), SVM results showed higher correlation than Maxent results with models fit using county-scale point inputs of switchgrass production derived from expert opinion estimates. However, Maxent results for an alternative switchgrass model developed with point inputs from research trial sites showed higher correlation to the ORNL model than the corresponding results obtained from SVM. Further analysis indicates that both modeling approaches were effective in predicting county-scale increases in corn production from 2006 to 2007, a time period in which US corn production increased by 24%. We conclude that presence-only methods are a powerful first-cut tool for estimating relative land suitability across geographic regions in which candidate biofuel feedstocks can be grown, and may also provide important insight into potential land-use change patterns likely to be associated with increased biofuel demand.
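    Kappa, one of the test-data measures named above, corrects the observed agreement between predicted and observed classes for the agreement expected by chance. A minimal sketch computing Cohen's kappa from a confusion matrix follows; the function name is hypothetical, and the counts in the example are invented, not results from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts test cases whose observed class is i and whose
    predicted class is j (e.g. class 0 = absent, class 1 = present).
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of observations.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented counts for illustration: 100 test locations.
print(cohens_kappa([[40, 10], [5, 45]]))
```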


    Explaining qualifications in audit reports using a support vector machine methodology

    INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 4 2005
    Michael Doumpos
    The verification of whether the financial statements of a firm represent its actual position is of major importance for auditors, who should provide a qualified report if they conclude that the financial statements fail to meet this requirement. This paper implements support vector machines (SVMs) to develop models that may support auditors in this task. Linear and non-linear models are developed and their performance is analysed using training samples of different size and out-of-sample/out-of-time data. The results show that all SVM models are capable of distinguishing between qualified and unqualified financial statements with satisfactory accuracy. The performance of the models over time is also explored. Copyright © 2005 John Wiley & Sons, Ltd.


    Support vector machines-based modelling of seismic liquefaction potential

    INTERNATIONAL JOURNAL FOR NUMERICAL AND ANALYTICAL METHODS IN GEOMECHANICS, Issue 10 2006
    Mahesh Pal
    Abstract This paper investigates the potential of a support vector machines (SVM)-based classification approach for assessing liquefaction potential from actual standard penetration test (SPT) and cone penetration test (CPT) field data. SVMs are based on statistical learning theory and have been found to work well in comparison to neural networks in several other applications. Both CPT and SPT field data sets are used with SVMs to predict the occurrence and non-occurrence of liquefaction based on different combinations of input parameters. With the SPT and CPT data sets, the highest accuracies achieved with SVMs were 96% and 97%, respectively. This suggests that SVMs can effectively model the complex relationship between soil parameters and liquefaction potential. Several other combinations of input variables were used to assess the influence of different input parameters on liquefaction potential. The results of the proposed approach suggest that the normalized cone resistance value is not required with CPT data, nor is the calculation of the standardized SPT value required with SPT data. Further, SVMs require few user-defined parameters and provide better performance than the neural network approach. Copyright © 2006 John Wiley & Sons, Ltd.


    Online trained support vector machines-based generalized predictive control of non-linear systems

    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 10 2006
    S. Iplikci
    Abstract In this work, an online support vector machines (SVM) training method (Neural Comput. 2003; 15: 2683-2703), referred to as the accurate online support vector regression (AOSVR) algorithm, is embedded in the previously proposed support vector machines-based generalized predictive control (SVM-Based GPC) architecture (Support vector machines based generalized predictive control, under review), thereby obtaining a powerful scheme for controlling non-linear systems adaptively. Starting with an initially empty SVM model of the unknown plant, the proposed online SVM-based GPC method performs the modelling and control tasks simultaneously. At each iteration, if the SVM model is not accurate enough to represent the plant dynamics at the current operating point, it is updated with training data formed by applying a persistently exciting random input signal to the plant; otherwise, if the model is accepted as accurate, a generalized predictive control signal based on the obtained SVM model is applied to the plant. After a short transient, the model can satisfactorily reflect the behaviour of the plant over the whole phase space or operating region. The incremental algorithm of AOSVR enables the SVM model to learn each new training data pair, while the decremental algorithm allows it to forget the oldest training point. Thus, the SVM model can adapt to changes in the plant and in the operating conditions. Simulation results on non-linear systems have revealed that the proposed method provides excellent control quality. Furthermore, it maintains its performance when measurement noise is added to the output of the underlying system. Copyright © 2006 John Wiley & Sons, Ltd.


    A kernel-based core growing clustering method

    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 4 2009
    T. W. Hsieh
    In this paper, a novel clustering method in the kernel space is proposed. It effectively integrates several existing algorithms into an iterative clustering scheme that can handle clusters with arbitrary shapes. In the proposed approach, a reasonable initial core is estimated for each cluster. This allows us to adopt a cluster-growing technique, and the growing cores offer partial hints on cluster association. Consequently, methods used for classification, such as support vector machines (SVMs), can be useful in our approach. To obtain initial clusters effectively, the incomplete Cholesky decomposition is adopted so that fuzzy c-means (FCM) can be used to partition the data in a kernel-defined-like space. Then one-class and multiclass soft-margin SVMs are adopted to detect the data within the main distributions (the cores) of the clusters and to repartition the data into new clusters iteratively. The structure of the data set is explored by pruning the data in the low-density regions of the clusters. The data are then gradually added back to the main distributions to ensure exact cluster boundaries. Unlike the ordinary SVM algorithm, whose performance relies heavily on the kernel parameters given by the user, our approach estimates the parameters naturally from the data set. Experimental evaluations on two synthetic data sets and four University of California Irvine real-data benchmarks indicate that the proposed algorithms outperform several popular clustering algorithms, such as FCM, support vector clustering (SVC), hierarchical clustering (HC), self-organizing maps (SOM), and non-Euclidean norm fuzzy c-means (NEFCM). © 2009 Wiley Periodicals, Inc.
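    Incomplete Cholesky decomposition gives each point an explicit low-dimensional feature vector whose inner products approximate the kernel matrix, which is what lets a vector-space method such as fuzzy c-means operate in (an approximation of) the kernel space. The pure-Python sketch below shows the common pivoted variant with a Gaussian kernel; the kernel choice, its width, the rank limit and the function names are hypothetical, and this is not the authors' implementation.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as coordinate lists."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def incomplete_cholesky(points, kernel, max_rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns rows G (one per point) with K ~= G G^T."""
    n = len(points)
    diag = [kernel(p, p) for p in points]  # residual diagonal of K - G G^T
    cols = []  # each entry is one column of G (length n)
    for _ in range(min(max_rank, n)):
        i = max(range(n), key=lambda r: diag[r])  # pivot on the largest residual
        if diag[i] <= tol:
            break  # remaining residual is negligible
        root = math.sqrt(diag[i])
        col = []
        for r in range(n):
            v = kernel(points[r], points[i]) - sum(c[r] * c[i] for c in cols)
            col.append(v / root)
        cols.append(col)
        diag = [diag[r] - col[r] ** 2 for r in range(n)]
    # Transpose: one low-dimensional feature row per point.
    return [[c[r] for c in cols] for r in range(n)]


demo_points = [[0.0], [1.0], [2.5], [4.0]]
print(incomplete_cholesky(demo_points, rbf, max_rank=2))
```

    The rows returned for a small max_rank could then be clustered with an ordinary vector-space algorithm, trading some kernel accuracy for a much smaller representation.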


    Support vector design of the microstrip lines

    INTERNATIONAL JOURNAL OF RF AND MICROWAVE COMPUTER-AIDED ENGINEERING, Issue 4 2008
    Filiz Güneş
    Abstract In this article, support vector regression, a novel technique with rigorous mathematical foundations and the strongest competitor to the popular artificial neural networks (ANN), is adapted to the analysis and synthesis of microstrip lines on all isotropic/anisotropic dielectric materials. In this design process, accuracy, computational efficiency and the number of support vectors are investigated in detail, and the performance of support vector regression is compared with that of an ANN. It can be concluded that ANNs may be replaced by support vector machines in regression applications because of the latter's higher approximation capability and much faster convergence rate with the sparse solution technique. Synthesis is achieved by using the analysis black box bidirectionally through reverse training. Furthermore, by using an adaptive step size, a much faster convergence rate is obtained in the reverse training. Designs of microstrip lines on the most commonly used isotropic/anisotropic dielectric materials are given as worked examples. © 2008 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2008.


    Signal-noise support vector model of a microwave transistor

    INTERNATIONAL JOURNAL OF RF AND MICROWAVE COMPUTER-AIDED ENGINEERING, Issue 4 2007
    Filiz Güneş
    Abstract In this work, a support vector machines (SVM) model of the small-signal and noise behaviors of a microwave transistor is presented and compared with its artificial neural network (ANN) model. The convex optimization and generalization properties of SVM are applied to the black-box modeling of a microwave transistor. It is shown that SVM has high potential for accurate and efficient device modeling. This is verified with a worked example compared against ANN, another commonly used modeling technique. It can be concluded that SVM modeling is a strongly competitive alternative to ANN modeling. © 2007 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2007.


    A comparative study on a novel model-based PID tuning and control mechanism for nonlinear systems

    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, Issue 13 2010
    S. Iplikci
    Abstract This work presents a novel predictive model-based proportional integral derivative (PID) tuning and control approach for unknown nonlinear systems. For this purpose, an NARX model of the plant to be controlled is obtained and then used both for PID tuning and for correction of the control action. For comparison, neural networks (NNs) and support vector machines (SVMs) have been used for modeling in this study. The proposed structure has been tested on two highly nonlinear systems via simulations, comparing the control and convergence performance of SVM- and NN-based PID controllers. The simulation results show that, when used in the proposed scheme, both NN and SVM approaches provide rapid parameter convergence and considerably high control performance, yielding very small transient and steady-state tracking errors. Moreover, they maintain their control performance under noisy conditions, although their convergence properties deteriorate to some extent due to measurement noise. Copyright © 2009 John Wiley & Sons, Ltd.
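    For reference, the controller being tuned computes its action from the current error, the accumulated error and the error's rate of change. The sketch below is the textbook discrete positional PID law with a fixed sample time; in the scheme the abstract describes, a NARX model (SVM or NN) would be used to adjust kp, ki and kd online, which is not shown here. The class name and the gain values in the example are hypothetical.

```python
class PID:
    """Discrete PID controller (positional form) with fixed sample time dt.

    The gains kp, ki, kd are the parameters a tuning scheme would adjust.
    """

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, reference, measurement):
        """Return the control action for one sample."""
        error = reference - measurement
        self.integral += error * self.dt  # rectangular integration
        # On the first sample there is no previous error, so the derivative is taken as 0.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
print(controller.step(reference=1.0, measurement=0.0))
print(controller.step(reference=1.0, measurement=0.5))
```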


    Support vector machines-based generalized predictive control

    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, Issue 17 2006
    S. Iplikci
    Abstract In this study, we propose a novel control methodology that introduces the use of support vector machines (SVMs) in the generalized predictive control (GPC) scheme. SVM regression algorithms have been used extensively for modelling nonlinear systems because they guarantee a global solution, which is achieved by transforming the regression problem into a convex optimization problem in dual space, and because of their higher generalization potential. These key features of SVM structures led us to the idea of employing an SVM model of an unknown plant within the GPC context. In particular, the SVM model can be employed to obtain gradient information, and it can also predict the future trajectory of the plant output; both are needed in the cost function minimization block. Simulations have confirmed that the proposed SVM-based GPC scheme can provide noticeably high control performance; in other words, an unknown nonlinear plant controlled by SVM-based GPC can accurately track reference inputs of different shapes. Moreover, the proposed SVM-based GPC scheme maintains its control performance under noisy conditions. Copyright © 2006 John Wiley & Sons, Ltd.
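    The cost function minimized at each sample in GPC is commonly a sum of squared tracking errors over a prediction horizon plus a weighted sum of squared control increments over a control horizon. A minimal sketch of that criterion follows; the horizons, the weight lam and the function name are hypothetical, and the paper's exact cost function may differ. An optimizer would evaluate it (and, using the SVM model's gradient information, its derivatives) for candidate control sequences.

```python
def gpc_cost(reference, predicted, controls, previous_control, lam):
    """Common GPC criterion over finite horizons.

    reference, predicted: future set-points and model-predicted outputs over
        the prediction horizon (an SVM plant model would supply the
        predictions for a candidate control sequence).
    controls: candidate future control inputs over the control horizon.
    previous_control: the control input applied at the last sample.
    lam: weight penalizing control increments.
    """
    tracking = sum((r - y) ** 2 for r, y in zip(reference, predicted))
    effort = 0.0
    last = previous_control
    for u in controls:
        effort += (u - last) ** 2  # squared control increment
        last = u
    return tracking + lam * effort


print(gpc_cost([1.0, 1.0, 1.0], [0.5, 0.8, 1.0], [1.0, 1.2], 0.8, lam=0.5))
```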


    Prediction model for increasing propylene from FCC gasoline secondary reactions based on Levenberg-Marquardt algorithm coupled with support vector machines

    JOURNAL OF CHEMOMETRICS, Issue 9 2010
    Xiaowei Zhou
    Abstract The Levenberg-Marquardt (LM) algorithm was adopted to optimize the multiple parameters of the support vector machines (SVM) model, to overcome the difficulty of selecting SVM parameters and to fit highly nonlinear relational expressions. To avoid over-fitting, the training data were divided into working data, used to train the SVM, and testing data. The proposed LM/SVM method was also compared with three reported hybridized SVM approaches (GA/SVM, SM/SVM and SQP/SVM). The new method was applied to modelling the prediction of propylene from secondary reactions of FCC gasoline. LM/SVM with a polynomial kernel gave the best performance. Good agreement between predicted results and experimental data suggests that the LM/SVM method was successfully developed and that the SVM model for increasing propylene is well established. Finally, the sequential quadratic programming (SQP) algorithm was employed to optimize the operating conditions of the FCC gasoline secondary reaction to maximize propylene yield. The optimized conditions are consistent with experimental data and reported results, indicating that the optimization results are reliable. Copyright © 2010 John Wiley & Sons, Ltd.


    Modeling and predicting binding affinity of phencyclidine-like compounds using machine learning methods

    JOURNAL OF CHEMOMETRICS, Issue 1 2010
    Ozlem Erdas
    Abstract Machine learning methods have always been promising in the science and engineering fields, and the use of these methods in chemistry and drug design has advanced especially since the 1990s. In this study, molecular electrostatic potential (MEP) surfaces of phencyclidine-like (PCP-like) compounds are modeled and visualized in order to extract features that are useful in predicting binding affinities. In modeling, the Cartesian coordinates of MEP surface points are mapped onto a spherical self-organizing map (SSOM). The resulting maps are visualized using electrostatic potential (ESP) values. These values also provide features for a prediction system. Support vector machines and the partial least-squares method are used to predict the binding affinities of the compounds. Copyright © 2009 John Wiley & Sons, Ltd.


    Optimizing the tuning parameters of least squares support vector machines regression for NIR spectra

    JOURNAL OF CHEMOMETRICS, Issue 5 2006
    T. Coen
    Abstract Partial least squares (PLS) is one of the most widely used tools in chemometrics. Other data analysis techniques, such as artificial neural networks and least squares support vector machines (LS-SVMs), have, however, entered the field of chemometrics. These techniques can also model nonlinear relations, but their tuning parameters are a serious drawback. These parameters balance the risk of overfitting against the ability to model the underlying nonlinear relation. In this work, a methodology based on a statistical interpretation is proposed to initialize and optimize the tuning parameters of LS-SVMs with a radial basis function (RBF) kernel. This makes these methods much more appealing to new users. The presented methods are applied to manure spectra. Although this dataset is only slightly nonlinear, good results were obtained. Copyright © 2007 John Wiley & Sons, Ltd. [source]
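
    An RBF-kernel LS-SVM has two tuning parameters: the regularization constant gamma and the kernel width sigma. Training reduces to one linear system, [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]. The sketch below solves that system for fixed gamma and sigma, assuming the kernel parametrization exp(-||x - z||^2 / sigma^2); it does not reproduce the paper's statistical initialization of the parameters, and the helper names are hypothetical.

```python
import math

def rbf(x, z, sigma):
    # RBF kernel, assuming the parametrization exp(-||x - z||^2 / sigma^2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma ** 2)

def solve(A, rhs):
    # Gaussian elimination with partial pivoting on a dense square system.
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]; return (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error loss
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X, b, alpha, sigma, x):
    # f(x) = sum_i alpha_i K(x_i, x) + b
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))
```

    A large gamma lets the model follow the training data closely (risking overfitting), while sigma controls how local each kernel is; choosing both is exactly the trade-off the abstract describes.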


    Combination of support vector machines (SVM) and near-infrared (NIR) imaging spectroscopy for the detection of meat and bone meal (MBM) in compound feeds

    JOURNAL OF CHEMOMETRICS, Issue 7-8 2004
    J. A. Fernández Pierna
    Abstract This study concerns the development of a new system to detect meat and bone meal (MBM) in compound feeds, which will be used to enforce legislation concerning feedstuffs enacted after the European mad cow crisis. Focal plane array near-infrared (NIR) imaging spectroscopy, which collects thousands of spatially resolved spectra in a massively parallel fashion, has been suggested as a more efficient alternative to the current methods, which are tedious and require significant expert human analysis. Chemometric classification strategies have been applied to automate the method and reduce the need for constant expert analysis of the data. In this work the performance of a new method for multivariate classification, support vector machines (SVM), was compared with that of two classical chemometric methods, partial least squares (PLS) and artificial neural networks (ANN), in classifying feed particles as either MBM or vegetal using the spectra from NIR images. While all three methods were able to effectively model the data, SVM was found to perform substantially better than PLS and ANN, exhibiting a much lower rate of false positive detection. Copyright © 2004 John Wiley & Sons, Ltd. [source]
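
    The comparison above turns on the false positive rate, FP / (FP + TN): the share of truly vegetal particles that a classifier flags as MBM. The sketch below pairs that metric with a linear SVM trained by the Pegasos stochastic subgradient method. Both the solver and the linear kernel are swapped-in assumptions, since the abstract states neither; the +1/-1 label coding (MBM/vegetal) is also assumed, and real inputs would be NIR spectra rather than the toy vectors used for testing.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM (hinge loss).
    Labels are +1 / -1; a constant 1 is appended to each sample so the
    last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            step += 1
            xi, ti = rng.choice(data)
            eta = 1.0 / (lam * step)
            margin = ti * sum(a * b for a, b in zip(w, xi))
            # Shrink toward zero (regularization), then add the hinge subgradient if violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * ti * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    return 1 if sum(a * b for a, b in zip(w, list(x) + [1.0])) >= 0 else -1

def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): fraction of true negatives predicted positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return fp / (fp + tn) if fp + tn else 0.0
```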


    Active learning support vector machines for optimal sample selection in classification

    JOURNAL OF CHEMOMETRICS, Issue 6 2004
    Simeone Zomer
    Abstract Labelling samples can cause significant delays, particularly with larger datasets and/or when labelling requires prolonged analysis. In such cases, a strategy that builds a reliable classifier from a minimal training set, by labelling only a small fraction of the samples, can be advantageous. Support vector machines (SVMs) are ideal for such an approach because the classifier relies on only a small subset of samples, namely the support vectors, and is independent of the remaining ones, which typically form the majority of the dataset. This paper describes a procedure in which an SVM classifier is constructed with support vectors systematically retrieved from the pool of unlabelled samples. The procedure is termed 'active' because the algorithm interacts with the samples before they are labelled rather than waiting passively for input. The learning behaviour on simulated datasets is analysed, and a practical application to the detection of hydrocarbons in soils using mass spectrometry is described. Simulations show that the active learning SVM performs optimally on datasets where the classes display an intermediate level of separation. In the real case study, the classifier correctly assigns the class membership of all samples in the original dataset while requiring only about 14% of the data to be labelled. Its subsequent application to a second dataset of analogous nature also gives perfect classification without further labelling, matching the outcome of most classical techniques trained on the fully labelled original dataset. Copyright © 2004 John Wiley & Sons, Ltd. [source]
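
    The abstract does not spell out how support vectors are retrieved from the unlabelled pool. A common margin-based rule, assumed here, queries the samples with the smallest |f(x)|, since points inside or nearest the margin are the ones that can become support vectors. In the sketch, `decision` is any trained decision function and `oracle` stands for the expensive labelling step (for example, the mass-spectrometric analysis); both names are hypothetical.

```python
def select_queries(pool, decision, k=1):
    """Indices of the k pool samples with the smallest |f(x)|, i.e. closest
    to the current decision boundary."""
    return sorted(range(len(pool)), key=lambda i: abs(decision(pool[i])))[:k]

def active_learning_round(pool, labelled, decision, oracle, k=1):
    """Move the k selected samples from the unlabelled pool into the labelled
    set, asking `oracle` for their labels. The SVM would then be retrained."""
    for i in sorted(select_queries(pool, decision, k), reverse=True):
        x = pool.pop(i)  # pop from the highest index down so indices stay valid
        labelled.append((x, oracle(x)))
    return labelled
```

    Repeating rounds of query, label and retrain until the support vectors stop changing is what keeps the labelled fraction small.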


    Identification of small molecule aggregators from large compound libraries by support vector machines

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 4 2010
    Hanbing Rao
    Abstract Small molecule aggregators non-specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high-throughput screening campaigns. Computational methods have been explored for identifying aggregators, but they have not been tested in screening large compound libraries. We used 1319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) aggregator identification model, which was tested in four ways. The first is fivefold cross-validation, which showed aggregator identification rates comparable to, and non-aggregator identification rates significantly better than, those of earlier studies. The second is an independent test on 17 aggregators discovered independently of the training aggregators, 71% of which were correctly identified. The third is retrospective screening of 13M PUBCHEM and 168K MDDR compounds, which predicted 97.9% and 98.7% of the PUBCHEM and MDDR compounds, respectively, as non-aggregators. The fourth is retrospective screening of 5527 MDDR compounds similar to the known aggregators, 1.14% of which were predicted as aggregators. In fivefold cross-validation with the same settings, SVM showed slightly better overall performance than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM showed substantial capability in identifying aggregators from large libraries at low false-hit rates. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010 [source]
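
    With 1319 aggregators against 128,325 non-aggregators, a plain random split can leave a fold with very few positives. The abstract says only "fivefold cross-validation"; the sketch below assumes a stratified variant, which deals each class round-robin across folds so every fold keeps roughly the overall class ratio. Function names and the seed are hypothetical.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so each fold gets a near-equal share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_kfold(labels, k, seed)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]
```

    Each model is trained on four folds and scored on the fifth; aggregator and non-aggregator identification rates are then averaged over the five held-out folds.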


    Multiple classifier integration for the prediction of protein structural classes

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 14 2009
    Lei Chen
    Abstract Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward because each classifier has its own strengths and weaknesses, and each biological dataset has its own characteristics. By integrating many classifiers, one can avoid the dilemma of choosing a single classifier out of many to achieve an optimized classification result (Rahman et al., Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation, Springer, Berlin, 2002, 167-178). The classification algorithms come from Weka (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005), a collection of software tools for machine learning algorithms. By integrating many predictors (classifiers) through simple voting, the correct prediction (classification) rates are 65.21% and 65.63% for a basic training dataset and an independent test set, respectively. These results are better than those of any single machine learning algorithm collected in Weka when exactly the same data are used. Furthermore, we introduce an integration strategy that accounts for both classifier weightings and classifier redundancy. A feature selection strategy, minimum redundancy maximum relevance (mRMR), is transferred to algorithm selection to deal with classifier redundancy, and the weightings are based on the performance of each classifier. The best classification results are obtained when 11 algorithms are selected by the mRMR method and integrated through weighted majority votes. As a result, the correct prediction rates are 68.56% and 69.29% for the basic training dataset and the independent test dataset, respectively. The web server is available at http://chemdata.shu.edu.cn/protein_st/. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009 [source]
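
    Weighted majority voting, the integration step described above, adds each classifier's weight to the label it predicts and returns the label with the largest total. The sketch assumes weights are per-classifier performance scores (for example, cross-validated accuracy) and that the mRMR selection of which classifiers vote has already been done; neither detail is specified beyond the abstract, and the class names in the example are illustrative.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine one predicted label per classifier, each vote counted with that
    classifier's weight. Ties go to the label that appears first in `predictions`."""
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)
```

    With equal weights this reduces to the simple voting used for the 65.21% / 65.63% results; unequal weights let a strong classifier outvote two weaker ones that agree with each other.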


    Using support vector machines for prediction of protein structural classes based on discrete wavelet transform

    JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 8 2009
    Jian-Ding Qiu
    Abstract The prediction of secondary structure is a fundamental and important component in the analytical study of protein structure and function. Improving the predictive accuracy of protein structural classification by effectively incorporating sequence-order effects is an important and challenging problem. In this study, a new method that combines the support vector machine with the discrete wavelet transform is developed to predict protein structural classes. Its performance is assessed by cross-validation tests. The results show that the proposed approach can remarkably improve success rates and might also become a useful tool for predicting other attributes of proteins. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009 [source]
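
    A discrete wavelet transform turns a variable-length numeric signal into coefficient bands at several scales, and summarizing each band gives a fixed-length vector an SVM can consume regardless of sequence length. The sketch assumes the protein sequence has already been converted to numbers by some per-residue property scale (not specified in the abstract), uses the Haar wavelet purely for illustration, pads odd-length levels by repeating the last value, and summarizes bands by mean and standard deviation; all of these choices are assumptions.

```python
import math

def haar_dwt(signal, levels=1):
    """Multi-level orthonormal Haar DWT: returns (approximation, [detail per level])."""
    approx = list(signal)
    details = []
    s = 1 / math.sqrt(2)
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last value
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) * s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
    return approx, details

def wavelet_features(signal, levels=2):
    """Mean and standard deviation of every band -> vector of length 2 * (levels + 1)."""
    approx, details = haar_dwt(signal, levels)
    feats = []
    for band in details + [approx]:
        m = sum(band) / len(band)
        feats += [m, math.sqrt(sum((v - m) ** 2 for v in band) / len(band))]
    return feats
```

    Detail bands capture local changes along the sequence, which is one way sequence-order effects can enter an otherwise composition-based feature vector.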