Machine Learning Approaches

Selected Abstracts


Machine learning approaches for prediction of linear B-cell epitopes on proteins

JOURNAL OF MOLECULAR RECOGNITION, Issue 3 2006
Johannes Söllner
Abstract Identification and characterization of antigenic determinants on proteins have received considerable attention, using both experimental and computational methods. Computational routines have mostly used structural and physicochemical parameters to predict the antigenic propensity of protein sites. However, the performance of computational routines has been low compared with experimental alternatives. Here we describe the construction of machine-learning-based classifiers to enhance the prediction quality for identifying linear B-cell epitopes on proteins. Our approach combines several parameters previously associated with antigenicity and includes novel parameters based on frequencies of amino acids and amino acid neighborhood propensities. We used machine learning algorithms to derive antigenicity classification functions that assign antigenic propensities to each amino acid of a given protein sequence. We compared the prediction quality of the novel classifiers with that of established routines for epitope scoring, and tested prediction accuracy on experimental data available for HIV proteins. The major finding is that machine learning classifiers clearly outperform the reference classification systems on the HIV epitope validation set. Copyright © 2006 John Wiley & Sons, Ltd.


Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins

JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 11 2007
H. Li
Abstract Computational methods for predicting compounds with specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are useful for facilitating drug discovery and evaluation. Recently, machine learning methods such as neural networks and support vector machines have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic and ADMET properties. These methods are particularly useful for compounds of diverse structures, where they complement QSAR methods, and for cases where no receptor 3D structure is available, where they complement structure-based methods. A number of studies have demonstrated the potential of these methods for predicting such compounds as substrates of P-glycoprotein and cytochrome P450 CYP isoenzymes, inhibitors of protein kinases and CYP isoenzymes, and agonists of serotonin receptor and estrogen receptor. This article reviews the strategies, current progress and underlying difficulties in using machine learning methods both to predict these protein binders and as potential virtual screening tools. Algorithms for proper representation of the structural and physicochemical properties of compounds are also evaluated. © 2007 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 96: 2838–2860, 2007


Learning-based 3D face detection using geometric context

COMPUTER ANIMATION AND VIRTUAL WORLDS (PREV: JNL OF VISUALISATION & COMPUTER ANIMATION), Issue 4-5 2007
Yanwen Guo
Abstract In the computer graphics community, the face model is one of the most useful entities. The automatic detection of 3D face models has special significance for computer graphics, vision, and human-computer interaction. However, few methods have been dedicated to this task. This paper proposes a machine learning approach for fully automatic 3D face detection. To exploit the facial features, we introduce geometric context, a novel shape descriptor which can compactly encode the distribution of local geometry and can be evaluated efficiently by using a new volume encoding form, named integral volume. Geometric contexts over a 3D face offer a rich and discriminative representation of facial shapes and hence are well suited to classification. We adopt an AdaBoost learning algorithm to select the most effective geometric context-based classifiers and to combine them into a strong classifier. Given an arbitrary 3D model, our method first identifies the symmetric parts as candidates with a new reflective symmetry detection algorithm. It then uses the learned classifier to judge whether a face part exists. Experiments are performed on a large set of 3D face and non-face models, and the results demonstrate the high performance of our method. Copyright © 2007 John Wiley & Sons, Ltd.
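The abstract does not define integral volume. The sketch below assumes it is the 3D analogue of an integral image (a summed-area table), in which a cumulative-sum table lets the sum over any axis-aligned box be read off with eight lookups. The function names and the nested-list voxel layout are hypothetical and are not taken from the paper.

```python
def integral_volume(vol):
    """Build a zero-padded cumulative-sum table S for a 3D voxel grid.

    S[x][y][z] holds the sum of vol[i][j][k] over all i < x, j < y, k < z.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            for z in range(1, nz + 1):
                # 3D inclusion-exclusion over the seven already-filled neighbours
                S[x][y][z] = (
                    vol[x - 1][y - 1][z - 1]
                    + S[x - 1][y][z] + S[x][y - 1][z] + S[x][y][z - 1]
                    - S[x - 1][y - 1][z] - S[x - 1][y][z - 1] - S[x][y - 1][z - 1]
                    + S[x - 1][y - 1][z - 1]
                )
    return S


def box_sum(S, x0, y0, z0, x1, y1, z1):
    """Sum of voxels in the half-open box [x0, x1) x [y0, y1) x [z0, z1)."""
    return (
        S[x1][y1][z1]
        - S[x0][y1][z1] - S[x1][y0][z1] - S[x1][y1][z0]
        + S[x0][y0][z1] + S[x0][y1][z0] + S[x1][y0][z0]
        - S[x0][y0][z0]
    )


# Toy 2x2x2 grid whose voxel values are 0..7 (illustrative data only)
vol = [[[4 * i + 2 * j + k for k in range(2)] for j in range(2)] for i in range(2)]
S = integral_volume(vol)
```

Once the table is built, each box query costs the same regardless of box size, which is the property that would make a descriptor based on local box statistics cheap to evaluate.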


Finding nuggets in documents: A machine learning approach

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 6 2006
Yi-fang Brook Wu
Document keyphrases provide a concise summary of a document's content, offering semantic metadata that summarizes the document. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: the more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding newly identified keyphrases to the database. KIP's personalization feature lets the user build a glossary database specifically suited to his or her area of interest. The evaluation results show that KIP's performance is better than that of the systems we compared it with and that the learning function is effective.
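The scoring logic stated in the abstract (more keywords, and more significant keywords, make a candidate more likely to be a keyphrase) can be illustrated with a minimal sketch. The additive score, the function names, and the glossary weights below are assumptions for illustration; the abstract does not give KIP's actual formula or data.

```python
def score_candidate(phrase, keyword_weights):
    """Score a candidate phrase as the sum of the weights of its known keywords.

    Words missing from the glossary contribute nothing, so the score grows
    with both the number of keywords and their significance.
    """
    return sum(keyword_weights.get(word, 0.0) for word in phrase.lower().split())


def rank_candidates(candidates, keyword_weights):
    """Return candidates ordered from most to least likely keyphrase."""
    return sorted(
        candidates,
        key=lambda phrase: score_candidate(phrase, keyword_weights),
        reverse=True,
    )


# Hypothetical glossary weights, standing in for weights learned from
# human-identified keyphrases
weights = {"machine": 0.5, "learning": 1.0, "keyphrase": 2.0}
ranked = rank_candidates(["the cat", "learning", "machine learning"], weights)
```

In this toy ranking, a learning step like the one the abstract describes would correspond to adding newly accepted keyphrases and their words to `weights`.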


Genome-wide association analyses of expression phenotypes

GENETIC EPIDEMIOLOGY, Issue S1 2007
Gary K. Chen
Abstract A number of issues arise when analyzing the large amount of data from high-throughput genotype and expression microarray experiments, including design and interpretation of genome-wide association studies of expression phenotypes. These issues were considered by contributions submitted to Group 1 of the Genetic Analysis Workshop 15 (GAW15), which focused on the association of quantitative expression data. These contributions evaluated diverse hypotheses, including those relevant to cancer and obesity research, and used various analytic techniques, many of which were derived from information theory. Several observations from these reports stand out. First, one needs to consider the genetic model of the trait of interest and carefully select which single nucleotide polymorphisms and individuals are included early in the design stage of a study. Second, by targeting specific pathways when analyzing genome-wide data, one can generate more interpretable results than agnostic approaches. Finally, for datasets with small sample sizes but a large number of features like the Genetic Analysis Workshop 15 dataset, machine learning approaches may be more practical than traditional parametric approaches. Genet Epidemiol 31 (Suppl. 1): S7–S11, 2007. © 2007 Wiley-Liss, Inc.


Adaptive modeling and discovery in bioinformatics: The evolving connectionist approach

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 5 2008
Nikola Kasabov
Most biological processes that are currently being researched in bioinformatics are complex, dynamic processes that are difficult to model and understand. The paper presents evolving connectionist systems (ECOS) as a general approach to adaptive modeling and knowledge discovery in bioinformatics. This approach extends the traditional machine learning approaches with various adaptive learning and rule extraction procedures. ECOS belong to the class of incremental local learning and knowledge-based neural networks. They are applied here to challenging problems in bioinformatics, such as microarray gene expression profiling, gene regulatory network (GRN) modeling, and computational neurogenetic modeling. The ECOS models have several advantages over traditional techniques: fast learning, incremental adaptation to new data, and support for knowledge discovery through fuzzy rules. © 2008 Wiley Periodicals, Inc.


Predicting project delivery rates using the Naive-Bayes classifier

JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE, Issue 3 2002
B. Stewart
Abstract The importance of accurate estimation of software development effort is well recognized in software engineering. In recent years, machine learning approaches have been studied as possible alternatives to more traditional software cost estimation methods. The objective of this paper is to investigate the utility of the machine learning algorithm known as the Naive-Bayes classifier for estimating software project effort. We present empirical experiments with the Benchmark 6 data set from the International Software Benchmarking Standards Group to estimate project delivery rates and compare the performance of the Naive-Bayes approach to two other machine learning methods: model trees and neural networks. A project delivery rate is defined as the number of effort hours per function point. The approach described is general and can be used to analyse not only software development data but also data on software maintenance and other types of software engineering. The paper demonstrates that the Naive-Bayes classifier has the potential to be used as an alternative machine learning tool for software development effort estimation. Copyright © 2002 John Wiley & Sons, Ltd.
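As a rough illustration of the two ideas in this abstract, the sketch below computes a delivery rate from its stated definition (effort hours per function point) and fits a small categorical Naive-Bayes classifier with add-one smoothing. The feature names, the "high"/"low" rate bins, and the toy projects are hypothetical; they are not fields or values from the ISBSG data set, and the paper's actual setup may differ.

```python
import math
from collections import Counter


def delivery_rate(effort_hours, function_points):
    """Project delivery rate: effort hours per function point."""
    return effort_hours / function_points


class CategoricalNaiveBayes:
    """Naive-Bayes classifier over categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        n_features = len(rows[0])
        self.class_counts = Counter(labels)
        # counts[c][j][v]: how often feature j took value v among class-c rows
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        # values[j]: distinct values seen for feature j during training
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_class, best_logp = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log prior plus the sum of per-feature log likelihoods
            logp = math.log(n_c / total)
            for j, v in enumerate(row):
                logp += math.log((self.counts[c][j][v] + 1) / (n_c + len(self.values[j])))
            if logp > best_logp:
                best_class, best_logp = c, logp
        return best_class


# Hypothetical projects: (language type, platform) -> binned delivery rate
rows = [("3GL", "mainframe"), ("3GL", "mainframe"), ("4GL", "pc"), ("4GL", "pc")]
labels = ["high", "high", "low", "low"]
model = CategoricalNaiveBayes().fit(rows, labels)
```

Binning the delivery rate into classes is one simple way to make a continuous target usable by a classifier; it is chosen here only to keep the sketch short.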