Machine Learning Techniques


Selected Abstracts


FLOOD STAGE FORECASTING WITH SUPPORT VECTOR MACHINES

JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION, Issue 1 2002
Shie-Yui Liong
ABSTRACT: Machine learning techniques are finding more and more applications in the field of forecasting. A novel regression technique, called Support Vector Machine (SVM), based on statistical learning theory, is explored in this study. SVM is based on the principle of Structural Risk Minimization as opposed to the principle of Empirical Risk Minimization espoused by conventional regression techniques. The flood data at Dhaka, Bangladesh, are used in this study to demonstrate the forecasting capabilities of SVM. The result is compared with that of an Artificial Neural Network (ANN) based model for one-lead day to seven-lead day forecasting. The improvements in maximum predicted water level errors by SVM over ANN for four-lead day to seven-lead day are 9.6 cm, 22.6 cm, 4.9 cm and 15.7 cm, respectively. The result shows that the prediction accuracy of SVM is at least as good as, and in some cases (particularly at higher lead days) actually better than, that of ANN, yet it sidesteps several limitations of ANN, for example the need to arrive at an optimal network architecture and to choose a useful training set. Thus, SVM appears to be a very promising prediction tool.
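The abstract does not say which inputs the models used, so the following is only a minimal sketch of how "k-lead day" forecasting is commonly framed: each training example pairs a window of recent daily water levels with the level observed k days later, and a forecast is scored by its maximum absolute error. The function names, the window length `n_lags`, and the water levels are illustrative assumptions, not data or code from the study.

```python
from typing import List, Tuple


def make_lead_day_pairs(
    levels: List[float], n_lags: int, lead: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `n_lags` daily levels with the level `lead` days later."""
    if n_lags < 1 or lead < 1:
        raise ValueError("n_lags and lead must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # t indexes the most recent observation available when the forecast is made.
    for t in range(n_lags - 1, len(levels) - lead):
        inputs.append(levels[t - n_lags + 1 : t + 1])
        targets.append(levels[t + lead])
    return inputs, targets


def max_abs_error(predicted: List[float], observed: List[float]) -> float:
    """Largest absolute difference between predicted and observed levels."""
    return max(abs(p - o) for p, o in zip(predicted, observed))


# Made-up water levels in metres (NOT the Dhaka record).
levels = [5.0, 5.2, 5.5, 5.9, 6.4, 6.1, 5.8]
X, y = make_lead_day_pairs(levels, n_lags=3, lead=2)

# A naive "persistence" forecast (repeat the latest level) stands in for a model.
persistence = [row[-1] for row in X]
worst_case = max_abs_error(persistence, y)
```

Any regressor, SVM or ANN included, could be trained on `X` and `y` in place of the persistence baseline; repeating the construction for `lead` from 1 to 7 gives one data set per lead day.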


Financial decision support using neural networks and support vector machines

EXPERT SYSTEMS, Issue 4 2008
Chih-Fong Tsai
Abstract: Bankruptcy prediction and credit scoring are two important problems facing financial decision support. The multilayer perceptron (MLP) network has shown its applicability to these problems and its performance is usually superior to that of other traditional statistical models. Support vector machines (SVMs) are core machine learning techniques and have been compared with MLP as the benchmark. However, the performance of SVMs is not fully understood in the literature because an insufficient number of data sets is considered and different kernel functions are used to train the SVMs. In this paper, four public data sets are used. In particular, three different sizes of training and testing data in each of the four data sets are considered (i.e. 3:7, 1:1 and 7:3) in order to examine and fully understand the performance of SVMs. For SVM model construction, the linear, radial basis function and polynomial kernel functions are used to construct the SVMs. Using MLP as the benchmark, the SVM classifier only performs better in one of the four data sets. On the other hand, the prediction results of the MLP and SVM classifiers are not significantly different for the three different sizes of training and testing data.
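The experimental design described above (three train/test ratios crossed with three kernels) can be sketched as a small grid. This is an assumption-laden illustration: the helper `split_by_ratio`, the seed, and the placeholder data are hypothetical, and the abstract does not state that every ratio was combined with every kernel or how the samples were drawn.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(
    samples: Sequence, train_parts: int, test_parts: int, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle `samples` and split them train:test as train_parts:test_parts."""
    if train_parts <= 0 or test_parts <= 0:
        raise ValueError("ratio parts must be positive")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


RATIOS = [(3, 7), (1, 1), (7, 3)]            # train:test sizes named in the abstract
KERNELS = ["linear", "rbf", "polynomial"]    # kernel families named in the abstract

data = list(range(100))  # placeholder for one data set's samples

# One (ratio, kernel) configuration per grid cell; model training is omitted.
grid = []
for train_parts, test_parts in RATIOS:
    train, test = split_by_ratio(data, train_parts, test_parts)
    for kernel in KERNELS:
        grid.append((len(train), len(test), kernel))
```

Under these assumptions each data set yields nine SVM configurations, each of which would be compared against an MLP trained on the same split.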


Automating survey coding by multiclass text categorization techniques

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 14 2003
Daniela Giorgetti
Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
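To make the text-categorization framing concrete, here is a minimal sketch of a multinomial naive Bayes classifier that learns from precoded answers and assigns a code to a new answer. It is not the authors' implementation: the class name `NaiveBayesCoder`, the whitespace tokenizer, the Laplace smoothing constant, and the toy answers and codes are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List


class NaiveBayesCoder:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.code_counts: Counter = Counter()        # answers seen per code
        self.word_counts = defaultdict(Counter)      # word frequencies per code
        self.vocab: set = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, answers: Iterable[str], codes: Iterable[str]) -> "NaiveBayesCoder":
        for answer, code in zip(answers, codes):
            self.code_counts[code] += 1
            for word in self.tokenize(answer):
                self.word_counts[code][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, answer: str) -> str:
        total_answers = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n_answers in self.code_counts.items():
            # log prior + sum of smoothed log likelihoods of known words
            score = math.log(n_answers / total_answers)
            denom = sum(self.word_counts[code].values()) + self.alpha * len(self.vocab)
            for word in self.tokenize(answer):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[code][word] + self.alpha) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical precoded answers; the codes are invented for this example.
answers = [
    "the price is too high", "too expensive for me",
    "staff were friendly and helpful", "helpful friendly service",
    "delivery was late", "late delivery again",
]
codes = ["PRICE", "PRICE", "SERVICE", "SERVICE", "DELIVERY", "DELIVERY"]

coder = NaiveBayesCoder().fit(answers, codes)
predicted = coder.predict("delivery arrived late")
```

The multiclass SVM variant mentioned in the abstract would replace only the model; the training set of precoded answers and the predict-a-code interface stay the same.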


Proteomic patterns for classification of ovarian cancer and CTCL serum samples utilizing peak pairs indicative of post-translational modifications

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 22 2007
Chenwei Liu
Abstract: Proteomic patterns as a potential diagnostic technology have been well established for several cancer conditions and other diseases. The use of machine learning techniques such as decision trees, neural networks, genetic algorithms, and other methods has been the basis for pattern determination. Cancer is known to involve signaling pathways that are regulated through PTM of proteins. These modifications are also detectable with high confidence using high-resolution MS. We generated data using a prOTOF™ mass spectrometer on two sets of patient samples: ovarian cancer and cutaneous T-cell lymphoma (CTCL) with matched normal samples for each disease. Using the knowledge of mass shifts caused by common modifications, we built models using peak pairs and compared these to a conventional technique using individual peaks. The results for each disease showed that a small number of peak pairs gave classification equal to or better than the conventional technique that used multiple individual peaks. This simple peak picking technique could be used to guide identification of important peak pairs involved in the disease process.
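The peak-pair idea can be sketched as a search for pairs of peaks whose spacing matches a known modification mass shift. The abstract does not list which modifications or tolerance were used, so the three monoisotopic shifts below, the tolerance, the function name `find_peak_pairs`, and the peak list are assumptions; the sketch also assumes singly charged ions, so that m/z differences equal mass differences.

```python
from typing import Dict, List, Sequence, Tuple

# Monoisotopic mass shifts (Da) of three common PTMs; the choice is illustrative.
PTM_SHIFTS_DA: Dict[str, float] = {
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def find_peak_pairs(
    peaks: Sequence[float],
    shifts: Dict[str, float] = PTM_SHIFTS_DA,
    tol: float = 0.02,
) -> List[Tuple[float, float, str]]:
    """Return (low, high, modification) for peaks spaced by a known shift +/- tol."""
    ordered = sorted(peaks)
    pairs: List[Tuple[float, float, str]] = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1 :]:
            delta = high - low
            for name, shift in shifts.items():
                if abs(delta - shift) <= tol:
                    pairs.append((low, high, name))
    return pairs


# Made-up peak positions (m/z), not measurements from the study.
peaks = [1000.50, 1014.52, 1080.47, 1200.00, 1242.01]
pairs = find_peak_pairs(peaks)
```

Each detected pair could then serve as a single candidate feature for a classifier, in contrast to the conventional approach of using individual peaks as features.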