Classification Algorithms

Selected Abstracts

Extracting new patterns for cardiovascular disease prognosis

EXPERT SYSTEMS, Issue 5 2009
Luis Mena
Abstract: Cardiovascular diseases constitute one of the main causes of mortality in the world, and machine learning has become a powerful tool for analysing medical data in the last few years. In this paper we present an interdisciplinary work based on an ambulatory blood pressure study and the development of a new classification algorithm named REMED. We focused on the discovery of new patterns for abnormal blood pressure variability as a possible cardiovascular risk factor. We compared our results with those of other classification algorithms based on Bayesian methods, decision trees, and rule induction techniques. In the comparison, REMED showed similar accuracy to these algorithms, but with the advantage of a superior capacity to classify sick people correctly. Therefore, our method could represent an innovative approach that might be useful in medical decision support for cardiovascular disease prognosis. [source]
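The trade-off this abstract highlights, similar overall accuracy but a better rate of correctly classified sick patients, is the familiar accuracy-versus-sensitivity distinction. A minimal sketch of the three metrics (the confusion-matrix counts are hypothetical, not taken from the REMED study):

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of sick patients classified as sick."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of healthy patients classified as healthy."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all patients classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: a classifier can match another on accuracy
# while differing sharply on sensitivity, as in the comparison above.
print(accuracy(tp=45, tn=40, fp=10, fn=5))  # 0.85
print(sensitivity(tp=45, fn=5))             # 0.9
```

Two classifiers with equal accuracy can thus differ in the clinically critical direction, which is why the abstract treats sensitivity as a separate criterion.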

Decision-making method using a visual approach for cluster analysis problems; indicative classification algorithms and grouping scope

EXPERT SYSTEMS, Issue 3 2007
Ran M. Bittmann
Abstract: Currently, classifying samples into a fixed number of clusters (i.e. supervised cluster analysis), as well as unsupervised cluster analysis, is limited in its ability to support 'cross-algorithms' analysis. It is well known that each cluster analysis algorithm yields different results (i.e. a different classification); even running the same algorithm with two different similarity measures commonly yields different results. Researchers usually choose the preferred algorithm and similarity measure according to analysis objectives and data set features, but they have neither a formal method nor a tool that supports comparison and evaluation of the different classifications that result from the diverse algorithms. The current research presents the development of a methodology and a prototype decision support tool based upon formal quantitative measures and a visual approach, enabling presentation, comparison and evaluation of multiple classification suggestions resulting from diverse algorithms. This methodology and tool were used in two basic scenarios: (I) a classification problem in which a 'true result' is known, using the Fisher iris data set; (II) a classification problem in which there is no 'true result' to compare with. In this case, we used a small data set from a user profile study (a study that tries to relate users to a set of stereotypes based on sociological aspects and interests). In each scenario, ten diverse algorithms were executed. The suggested methodology and decision support system produced a cross-algorithms presentation; all ten resultant classifications are presented together in a 'Tetris-like' format. Each column represents a specific classification algorithm, each line represents a specific sample, and formal quantitative measures analyse the 'Tetris blocks', arranging them according to their best structures, i.e. best classification. [source]
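One standard quantitative measure for this kind of cross-algorithm comparison is pairwise agreement between two classifications, e.g. the Rand index. The sketch below is illustrative only and is not claimed to be the measure the authors use:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two classifications agree:
    the pair is placed in the same cluster by both, or in different
    clusters by both.  Invariant to relabelling of the clusters."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Identical groupings under different cluster labels agree perfectly.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because the index compares co-membership rather than raw labels, it lets two algorithms be scored against each other even when their cluster numbering is arbitrary, which is exactly the obstacle in cross-algorithms analysis.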

Efficient packet classification on network processors

Koert Vlaeminck
Abstract: Always-on networking and a growing interest in multimedia- and conversational-IP services offer an opportunity to network providers to participate in the service layer, if they increase functional intelligence in their networks. An important prerequisite to providing advanced services in IP access networks is the availability of a high-speed packet classification module in the network nodes, necessary for supporting any IP service imaginable. Often, access nodes are installed in remote offices, where they terminate a large number of subscriber lines. As such, technology adding processing power in this environment should be energy-efficient, whilst maintaining the flexibility to cope with changing service requirements. Network processor units (NPUs) are designed to overcome these operational restrictions, and in this context this paper investigates their suitability for wirespeed and robust packet classification in a firewalling application. State-of-the-art packet classification algorithms are examined, whereafter the performance and memory requirements are compared for a Binary Decision Diagram (BDD) and a sequential search approach. Several space optimizations for implementing BDD classifiers on NPU hardware are discussed, and it is shown that the optimized BDD classifier is able to operate at gigabit wirespeed independent of the ruleset size, which is a major advantage over a sequential search classifier. Copyright © 2007 John Wiley & Sons, Ltd. [source]
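The sequential search approach used as the paper's baseline can be sketched as a first-match linear scan over the ruleset. The rule fields here are a simplified, hypothetical subset of a real firewall 5-tuple:

```python
# Each rule: (protocol or None for wildcard, dst-port low, dst-port high, action).
RULES = [
    ("tcp", 80, 80, "ACCEPT"),    # allow HTTP
    ("udp", 53, 53, "ACCEPT"),    # allow DNS
    ("tcp", 0, 65535, "DROP"),    # drop all other TCP
]

def classify(rules, packet, default="DROP"):
    """First-match linear scan: lookup cost grows with the ruleset size,
    unlike the BDD classifier, whose lookup time is independent of it."""
    for proto, lo, hi, action in rules:
        if (proto is None or packet["proto"] == proto) and lo <= packet["dst_port"] <= hi:
            return action
    return default

print(classify(RULES, {"proto": "tcp", "dst_port": 80}))  # ACCEPT
print(classify(RULES, {"proto": "tcp", "dst_port": 22}))  # DROP
```

The O(n) scan is what makes sequential search degrade with large rulesets, which is the motivation for the ruleset-size-independent BDD lookup the abstract describes.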

Kernel approach to possibilistic C-means clustering

Frank Chung-Hoon Rhee
Kernel approaches can improve the performance of conventional clustering or classification algorithms for complex distributed data. This is achieved by using a kernel function, which is defined as the inner product of two values obtained by a transformation function. In doing so, this allows algorithms to operate in a higher dimensional space (i.e., more degrees of freedom for data to be meaningfully partitioned) without having to compute the transformation. As a result, the fuzzy kernel C-means (FKCM) algorithm, which uses a distance measure between patterns and cluster prototypes based on a kernel function, can obtain more desirable clustering results than fuzzy C-means (FCM) for not only spherical data but also nonspherical data. However, it can still be sensitive to noise, as in the FCM algorithm. In this paper, to address this drawback of FKCM, we propose a kernel possibilistic C-means (KPCM) algorithm that applies the kernel approach to the possibilistic C-means (PCM) algorithm. The method includes a variance updating method for Gaussian kernels for each clustering iteration. Several experimental results show that the proposed algorithm can outperform other algorithms for general data with additive noise. © 2009 Wiley Periodicals, Inc. [source]
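The kernel trick the abstract describes, operating in the transformed space without ever computing the transformation, rests on expanding the feature-space distance through inner products. A minimal sketch for a Gaussian kernel of the kind used in KPCM (the sigma value is an arbitrary illustration, not the paper's variance-updating scheme):

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / (2 sigma^2)): an inner product
    phi(x) . phi(v) in an implicit high-dimensional feature space."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance between pattern x and prototype v in feature
    space, computed without the transformation phi:
    ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2 K(x,v) = 2 (1 - K(x,v)),
    since K(x, x) = 1 for a Gaussian kernel."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))
```

Replacing the Euclidean distance in the FCM/PCM update equations with this kernel-induced distance is what lets the algorithms separate nonspherical clusters while keeping the iterative structure unchanged.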

Growing decision trees in an ordinal setting

Kim Cao-Van
Although ranking (ordinal classification/regression) based on criteria is related closely to classification based on attributes, the development of methods for learning a ranking on the basis of data is lagging far behind that for learning a classification. Most of the work being done focuses on maintaining monotonicity (sometimes even only on the training set). We argue that in doing so, an essential aspect is mostly disregarded, namely, the importance of the role of the decision maker who decides about the acceptability of the generated rule base. Certainly, in ranking problems, there are more factors besides accuracy that play an important role. In this article, we turn to the field of multicriteria decision aid (MCDA) in order to cope with the aforementioned problems. We show that by a proper definition of the notion of partial dominance, it is possible to avoid the counter-intuitive outcomes of classification algorithms when applied to ranking problems. We focus on tree-based approaches and explain how the tree expansion can be guided by the principle of partial dominance preservation, and how the resulting rule base can be graphically represented and further refined. © 2003 Wiley Periodicals, Inc. [source]
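The dominance idea can be illustrated with a small check for monotonicity violations: a counter-intuitive outcome arises when one sample is at least as good as another on every criterion yet receives a strictly worse label. This is a hypothetical sketch of that basic check, not the authors' exact MCDA formulation of partial dominance:

```python
def dominates(a, b):
    """a dominates b: at least as good on every criterion ('higher is
    better' assumed) and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def monotonicity_violations(samples):
    """Pairs (i, j) where sample i dominates sample j on the criteria but
    was assigned a strictly lower ordinal label -- the kind of
    counter-intuitive outcome a dominance-preserving tree rules out."""
    return [(i, j)
            for i, (xi, yi) in enumerate(samples)
            for j, (xj, yj) in enumerate(samples)
            if dominates(xi, xj) and yi < yj]

# Sample 0 scores higher on both criteria yet got a lower label than sample 1.
print(monotonicity_violations([((2, 3), 1), ((1, 1), 2)]))  # [(0, 1)]
```

A tree grown under dominance preservation would refuse splits whose leaf labels introduce such pairs, which is how the approach keeps the rule base acceptable to the decision maker.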

Diagnosis of breast cancer using diffuse reflectance spectroscopy: Comparison of a Monte Carlo versus partial least squares analysis based feature extraction technique

Changfang Zhu MS
Abstract: Background and Objective: We explored the use of diffuse reflectance spectroscopy in the ultraviolet-visible (UV-VIS) spectrum for the diagnosis of breast cancer. A physical model based approach (Monte Carlo inverse model) and an empirical model based approach (partial least squares analysis) were compared for extracting diagnostic features from the diffuse reflectance spectra. Study Design/Methods: The physical model and the empirical model were employed to extract features from diffuse reflectance spectra measured from freshly excised breast tissues. A subset of extracted features obtained using each method showed statistically significant differences between malignant and non-malignant breast tissues. These features were separately input to a support vector machine (SVM) algorithm to classify each tissue sample as malignant or non-malignant. Results and Conclusions: The features extracted from the Monte Carlo based analysis were hemoglobin saturation, total hemoglobin concentration, beta-carotene concentration and the mean (wavelength averaged) reduced scattering coefficient. Beta-carotene concentration was positively correlated and the mean reduced scattering coefficient was negatively correlated with percent adipose tissue content in normal breast tissues. In addition, there was a statistically significant decrease in the beta-carotene concentration and hemoglobin saturation, and a statistically significant increase in the mean reduced scattering coefficient in malignant tissues compared to non-malignant tissues. The features extracted from the partial least squares analysis were a set of principal components. A subset of principal components showed that the diffuse reflectance spectra of malignant breast tissues displayed an increased intensity over the wavelength range of 440–510 nm and a decreased intensity over the wavelength range of 510–600 nm, relative to that of non-malignant breast tissues.
The diagnostic performance of the classification algorithms based on both feature extraction techniques yielded similar sensitivities and specificities of approximately 80% for discriminating between malignant and non-malignant breast tissues. While both methods yielded similar classification accuracies, the model based approach provided insight into the physiological and structural features that discriminate between malignant and non-malignant breast tissues. Lasers Surg. Med. © 2006 Wiley-Liss, Inc. [source]