Classification Problems (classification + problem)


Selected Abstracts


Decision-making method using a visual approach for cluster analysis problems; indicative classification algorithms and grouping scope

EXPERT SYSTEMS, Issue 3 2007
Ran M. Bittmann
Abstract: Currently, classifying samples into a fixed number of clusters (i.e. supervised cluster analysis) as well as unsupervised cluster analysis are limited in their ability to support 'cross-algorithms' analysis. It is well known that each cluster analysis algorithm yields different results (i.e. a different classification); even running the same algorithm with two different similarity measures commonly yields different results. Researchers usually choose the preferred algorithm and similarity measure according to analysis objectives and data set features, but they have neither a formal method nor a tool that supports comparisons and evaluations of the different classifications that result from the diverse algorithms. The current research develops and prototypes a decision support methodology based upon formal quantitative measures and a visual approach, enabling presentation, comparison and evaluation of multiple classification suggestions resulting from diverse algorithms. This methodology and tool were used in two basic scenarios: (I) a classification problem in which a 'true result' is known, using the Fisher iris data set; (II) a classification problem in which there is no 'true result' to compare with. In the latter case, we used a small data set from a user profile study (a study that tries to relate users to a set of stereotypes based on sociological aspects and interests). In each scenario, ten diverse algorithms were executed. The suggested methodology and decision support system produced a cross-algorithms presentation; all ten resultant classifications are presented together in a 'Tetris-like' format. Each column represents a specific classification algorithm, each line represents a specific sample, and formal quantitative measures analyse the 'Tetris blocks', arranging them according to their best structures, i.e. best classification.


Using feedforward neural networks and forward selection of input variables for an ergonomics data classification problem

HUMAN FACTORS AND ERGONOMICS IN MANUFACTURING & SERVICE INDUSTRIES, Issue 1 2004
Chuen-Lung Chen
A method was developed to accurately predict the risk of injuries in industrial jobs based on datasets that are incomplete or do not meet the assumptions of parametric statistical tools. Previous research used a backward-elimination process for feedforward neural network (FNN) input variable selection. Simulated annealing (SA) was used as a local search method in conjunction with a conjugate-gradient algorithm to develop an FNN. This article presents an incremental step in the use of FNNs for ergonomics analyses, specifically the use of forward selection of input variables. Advantages of this approach include enhancing the effectiveness of neural networks when observations are missing from ergonomics datasets, and preventing overspecification or overfitting of an FNN to training data. Classification performance of the two methods, which combine SA with either forward selection or backward elimination of input variables, was comparable for complete datasets, and the forward-selection approach produced results superior to previously used methods of FNN development, including the error back-propagation algorithm, when dealing with incomplete data. © 2004 Wiley Periodicals, Inc. Hum Factors Man 14: 31–49, 2004.


Predicting direction shifts on Canadian–US exchange rates with artificial neural networks

INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 2 2001
Jefferson T. Davis
The paper presents a variety of neural network models applied to Canadian–US exchange rate data. Networks such as backpropagation, modular, radial basis functions, linear vector quantization, fuzzy ARTMAP, and genetic reinforcement learning are examined. The purpose is to compare the performance of these networks for predicting direction (sign change) shifts in daily returns. For this classification problem, the neural nets proved superior to the naïve model, and most of the neural nets were slightly superior to the logistic model. Using multiple previous days' returns as inputs to train and test the backpropagation and logistic models resulted in no increased classification accuracy. The models were not able to detect a systematic effect of previous days' returns, up to fifteen days prior to the prediction day, that would increase model performance. Copyright © 2001 John Wiley & Sons, Ltd.


Hidden Markov model-based real-time transient identifications in nuclear power plants

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 8 2002
Kee-Choon Kwon
In this article, a transient identification method based on a stochastic approach with the hidden Markov model (HMM) is proposed and evaluated experimentally for the classification of nine types of transients in nuclear power plants (NPPs). A transient is defined as the transition of a plant from a normal state to an abnormal state. Identification of the types of transients during an early accident stage in NPPs is crucial for proper action selection. A transient can be identified by its unique time-dependent patterns in the principal variables. The HMM, a double-stochastic process, can be applied to transient identification, which is a spatial and temporal classification problem under a statistical pattern-recognition framework. A trained HMM is created for each transient from a set of training data by the maximum-likelihood estimation method, which uses a forward-backward algorithm and the Baum-Welch re-estimation algorithm. The transient is identified by calculating which model has the highest probability for given test data using the Viterbi algorithm. Several experimental tests have been performed with different normalization methods, clustering algorithms, and numbers of HMM states. Further tests, including superimposing random noise, adding systematic error, and adding untrained transients, have been performed to verify performance and robustness. The proposed real-time transient identification system has been shown to have many advantages, although some problems remain to be solved before it can be applied to an operating NPP. Further efforts are being made to improve the system performance and robustness in order to demonstrate reliability and accuracy to the required level. © 2002 Wiley Periodicals, Inc.
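
The identification step described in this abstract (one trained model per transient, pick the model with the highest Viterbi score) can be sketched as follows. This is a minimal illustration for discrete observations, not the authors' system: the two toy models, their names (`transient_A`, `transient_B`) and all probabilities are hypothetical, and training via Baum-Welch is assumed to have happened already.

```python
import math

def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path of a discrete HMM.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    All probabilities are assumed to be strictly positive.
    """
    n_states = len(start)
    # Scores of the best paths ending in each state after the first symbol.
    score = [math.log(start[s]) + math.log(emit[s][obs[0]])
             for s in range(n_states)]
    for o in obs[1:]:
        # Extend the best path into each state s by one observation.
        score = [
            max(score[p] + math.log(trans[p][s]) for p in range(n_states))
            + math.log(emit[s][o])
            for s in range(n_states)
        ]
    return max(score)

def identify(obs, models):
    """Return the name of the model with the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_log_score(obs, *models[name]))

# Hypothetical, hand-written models standing in for trained HMMs.
MODELS = {
    "transient_A": ([0.9, 0.1],
                    [[0.8, 0.2], [0.1, 0.9]],
                    [[0.9, 0.1], [0.2, 0.8]]),
    "transient_B": ([0.5, 0.5],
                    [[0.5, 0.5], [0.5, 0.5]],
                    [[0.1, 0.9], [0.3, 0.7]]),
}

print(identify([0, 0, 0, 0], MODELS))
print(identify([1, 1, 1, 1], MODELS))
```

Working in log space avoids numerical underflow on long observation sequences, which matters when plant variables are sampled over many time steps.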


A Two-Level Model for Evidence Evaluation in the Presence of Zeros

JOURNAL OF FORENSIC SCIENCES, Issue 2 2010
Grzegorz Zadora Ph.D.
Abstract: Likelihood ratios (LRs) provide a natural way of computing the value of evidence under competing propositions. We propose LR models for classification and comparison that extend the ideas of Aitken, Zadora, and Lucy and of Aitken and Lucy to include consideration of zeros. Instead of substituting zeros by a small value, we view the presence of zeros as informative and model it using Bernoulli distributions. The proposed models are used for evaluation of forensic glass (comparison and classification problem) and paint data (comparison problem). Two hundred and sixty-four glass samples were analyzed by scanning electron microscopy coupled with an energy dispersive X-ray spectrometer, and 36 acrylic topcoat paint samples by pyrolysis gas chromatography hyphenated with mass spectrometry. The proposed LR model gave very satisfactory results for the glass comparison problem and for most of the classification tasks for glass. Results of comparison of paints were also highly satisfactory, with only 3.0% false positive answers and 2.8% false negative answers.


WELL LOG CALIBRATION OF KOHONEN-CLASSIFIED SEISMIC ATTRIBUTES USING BAYESIAN LOGIC

JOURNAL OF PETROLEUM GEOLOGY, Issue 4 2001
M. T. Taner
We present a new method for calibrating a classified 3D seismic volume. The classification process employs a Kohonen self-organizing map, a type of unsupervised artificial neural network; the subsequent calibration is performed using one or more suites of well logs. Kohonen self-organizing maps and other unsupervised clustering methods generate classes of data based on the identification of various discriminating features. These methods seek an organization in a dataset and form relational organized clusters. However, these clusters may or may not have any physical analogues in the real world. In order to relate them to the real world, we must develop a calibration method that not only defines the relationship between the clusters and real physical properties, but also provides an estimate of the validity of these relationships. With the development of this relationship, the whole dataset can then be calibrated. The clustering step reduces the multi-dimensional data into logically smaller groups. Each original data point defined by multiple attributes is reduced to a one- or two-dimensional relational group. This establishes some logical clustering and reduces the complexity of the classification problem. Furthermore, calibration should be more successful since it will have to consider less variability in the data. In this paper, we present a simple calibration method that employs Bayesian logic to provide the relationship between cluster centres and the real world. The output will give the most probable calibration between each self-organized map node and wellbore-measured parameters such as lithology, porosity and fluid saturation. The second part of the output comprises the calibration probability. The method is described in detail, and a case study is briefly presented using data acquired in the Orange River Basin, South Africa. The method shows promise as an alternative to current techniques for integrating seismic and log data during reservoir characterization.


Inductive Inference: An Axiomatic Approach

ECONOMETRICA, Issue 1 2003
Itzhak Gilboa
A predictor is asked to rank eventualities according to their plausibility, based on past cases. We assume that she can form a ranking given any memory that consists of finitely many past cases. Mild consistency requirements on these rankings imply that they have a numerical representation via a matrix assigning numbers to eventuality–case pairs, as follows. Given a memory, each eventuality is ranked according to the sum of the numbers in its row, over cases in memory. The number attached to an eventuality–case pair can be interpreted as the degree of support that the past case lends to the plausibility of the eventuality. Special instances of this result may be viewed as axiomatizing kernel methods for estimation of densities and for classification problems. Interpreting the same result for rankings of theories or hypotheses, rather than of specific eventualities, it is shown that one may ascribe to the predictor subjective conditional probabilities of cases given theories, such that her rankings of theories agree with rankings by the likelihood functions.
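
The representation stated in this abstract (rank each eventuality by the sum of its row entries over the cases in memory) is easy to make concrete. The sketch below only illustrates that summation rule; the eventualities, cases and support numbers are invented for the example and do not come from the paper.

```python
from collections import Counter

def rank_eventualities(support, memory):
    """Rank eventualities by total support from the cases in memory.

    support[e][c] : number attached to the eventuality-case pair (e, c)
    memory        : list of past cases; repeated cases count repeatedly
    Returns (ranking from most to least plausible, dict of totals).
    """
    counts = Counter(memory)
    totals = {
        e: sum(row[c] * n for c, n in counts.items())
        for e, row in support.items()
    }
    ranking = sorted(totals, key=totals.get, reverse=True)
    return ranking, totals

# Hypothetical support matrix: rows are eventualities, columns are cases.
SUPPORT = {
    "rain": {"cloudy": 2.0, "clear": -1.0},
    "dry":  {"cloudy": -0.5, "clear": 1.5},
}

print(rank_eventualities(SUPPORT, ["cloudy", "cloudy", "clear"]))
print(rank_eventualities(SUPPORT, ["clear", "clear", "cloudy"]))
```

Because the ranking depends on memory only through these sums, adding the same case to two memories shifts every eventuality's total by the same row entry in both.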


Assessment of four modifications of a novel indexing technique for case-based reasoning

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 4 2007
Mykola Galushka
In this article, we investigate four variations (D-HSM, D-HSW, D-HSE, and D-HSEW) of a novel indexing technique called D-HS designed for use in case-based reasoning (CBR) systems. All D-HS modifications are based on a matrix of cases indexed by their discretized attribute values. The main differences between them are in their attribute discretization strategy and similarity determination metric. D-HSM uses a fixed number of intervals and simple intersection as a similarity metric; D-HSW uses the same discretization approach and a weighted intersection; D-HSE uses information gain to define the intervals and simple intersection as a similarity metric; D-HSEW is a combination of D-HSE and D-HSW. Benefits of using D-HS include ease of case and similarity knowledge maintenance, simplicity, accuracy, and speed in comparison to conventional approaches widely used in CBR. We present results from the analysis of 20 case bases for classification problems and 15 case bases for regression problems. We demonstrate the improvements in accuracy and/or efficiency of each D-HS modification in comparison to traditional k-NN, R-tree, C4.5, and M5 techniques and show it to be a very attractive approach for indexing case bases. We also highlight potential areas for further improvement of the D-HS approach. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 353–383, 2007.
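
To make the idea of "cases indexed by their discretized attribute values" with "simple intersection" similarity more tangible, here is a loose sketch in the spirit of the D-HSM variant. It is not the published algorithm: equal-width binning, the function names and the toy case base are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def interval(value, lo, hi, n_bins):
    """Equal-width bin of value in [lo, hi], clamped to 0..n_bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def build_index(cases, n_bins):
    """Map each (attribute, interval) cell to the ids of cases falling in it."""
    n_attrs = len(cases[0])
    ranges = [(min(c[a] for c in cases), max(c[a] for c in cases))
              for a in range(n_attrs)]
    index = defaultdict(set)
    for cid, case in enumerate(cases):
        for a, v in enumerate(case):
            lo, hi = ranges[a]
            index[(a, interval(v, lo, hi, n_bins))].add(cid)
    return index, ranges

def retrieve(query, index, ranges, n_bins):
    """Score each case by the number of attributes sharing the query's interval."""
    hits = Counter()
    for a, v in enumerate(query):
        lo, hi = ranges[a]
        for cid in index.get((a, interval(v, lo, hi, n_bins)), ()):
            hits[cid] += 1
    return hits.most_common()

# Hypothetical case base with two numeric attributes per case.
CASES = [(1.0, 10.0), (2.0, 20.0), (9.0, 90.0), (10.0, 100.0)]
INDEX, RANGES = build_index(CASES, n_bins=3)

print(retrieve((9.5, 95.0), INDEX, RANGES, n_bins=3))
```

Retrieval touches only one index cell per attribute, so its cost does not grow with a linear scan of the case base, which is one plausible reading of the speed benefit the abstract reports.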


Incremental learning of collaborative classifier agents with new class acquisition: An incremental genetic algorithm approach

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 11 2003
Sheng-Uei Guan
A number of soft computing approaches such as neural networks, evolutionary algorithms, and fuzzy logic have been widely used for classifier agents to adaptively evolve solutions to classification problems. However, most work in the literature focuses on the learning ability of the individual classifier agent. This article explores incremental, collaborative learning in a multiagent environment. We use the genetic algorithm (GA) and incremental GA (IGA) as the main techniques to evolve the rule set for classification and apply new class acquisition as a typical example to illustrate the incremental, collaborative learning capability of classifier agents. Benchmark data sets are used to evaluate the proposed approaches. The results show that GA and IGA can be used successfully for collaborative learning among classifier agents. © 2003 Wiley Periodicals, Inc.


Rough approximation by dominance relations

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 2 2002
Salvatore Greco
In this article we consider a multicriteria classification that differs from usual classification problems since it takes into account preference orders in the description of objects by condition and decision attributes. To deal with multicriteria classification we propose to use a dominance-based rough set approach (DRSA). This approach is different from the classic rough set approach (CRSA) because it takes into account preference orders in the domains of attributes and in the set of decision classes. Given a set of objects partitioned into pre-defined and preference-ordered classes, the new rough set approach is able to approximate this partition by means of dominance relations (instead of the indiscernibility relations used in the CRSA). The rough approximation of this partition is a starting point for induction of if-then decision rules. The syntax of these rules is adapted to represent preference orders. The DRSA keeps the best properties of the CRSA: it analyses only facts present in data, and possible inconsistencies are not corrected. Moreover, the new approach does not need any prior discretization of continuous-valued attributes. In this article we characterize the DRSA as well as decision rules induced from these approximations. The usefulness of the DRSA and its advantages over the CRSA are presented in a real study of evaluation of the risk of business failure. © 2002 John Wiley & Sons, Inc.
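
The approximation of preference-ordered classes by dominance relations can be illustrated with a small sketch. It uses the standard DRSA definitions for an upward union of classes ("class t or better") and assumes, for simplicity, that every criterion is gain-type (larger is better) and that classes are integers with larger meaning more preferred. The toy objects are hypothetical.

```python
def dominates(x, y):
    """True if x is at least as good as y on every criterion."""
    return all(a >= b for a, b in zip(x, y))

def approximate_upward(objects, classes, t):
    """Lower and upper approximations of the union 'class >= t'.

    Lower: objects for which every object dominating them is in the union.
    Upper: objects that dominate at least one object of the union.
    """
    ids = range(len(objects))
    upward = {i for i in ids if classes[i] >= t}
    lower = {
        i for i in ids
        if all(j in upward for j in ids if dominates(objects[j], objects[i]))
    }
    upper = {
        i for i in ids
        if any(j in upward for j in ids if dominates(objects[i], objects[j]))
    }
    return lower, upper

# Hypothetical data: object 2 dominates object 1 but sits in a worse class,
# which is exactly the kind of inconsistency the approximations expose.
OBJECTS = [(3, 3), (2, 2), (2, 3), (1, 1)]
CLASSES = [2, 2, 1, 1]

lower, upper = approximate_upward(OBJECTS, CLASSES, t=2)
print(lower, upper, upper - lower)
```

Objects in the lower approximation support certain rules, while those in the boundary (upper minus lower) are the inconsistent cases that, as the abstract notes, are kept rather than corrected.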


Powered partial least squares discriminant analysis

JOURNAL OF CHEMOMETRICS, Issue 1 2009
Kristian Hovde Liland
Abstract: From the fundamental parts of PLS-DA, Fisher's canonical discriminant analysis (FCDA) and Powered PLS (PPLS), we develop the concept of powered PLS for classification problems (PPLS-DA). By taking advantage of a sequence of data reducing linear transformations (consistent with the computation of ordinary PLS-DA components), PPLS-DA computes each component from the transformed data by maximization of a parameterized Rayleigh quotient associated with FCDA. Models found by the powered PLS methodology can help reveal the relevance of particular predictors and often require fewer and simpler components than their ordinary PLS counterparts. From the possibility of imposing restrictions on the powers available for optimization we obtain an explorative approach to predictive modeling not available to the traditional PLS methods. Copyright © 2008 John Wiley & Sons, Ltd.


A variable selection strategy for supervised classification with continuous spectroscopic data

JOURNAL OF CHEMOMETRICS, Issue 2 2004
Ulf Indahl
Abstract: In this paper we present a new variable selection method designed for classification problems where the X data are discretely sampled from continuous curves. For such data the loading weight vectors of a PLS discriminant analysis inherit the continuous behaviour, making the idea of local peaks meaningful. For successive components the local peaks are checked for importance before entering the set of selected variables. Our examples with NIR/NIT show that substantial simplification of the X space can be obtained without loss of classification power when compared with 'benchmark full-spectrum' methods. Copyright © 2004 John Wiley & Sons, Ltd.


A Landmark Analysis-Based Approach to Age and Sex Classification of the Skull of the Mediterranean Monk Seal (Monachus monachus) (Hermann, 1779)

ANATOMIA, HISTOLOGIA, EMBRYOLOGIA, Issue 5 2009
C. Brombin
Summary: This work aimed at applying geometric morphometric analysis techniques to the skull of the Mediterranean monk seal (Monachus monachus, Hermann, 1779). Inferential analyses were performed using a non-parametric permutation framework based on a series of skulls of different age classes belonging to individuals of both sexes. Our goal was to establish whether a statistical approach based on osteometric measurements and surface analysis of photographs of the left lateral plane of the skull may lead to a different and scientifically sound method of age and sex classification in this critically endangered marine mammal. Our data indicate that non-parametric combination methodology enables the researcher to give local assessment using a combination with domains. Developing geometric morphometric techniques in a non-parametric permutation framework could be useful in solving high dimensional and small sample size problems as well as classification problems, including zoological classification of specimens within a specific population. The Mediterranean monk seal is believed to be the world's rarest pinniped and one of the most endangered mammals of the world, with fewer than 600 individuals currently surviving. The use of shape analysis would allow new insights into the biological characteristics of the monk seal by simply extracting potentially new information on age and size from museum specimens.


Adaptive Weighted Learning for Unbalanced Multicategory Classification

BIOMETRICS, Issue 1 2009
Xingye Qiao
Summary: In multicategory classification, standard techniques typically treat all classes equally. This treatment can be problematic when the dataset is unbalanced in the sense that certain classes have very small class proportions compared to others. The minority classes may be ignored or discounted during the classification process due to their small proportions. This can be a serious problem if those minority classes are important. In this article, we study the problem of unbalanced classification and propose new criteria to measure classification accuracy. Moreover, we propose three different weighted learning procedures: two one-step weighted procedures and one adaptive weighted procedure. We demonstrate the advantages of the new procedures, using multicategory support vector machines, through simulated and real datasets. Our results indicate that the proposed methodology can handle unbalanced classification problems effectively.
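
The problem this abstract describes, minority classes being discounted by an accuracy measure that treats all observations equally, can be shown with a short sketch. The weighting scheme below (weights inversely proportional to class frequency) is a common generic choice assumed here for illustration; it is not claimed to be one of the criteria or procedures proposed in the article, and the labels are toy data.

```python
from collections import Counter

def inverse_proportion_weights(labels):
    """Weight each class by n / (k * class_count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_accuracy(y_true, y_pred, weights):
    """Accuracy in which each observation counts with its class weight."""
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total

# Hypothetical unbalanced problem: 8 majority cases, 2 minority cases,
# and a classifier that always predicts the majority class.
Y_TRUE = ["a"] * 8 + ["b"] * 2
Y_PRED = ["a"] * 10

plain = sum(t == p for t, p in zip(Y_TRUE, Y_PRED)) / len(Y_TRUE)
weights = inverse_proportion_weights(Y_TRUE)
print(plain, weighted_accuracy(Y_TRUE, Y_PRED, weights))
```

With these weights every class contributes equally to the score, so a classifier that ignores the minority class can no longer look good on plain accuracy alone.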