Similarity Metric (similarity + metric)
Selected Abstracts

Assessment of four modifications of a novel indexing technique for case-based reasoning
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 4 2007
Mykola Galushka

In this article, we investigate four variations (D-HSM, D-HSW, D-HSE, and D-HSEW) of a novel indexing technique called D-HS designed for use in case-based reasoning (CBR) systems. All D-HS modifications are based on a matrix of cases indexed by their discretized attribute values. The main differences between them are in their attribute discretization stratagem and similarity determination metric. D-HSM uses a fixed number of intervals and simple intersection as a similarity metric; D-HSW uses the same discretization approach and a weighted intersection; D-HSE uses information gain to define the intervals and simple intersection as similarity metric; D-HSEW is a combination of D-HSE and D-HSW. Benefits of using D-HS include ease of case and similarity knowledge maintenance, simplicity, accuracy, and speed in comparison to conventional approaches widely used in CBR. We present results from the analysis of 20 case bases for classification problems and 15 case bases for regression problems. We demonstrate the improvements in accuracy and/or efficiency of each D-HS modification in comparison to traditional k-NN, R-tree, C4.5, and M5 techniques and show it to be a very attractive approach for indexing case bases. We also illuminate potential areas for further improvement of the D-HS approach. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 353–383, 2007.

From marine ecology to crime analysis: Improving the detection of serial sexual offences using a taxonomic similarity measure
JOURNAL OF INVESTIGATIVE PSYCHOLOGY AND OFFENDER PROFILING, Issue 1 2007
Jessica Woodhams

Jaccard has been the choice similarity metric in ecology and forensic psychology for comparison of sites or offences, by species or behaviour.
This paper applies a more powerful hierarchical measure, taxonomic similarity (Δs), recently developed in marine ecology, to the task of behaviourally linking serial crime. Forensic case linkage attempts to identify behaviourally similar offences committed by the same unknown perpetrator (called linked offences). Δs considers progressively higher-level taxa, such that two sites show some similarity even without shared species. We apply this index by analysing 55 specific offence behaviours classified hierarchically. The behaviours are taken from 16 sexual offences by seven juveniles where each offender committed two or more offences. We demonstrate that both Jaccard and Δs show linked offences to be significantly more similar than unlinked offences. With up to 20% of the specific behaviours removed in simulations, Δs is equally or more effective at distinguishing linked offences than where Jaccard uses a full data set. Moreover, Δs retains significant difference between linked and unlinked pairs, with up to 50% of the specific behaviours removed. As police decision-making often depends upon incomplete data, Δs has clear advantages and its application may extend to other crime types. Copyright © 2007 John Wiley & Sons, Ltd.

RelACCS-FP: A Structural Minimalist Approach to Fingerprint Design
CHEMICAL BIOLOGY & DRUG DESIGN, Issue 5 2008
Ye Hu

The design and evaluation of structural key-type fingerprints is reported that consist of only 10–30 substructures isolated from randomly generated fragment populations of different classes of active compounds. To identify minimal sets of fragments that carry substantial compound class-specific information, fragment frequency calculations are applied to guide fingerprint generation. These compound class-directed and extremely small structural fingerprints push the design of so-called mini-fingerprints to the limit and are the shortest bit string fingerprints reported to date.
For the application of relative frequency-based activity class characteristic substructure fingerprints, a bit density-dependent similarity metric is introduced that makes it possible to adjust similarity coefficients for individual compound classes and balance the recall of active compounds with database selection size. In similarity search trials, these small compound class-directed fingerprints enrich active compounds in relatively small database selection sets and approach or exceed the performance of widely used structural fingerprints of much larger size and higher complexity.

Similarity-Based Virtual Screening with a Bayesian Inference Network
CHEMMEDCHEM, Issue 2 2009
Ammar Abdo

An inference network model for molecular similarity searching: The similarity search problem is modeled using inference or evidential reasoning under uncertainty. The inference network model treats similarity searching as an evidential reasoning process in which multiple sources of evidence about compounds and reference structures are combined to estimate resemblance probabilities. Many methods have been developed to capture the biological similarity between two compounds for use in drug discovery. A variety of similarity metrics have been introduced, the Tanimoto coefficient being the most prominent. Many of the approaches assume that molecular features or descriptors that do not relate to the biological activity carry the same weight as the important aspects in terms of biological similarity. Herein, a novel similarity searching approach using a Bayesian inference network is discussed. Similarity searching is regarded as an inference or evidential reasoning process in which the probability that a given compound has biological similarity with the query is estimated and used as evidence.
Our experiments demonstrate that the similarity approach based on Bayesian inference networks is likely to outperform the Tanimoto similarity search and offer a promising alternative to existing similarity search approaches.

The Human Phenotype Ontology
CLINICAL GENETICS, Issue 6 2010
PN Robinson

Robinson PN, Mundlos S. The Human Phenotype Ontology. A standardized, controlled vocabulary allows phenotypic information to be described in an unambiguous fashion in medical publications and databases. The Human Phenotype Ontology (HPO) is being developed in an effort to provide such a vocabulary. The use of an ontology to capture phenotypic information allows the use of computational algorithms that exploit semantic similarity between related phenotypic abnormalities to define phenotypic similarity metrics, which can be used to perform database searches for clinical diagnostics or as a basis for incorporating the human phenome into large-scale computational analysis of gene expression patterns and other cellular phenomena associated with human disease. The HPO is freely available at http://www.human-phenotype-ontology.org.
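
Several of the abstracts above refer to the Jaccard coefficient and to the Tanimoto coefficient as baseline similarity metrics. As a minimal illustrative sketch, not taken from any of the cited papers, the Jaccard coefficient of two feature sets can be computed as shown here; for binary fingerprints the Tanimoto coefficient reduces to the same ratio. The function name and the example behaviour labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty feature sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical behaviour labels, for illustration only.
offence_1 = {"weapon_used", "victim_bound", "outdoor_location"}
offence_2 = {"weapon_used", "outdoor_location", "verbal_threat"}

# Two shared behaviours out of four distinct behaviours overall.
print(jaccard(offence_1, offence_2))
```

Because the ratio uses only exact matches between items, two sets with no shared items always score 0; this is the limitation that hierarchical measures such as the taxonomic similarity index described above are designed to address.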