Similarity Measures (similarity + measure)


Selected Abstracts


A NEW AGENT MATCHING SCHEME USING AN ORDERED FUZZY SIMILARITY MEASURE AND GAME THEORY

COMPUTATIONAL INTELLIGENCE, Issue 2 2008
Hamed Kebriaei
In this paper, an agent matching method for bilateral contracts in a multi-agent market is proposed. Each agent has a hierarchical representation of its trading commodity attributes in a tree structure of fuzzy attributes. Using this structure, the similarity between the trees of each pair of buyer and seller is computed using a new ordered fuzzy similarity algorithm. Then, using the concept of Stackelberg equilibrium in a leader–follower game, matchmaking is performed among the sellers and buyers. The fuzzy similarities of each agent with the others, from its own viewpoint, are used as its payoffs in a bimatrix game. Through a case study for bilateral contracts of energy, the capabilities of the proposed agent-based system are illustrated. [source]
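As a toy illustration of the matchmaking step, the leader–follower logic can be sketched as follows. The payoff matrices below are invented stand-ins for the paper's ordered fuzzy similarity scores, not values from the study: the leader anticipates the follower's best response to each of its actions and picks the action that maximizes its own payoff under that anticipation.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action) at a pure Stackelberg equilibrium.

    leader_payoff[i][j] / follower_payoff[i][j]: payoffs when the leader
    plays action i and the follower plays action j.
    """
    best = None
    for i, row in enumerate(follower_payoff):
        # Follower's best response to leader action i.
        j = max(range(len(row)), key=lambda c: row[c])
        if best is None or leader_payoff[i][j] > leader_payoff[best[0]][best[1]]:
            best = (i, j)
    return best

# Fuzzy-similarity payoffs (illustrative values only).
L = [[0.7, 0.2], [0.4, 0.6]]
F = [[0.3, 0.8], [0.9, 0.1]]
print(stackelberg(L, F))
```

With these numbers the leader plays action 1, since the follower's best response to action 0 would leave the leader with only 0.2.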


A Fragment-weighted Key-based Similarity Measure for Use in Structural Clustering and Virtual Screening

MOLECULAR INFORMATICS, Issue 3 2006
Marie Munk Jørgensen
Abstract A new similarity measure and structural clustering method have been developed in which each structure is fragmented into ring systems, linkers, and side chains, and where the similarity between structures is a weighted sum of the similarities between each set of fragments. We have applied the MACCS keys as molecular descriptors and identified a number of ways in which to improve the use of these keys for structural clustering and virtual screening. The method has been optimized to reproduce the scaffold-biased clustering commonly used in medicinal chemistry. [source]
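The fragment-weighted idea can be sketched in a few lines. The fragment classes (ring systems, linkers, side chains) come from the abstract, but the key sets and weights below are invented toy data, not actual MACCS keys:

```python
def tanimoto(a, b):
    """Tanimoto/Jaccard similarity of two key sets (sets of set-bit indices)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def fragment_weighted_similarity(frags_x, frags_y, weights):
    """Weighted sum of per-fragment-class similarities.

    frags_x / frags_y map a fragment class ('rings', 'linkers', 'side_chains')
    to a set of structural-key indices; the weights sum to 1.
    """
    return sum(w * tanimoto(frags_x[c], frags_y[c]) for c, w in weights.items())

# Toy key sets standing in for MACCS-key bits (illustrative only).
x = {"rings": {1, 2, 3}, "linkers": {10}, "side_chains": {20, 21}}
y = {"rings": {2, 3, 4}, "linkers": {10}, "side_chains": {22}}
w = {"rings": 0.5, "linkers": 0.3, "side_chains": 0.2}
print(round(fragment_weighted_similarity(x, y, w), 3))
```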


Quantitative assessment of the effect of basis set superposition error on the electron density of molecular complexes by means of quantum molecular similarity measures

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 11 2009
Pedro Salvador
Abstract The Chemical Hamiltonian Approach (CHA) method is applied to obtain Basis Set Superposition Error (BSSE)-free molecular orbitals at the Hartree–Fock (HF) and Density Functional Theory (DFT) levels of theory. To assess qualitatively the effect of the BSSE on the first-order electron density, we had previously applied Bader's analysis of the intermolecular critical points located on the electron density, as well as density difference maps for several hydrogen-bonded complexes. In this work, Quantum Molecular Similarity Measures are probed as an alternative avenue to properly quantify the electronic relaxation due to the BSSE removal, by means of distance indices between the uncorrected and corrected charge densities. It is shown that BSSE contamination is more important at the DFT level of theory, and in some cases, changes in the topology of the electron density are observed upon BSSE correction. Inclusion of diffuse functions has been found to dramatically decrease the BSSE effect in both geometry and electron density. The CHA method represents a good compromise to obtain accurate results with small basis sets. © 2009 Wiley Periodicals, Inc. Int J Quantum Chem, 2009 [source]


Bond-based 3D-chiral linear indices: Theory and QSAR applications to central chirality codification

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 15 2008
Juan A. Castillo-Garit
Abstract The recently introduced non-stochastic and stochastic bond-based linear indices have been generalized to codify chemical-structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. These modified descriptors are applied to several well-known data sets to validate each one of them. In particular, Cramer's steroid data set has become a benchmark for the assessment of novel quantitative structure–activity relationship methods. This data set has been studied by several researchers using 3D-QSAR approaches such as Comparative Molecular Field Analysis, Molecular Quantum Similarity Measures, Comparative Molecular Moment Analysis, E-state, Mapping Property Distributions of Molecular Surfaces, and so on. For that reason, we selected it for the sake of comparability. In addition, to evaluate the effectiveness of this novel approach in drug design, we model the angiotensin-converting enzyme inhibitory activity of perindoprilate's stereoisomer combinatorial library, as well as codify information related to a pharmacological property highly dependent on molecular symmetry for a set of seven pairs of chiral N-alkylated 3-(3-hydroxyphenyl)-piperidines that bind σ-receptors. The validation of this method is achieved by comparison with earlier publications on the same data sets. The non-stochastic and stochastic bond-based 3D-chiral linear indices appear to provide a very interesting alternative to other, more common 3D-QSAR descriptors. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2008 [source]


Good properties of similarity measures and their complementarity

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2010
Leo Egghe
Similarity measures, such as the ones of Jaccard, Dice, or Cosine, measure the similarity between two vectors. A good property for similarity measures would be that, if we add a constant vector to both vectors, then the similarity must increase. We show that Dice and Jaccard satisfy this property while Cosine and both overlap measures do not. Adding a constant vector is called, in Lorenz concentration theory, "nominal increase" and we show that the stronger "transfer principle" is not a required good property for similarity measures. Another good property is that, when we have two vectors and if we add one of these vectors to both vectors, then the similarity must increase. Now Dice, Jaccard, Cosine, and one of the overlap measures satisfy this property, while the other overlap measure does not. Also a variant of this latter property is studied. [source]
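The first "good property" is easy to check numerically. This sketch uses the standard vector forms of the three measures (Jaccard and Dice generalized to real vectors via inner products) with arbitrary example vectors; it demonstrates that adding a constant vector to both arguments increases Jaccard and Dice, as the abstract states:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def jaccard(x, y):
    return dot(x, y) / (dot(x, x) + dot(y, y) - dot(x, y))

def dice(x, y):
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

x, y = [3.0, 0.0, 1.0], [0.0, 2.0, 1.0]
c = [1.0, 1.0, 1.0]                      # the constant vector
xc = [a + b for a, b in zip(x, c)]
yc = [a + b for a, b in zip(y, c)]
print(jaccard(xc, yc) > jaccard(x, y))   # Jaccard increases
print(dice(xc, yc) > dice(x, y))         # Dice increases
```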


Similarity measures, author cocitation analysis, and information theory

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 7 2005
Loet Leydesdorff
The use of Pearson's correlation coefficient in Author Cocitation Analysis was compared with Salton's cosine measure in a number of recent contributions. Unlike the Pearson correlation, the cosine is insensitive to the number of zeros. However, one has the option of applying a logarithmic transformation in correlation analysis. Information calculus is based on the logarithmic transformation as well, and provides non-parametric statistics. Using this methodology, one can cluster a document set in a precise way and express the differences in terms of bits of information. The algorithm is explained and applied to the data set that was made the subject of this discussion. [source]
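The zero-sensitivity contrast between the two coefficients is simple to demonstrate: appending the same zero components to both vectors leaves the cosine unchanged but shifts the Pearson correlation, because the means and centered deviations change. The vectors below are arbitrary examples:

```python
import math

def cosine(x, y):
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

x, y = [2.0, 1.0, 3.0], [1.0, 2.0, 2.0]
xz, yz = x + [0.0, 0.0], y + [0.0, 0.0]              # pad both with zeros
print(abs(cosine(x, y) - cosine(xz, yz)) < 1e-12)    # cosine is unchanged
print(abs(pearson(x, y) - pearson(xz, yz)) > 1e-3)   # Pearson shifts
```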


Sparse points matching by combining 3D mesh saliency with statistical descriptors

COMPUTER GRAPHICS FORUM, Issue 2 2008
U. Castellani
Abstract This paper proposes a new methodology for the detection and matching of salient points over several views of an object. The process is composed of three main phases. In the first step, detection is carried out by adopting a new perceptually-inspired 3D saliency measure. This measure allows the detection of a few sparse salient points that characterize distinctive portions of the surface. In the second step, a statistical learning approach is used to describe salient points across different views. Each salient point is modelled by a Hidden Markov Model (HMM), which is trained in an unsupervised way using contextual 3D neighborhood information, thus providing a robust and invariant point signature. Finally, in the third step, matching among points of different views is performed by evaluating a pairwise similarity measure among HMMs. An extensive and comparative experimental session has been carried out, considering real objects acquired by a 3D scanner from different points of view, where the objects come from standard 3D databases. Results are promising: the detection of salient points is reliable, and the matching is robust and accurate. [source]


Evolutionary coincidence-based ontology mapping extraction

EXPERT SYSTEMS, Issue 3 2008
Vahed Qazvinian
Abstract: Ontology matching is the process of selecting a good alignment across the entities of two (or more) ontologies. It can be viewed as a two-phase process: (1) applying a similarity measure to find the correspondence of each pair of entities from the two ontologies, and (2) extracting an optimal or near-optimal mapping. This paper focuses on the second phase and introduces an evolutionary approach for it. To do so, we need a mechanism to score different possible mappings; our solution is a weighting mechanism named coincidence-based weighting. A genetic algorithm is then introduced to create better mappings in successive iterations. We explain how we encode a mapping, as well as our crossover and mutation functions. An evaluation of the algorithm is presented and discussed. [source]
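A minimal genetic-algorithm sketch of the extraction phase follows. A candidate mapping assigns each entity of ontology A an entity of ontology B (or -1 for "unmapped"); the fitness here is simply the sum of matched-pair similarities, a crude stand-in for the paper's coincidence-based weighting, and the similarity matrix is made up:

```python
import random

SIM = [[0.9, 0.1, 0.2],
       [0.2, 0.8, 0.3],
       [0.1, 0.4, 0.7]]   # similarity of A-entity i to B-entity j (invented)

def fitness(mapping):
    """Sum of similarities of matched pairs, ignoring duplicate targets."""
    used, total = set(), 0.0
    for i, j in enumerate(mapping):
        if j >= 0 and j not in used:
            used.add(j)
            total += SIM[i][j]
    return total

def mutate(mapping, rng):
    m = list(mapping)
    m[rng.randrange(len(m))] = rng.randrange(-1, len(SIM[0]))
    return m

def evolve(generations=200, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(-1, 3) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        pop = pop[:pop_size // 2]                                  # selection
        pop += [mutate(rng.choice(pop), rng)                       # variation
                for _ in range(pop_size - len(pop))]
    return max(pop, key=fitness)

print(evolve())
```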


Decision-making method using a visual approach for cluster analysis problems; indicative classification algorithms and grouping scope

EXPERT SYSTEMS, Issue 3 2007
Ran M. Bittmann
Abstract: Currently, classifying samples into a fixed number of clusters (i.e. supervised cluster analysis) as well as unsupervised cluster analysis are limited in their ability to support 'cross-algorithms' analysis. It is well known that each cluster analysis algorithm yields different results (i.e. a different classification); even running the same algorithm with two different similarity measures commonly yields different results. Researchers usually choose the preferred algorithm and similarity measure according to the analysis objectives and data set features, but they have neither a formal method nor a tool that supports comparison and evaluation of the different classifications that result from the diverse algorithms. The current research develops a methodology and prototype decision-support tool, based upon formal quantitative measures and a visual approach, enabling presentation, comparison and evaluation of multiple classification suggestions resulting from diverse algorithms. This methodology and tool were used in two basic scenarios: (I) a classification problem in which a 'true result' is known, using the Fisher iris data set; (II) a classification problem in which there is no 'true result' to compare with. In the latter case, we used a small data set from a user profile study (a study that tries to relate users to a set of stereotypes based on sociological aspects and interests). In each scenario, ten diverse algorithms were executed. The suggested methodology and decision support system produced a cross-algorithms presentation; all ten resultant classifications are presented together in a 'Tetris-like' format. Each column represents a specific classification algorithm, each line represents a specific sample, and formal quantitative measures analyse the 'Tetris blocks', arranging them according to their best structures, i.e. best classification. [source]


Efficient video retrieval using index structure

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 2-3 2008
Jing Zhang
Abstract Video retrieval remains a challenging problem since most traditional query algorithms are ineffective and time-consuming. In this article, we propose a new video retrieval method, which segments the video stream by visual similarity between neighboring frames and adopts a high-dimensional index structure to organize the segments. Furthermore, a new similarity measure is introduced to improve query accuracy by jointly taking into account the visual similarity and temporal order among video segments. Based on this similarity measure, we propose a novel video clip retrieval algorithm which achieves high query efficiency by using a restricted sliding window to construct candidate video clips. Experimental results show that the proposed video retrieval method is efficient and effective. © 2008 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 18, 113–123, 2008 [source]


Color invariant object recognition using entropic graphs

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 5 2006
Jan C. van Gemert
Abstract We present an object recognition approach using higher-order color invariant features with an entropy-based similarity measure. Entropic graphs offer an unparameterized alternative to common entropy estimation techniques, such as a histogram or assuming a probability distribution. An entropic graph estimates entropy from a spanning graph structure of sample data. We extract color invariant features from object images invariant to illumination changes in intensity, viewpoint, and shading. The Henze–Penrose similarity measure is used to estimate the similarity of two images. Our method is evaluated on the ALOI collection, a large collection of object images. This object image collection consists of 1000 objects recorded under various imaging circumstances. The proposed method is shown to be effective under a wide variety of imaging conditions. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 146–153, 2006 [source]


Relationships between entropy and similarity measure of interval-valued intuitionistic fuzzy sets

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 11 2010
Qiansheng Zhang
The concept of the entropy of an interval-valued intuitionistic fuzzy set (IvIFS) is first introduced. The close relationships between entropy and the similarity measure of interval-valued intuitionistic fuzzy sets are discussed in detail. We also obtain some important theorems by which the entropy and similarity measure of IvIFSs can be transformed into each other, based on their axiomatic definitions. Some formulae to calculate the entropy and similarity measure of IvIFSs are also put forward. © 2010 Wiley Periodicals, Inc. [source]


A flexible approach to evaluating soft conditions with unequal preferences in fuzzy databases

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7 2007
Gloria Bordogna
A flexible model for evaluating soft queries with unequal preferences in fuzzy databases is proposed. We assume that conditions with unequal preferences have an exclusive meaning, as in the request "find a holiday accommodation such that big apartments are preferred to high rating hotels." In this case it is assumed that the aggregator of the soft conditions is an implicit OR. Conversely, conditions with unequal importance have an inclusive meaning, as in the query "find a house to rent that is cheap (most important), big (important), new (fairly important)." In this case the implicit aggregator is an AND. What we propose in this article is to model preferences as modifiers of the semantics of the evaluation function of the conditions. Because the soft conditions are aggregated by an OR, the more a soft condition is preferred, the greater the undersatisfaction its evaluation function tolerates. The proposed approach is formalized by considering two alternative semantics of the evaluation function: the first defines the evaluation function by means of a generalized fuzzy inclusion measure, and the second as a generalized similarity measure. These functions are parameterized, so that their modification is achieved simply by tuning the functions' parameters. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 665–689, 2007. [source]


From marine ecology to crime analysis: Improving the detection of serial sexual offences using a taxonomic similarity measure

JOURNAL OF INVESTIGATIVE PSYCHOLOGY AND OFFENDER PROFILING, Issue 1 2007
Jessica Woodhams
Abstract Jaccard has been the choice similarity metric in ecology and forensic psychology for comparison of sites or offences, by species or behaviour. This paper applies a more powerful hierarchical measure, taxonomic similarity (Δs), recently developed in marine ecology, to the task of behaviourally linking serial crime. Forensic case linkage attempts to identify behaviourally similar offences committed by the same unknown perpetrator (called linked offences). Δs considers progressively higher-level taxa, such that two sites show some similarity even without shared species. We apply this index by analysing 55 specific offence behaviours classified hierarchically. The behaviours are taken from 16 sexual offences by seven juveniles, where each offender committed two or more offences. We demonstrate that both Jaccard and Δs show linked offences to be significantly more similar than unlinked offences. With up to 20% of the specific behaviours removed in simulations, Δs is equally or more effective at distinguishing linked offences than Jaccard applied to the full data set. Moreover, Δs retains a significant difference between linked and unlinked pairs with up to 50% of the specific behaviours removed. As police decision-making often depends upon incomplete data, Δs has clear advantages and its application may extend to other crime types. Copyright © 2007 John Wiley & Sons, Ltd. [source]
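The contrast between Jaccard and a hierarchical measure can be illustrated with a simplified, hypothetical stand-in: each offence is a set of (category, behaviour) pairs, and two behaviours that differ but share a category still earn partial credit (0.5 here). The published taxonomic similarity index is defined differently; this sketch only shows why higher-level agreement yields non-zero similarity where Jaccard sees none.

```python
def jaccard(a, b):
    """Plain Jaccard on sets of (category, behaviour) pairs."""
    return len(a & b) / len(a | b) if a | b else 1.0

def taxo_sim(a, b):
    """Toy two-level taxonomic similarity: 1.0 for an exact behaviour match,
    0.5 for a category-only match, averaged symmetrically."""
    if not a or not b:
        return 0.0
    def one_way(src, dst):
        dst_behaviours = set(dst)
        dst_categories = {cat for cat, _ in dst}
        score = 0.0
        for cat, beh in src:
            if (cat, beh) in dst_behaviours:
                score += 1.0
            elif cat in dst_categories:
                score += 0.5
        return score / len(src)
    return (one_way(a, b) + one_way(b, a)) / 2

offence1 = {("control", "gag"), ("violence", "slap"), ("theft", "wallet")}
offence2 = {("control", "bind"), ("violence", "slap")}
print(jaccard(offence1, offence2))    # counts only the shared behaviour
print(taxo_sim(offence1, offence2))   # also credits the shared 'control' category
```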


On the complexity of Rocchio's similarity-based relevance feedback algorithm

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2007
Zhixiang Chen
Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive algorithm for learning from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d^2(log d + log n)) over the discretized vector space {0, 1, ..., n − 1}^d, when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier over {0, 1, ..., n − 1}^d can be improved to, at most, 1 + 2k(n − 1)(log d + log(n − 1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm; for example, the authors prove a lower bound on its learning complexity over the Boolean vector space {0, 1}^d. [source]
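For readers unfamiliar with the algorithm being analysed, the classic Rocchio update itself (not the complexity analysis) is a one-liner per component: the query vector moves toward the centroid of relevant documents and away from the centroid of non-relevant ones. The weights alpha/beta/gamma below are conventional defaults, and the vectors are toy data:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio relevance-feedback update of a query vector."""
    d = len(query)
    def centroid(docs):
        if not docs:
            return [0.0] * d
        return [sum(doc[i] for doc in docs) / len(docs) for i in range(d)]
    r, nr = centroid(relevant), centroid(nonrelevant)
    return [alpha * query[i] + beta * r[i] - gamma * nr[i] for i in range(d)]

q = [1.0, 0.0, 0.0]
rel = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]     # judged relevant
nonrel = [[0.0, 0.0, 1.0]]                   # judged non-relevant
print(rocchio(q, rel, nonrel))
```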


An Automatic Building Approach To Special Takagi-Sugeno Fuzzy Network For Unknown Plant Modeling And Stable Control

ASIAN JOURNAL OF CONTROL, Issue 2 2003
Chia-Feng Juang
ABSTRACT In previous studies, several stable controller design methods for plants represented by a special Takagi-Sugeno fuzzy network (STSFN) have been proposed. In these studies, however, the STSFN is derived directly from the mathematical function of the controlled plant. For an unknown plant, a problem arises if the STSFN cannot model the plant successfully. To address this problem, we have derived a learning algorithm for the construction of an STSFN from input-output training data. Based upon the constructed STSFN, existing stable controller design methods can then be applied to an unknown plant. To verify this, stable fuzzy controller design by the parallel distributed compensation (PDC) method is adopted. In the PDC method, the precondition parts of the designed fuzzy controllers share the same fuzzy rule numbers and fuzzy sets as the STSFN. To reduce the controller rule number, the precondition part of the constructed STSFN is partitioned in a flexible way. Also, a similarity measure together with a merging operation between neighboring fuzzy sets is applied in each input dimension to eliminate redundant fuzzy sets. The consequent parts in the STSFN are designed by a correlation measure, so that only the significant input terms participate in each rule's consequence, reducing the number of network parameters. Simulation results on the cart-pole balancing system have shown that with the proposed STSFN building approach, we are able to model the controlled plant with high accuracy and, in addition, can design a stable fuzzy controller with a small number of parameters. [source]


Microbial diversity of inflamed and noninflamed gut biopsy tissues in inflammatory bowel disease

INFLAMMATORY BOWEL DISEASES, Issue 6 2007
Shadi Sepehri MD
Abstract Background: Inflammatory bowel disease (IBD) is a chronic gastrointestinal condition without any known cause or cure. An imbalance in normal gut biota has been identified as an important factor in the inflammatory process. Methods: Fifty-eight biopsies from Crohn's disease (CD, n = 10), ulcerative colitis (UC, n = 15), and healthy controls (n = 16) were taken from a population-based case-control study. Automated ribosomal intergenic spacer analysis (ARISA) and terminal restriction fragment length polymorphisms (T-RFLP) were used as molecular tools to investigate the intestinal microbiota in these biopsies. Results: ARISA and T-RFLP data did not allow a high level of clustering based on disease designation. However, if clustering was done based on the inflammation criteria, the majority of biopsies grouped either into inflamed or noninflamed groups. We conducted statistical analyses using incidence-based species richness and diversity as well as the similarity measures. These indices suggested that the noninflamed tissues form an intermediate population between controls and inflamed tissue for both CD and UC. Of particular interest was that species richness increased from control to noninflamed tissue, and then declined in fully inflamed tissue. Conclusions: We hypothesize that there is a recruitment phase in which potentially pathogenic bacteria colonize tissue, and once the inflammation sets in, a decline in diversity occurs that may be a byproduct of the inflammatory process. Furthermore, we suspect that a better knowledge of the microbial species in the noninflamed tissue, thus before inflammation sets in, holds the clues to the microbial pathogenesis of IBD. (Inflamm Bowel Dis 2007) [source]
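The abstract's incidence-based richness, diversity, and similarity statistics are standard ecology indices that can be sketched directly; the counts below are invented toy communities arranged to mirror the stated hypothesis (richness rising from control to noninflamed tissue, then declining when inflamed), not data from the study:

```python
import math

def richness(counts):
    """Number of species present (incidence-based richness)."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon diversity H' from abundance counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def sorensen(a, b):
    """Incidence-based Sorensen similarity between two communities."""
    sa = {s for s, c in a.items() if c > 0}
    sb = {s for s, c in b.items() if c > 0}
    return 2 * len(sa & sb) / (len(sa) + len(sb))

control = {"A": 5, "B": 3, "C": 2}
noninflamed = {"A": 4, "B": 2, "C": 3, "D": 1}   # richer, per the hypothesis
inflamed = {"A": 8, "D": 1}                      # diversity collapses
print(richness(control), richness(noninflamed), richness(inflamed))
print(round(sorensen(control, noninflamed), 3))
```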


Dempster–Shafer models for object recognition and classification

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 3 2006
A.P. Dempster
We consider situations in which each individual member of a defined object set is characterized uniquely by a set of variables, and we propose models and associated methods that recognize or classify a newly observed individual. Inputs consist of uncertain observations on the new individual and on a memory bank of previously identified individuals. Outputs consist of uncertain inferences concerning degrees of agreement between the new object and previously identified objects or object classes, with inferences represented by Dempster–Shafer belief functions. We illustrate the approach using models constructed from independent simple support belief functions defined on binary variables. In the case of object recognition, our models lead to marginal belief functions concerning how well the new object matches objects in memory. In the classification model, we compute beliefs and plausibilities that the new object lies in defined subsets of an object set. When regarded as similarity measures, our belief and plausibility functions can be interpreted as candidate membership functions in the terminology of fuzzy logic. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 283–297, 2006. [source]
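The combination of simple support belief functions the abstract describes rests on Dempster's rule, which can be sketched over a finite frame. Focal elements are frozensets; the binary frame and the mass values below are invented illustrations:

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions
    (dicts mapping frozenset focal elements to masses summing to 1)."""
    raw, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    k = 1.0 - conflict                       # normalization constant
    return {s: v / k for s, v in raw.items()}

def belief(m, hypothesis):
    """Belief = total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)

frame = frozenset({"match", "no_match"})
yes = frozenset({"match"})
# Two independent simple support functions for "the new object matches".
m1 = {yes: 0.6, frame: 0.4}
m2 = {yes: 0.5, frame: 0.5}
m = combine(m1, m2)
print(round(belief(m, yes), 2))
```

Combining two partial supports of 0.6 and 0.5 yields belief 0.8 in a match, the familiar reinforcement behavior of simple support functions.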


Foundation of quantum similarity measures and their relationship to QSPR: Density function structure, approximations, and application examples

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 1 2005
Ramon Carbó-Dorca
Abstract This work presents a schematic description of the theoretical foundations of quantum similarity measures and the varied usefulness of the enveloping mathematical structure. The study starts with the definition of tagged sets, continuing with inward matrix products, matrix signatures, and vector semispaces. From there, the construction and structure of quantum density functions become clear and facilitate entry into the description of quantum object sets, as well as into the construction of atomic shell approximations (ASA). An application of the ASA is presented, consisting of the density surfaces of a protein structure. Based on this background, quantum similarity measures are naturally constructed, and similarity matrices, composed of all the quantum similarity measures on a quantum object set, along with the quantum mechanical concept of the expectation value of an operator, allow the setup of a fundamental quantitative structure–property relationship (QSPR) equation based on quantum descriptors. An application example is presented based on the inhibition of photosynthesis produced by some naphthyridinone derivatives, which makes them good herbicide candidates. © 2004 Wiley Periodicals, Inc. Int J Quantum Chem, 2005 [source]
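The simplest quantum similarity measure is the overlap integral Z_AB between two density functions, which the Carbó index normalizes to [0, 1]. This sketch evaluates both on densities sampled on a 1-D grid; the Gaussian "densities" are illustrative stand-ins, not molecular densities:

```python
import math

def overlap(rho_a, rho_b, dx):
    """Z_AB: discretized overlap integral of two densities on a uniform grid."""
    return sum(a * b for a, b in zip(rho_a, rho_b)) * dx

def carbo_index(rho_a, rho_b, dx):
    """Carbo similarity index: Z_AB / sqrt(Z_AA * Z_BB), in [0, 1]."""
    zab = overlap(rho_a, rho_b, dx)
    return zab / math.sqrt(overlap(rho_a, rho_a, dx) * overlap(rho_b, rho_b, dx))

def gaussian(center, width, xs):
    return [math.exp(-((x - center) / width) ** 2) for x in xs]

dx = 0.01
xs = [i * dx - 5.0 for i in range(1001)]     # grid on [-5, 5]
rho1 = gaussian(0.0, 1.0, xs)
rho2 = gaussian(0.5, 1.0, xs)                # same shape, shifted by 0.5
print(round(carbo_index(rho1, rho2, dx), 4))
```

For these two shifted Gaussians the index is exp(-1/8) ≈ 0.8825, and it is exactly 1 for identical densities.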


Automatic appearance-based loop detection from three-dimensional laser data using the normal distributions transform

JOURNAL OF FIELD ROBOTICS (FORMERLY JOURNAL OF ROBOTIC SYSTEMS), Issue 11-12 2009
Martin Magnusson
We propose a new approach to appearance-based loop detection for mobile robots, using three-dimensional (3D) laser scans. Loop detection is an important problem in the simultaneous localization and mapping (SLAM) domain, and, because it can be seen as the problem of recognizing previously visited places, it is an example of the data association problem. Without a flat-floor assumption, two-dimensional laser-based approaches are bound to fail in many cases. Two of the problems with 3D approaches that we address in this paper are how to handle the greatly increased amount of data and how to efficiently obtain invariance to 3D rotations. We present a compact representation of 3D point clouds that is still discriminative enough to detect loop closures without false positives (i.e., detecting loop closure where there is none). A low false-positive rate is very important because wrong data association could have disastrous consequences in a SLAM algorithm. Our approach uses only the appearance of 3D point clouds to detect loops and requires no pose information. We exploit the normal distributions transform surface representation to create feature histograms based on surface orientation and smoothness. The surface shape histograms compress the input data by two to three orders of magnitude. Because of the high compression rate, the histograms can be matched efficiently to compare the appearance of two scans. Rotation invariance is achieved by aligning scans with respect to dominant surface orientations. We also propose to use expectation maximization to fit a gamma mixture model to the output similarity measures in order to automatically determine the threshold that separates scans at loop closures from nonoverlapping ones. We discuss the problem of determining ground truth in the context of loop detection and the difficulties in comparing the results of the few available methods based on range information. 
Furthermore, we present quantitative performance evaluations using three real-world data sets, one of which is highly self-similar, showing that the proposed method achieves high recall rates (percentage of correctly identified loop closures) at low false-positive rates in environments with different characteristics. © 2009 Wiley Periodicals, Inc. [source]


Genetic diversity in pollen beetles (Meligethes aeneus) in Sweden: role of spatial, temporal and insecticide resistance factors

AGRICULTURAL AND FOREST ENTOMOLOGY, Issue 4 2007
Nadiya Kazachkova
Abstract 1. Pollen beetles (Meligethes aeneus) are pests of oilseed Brassica crops that are subject to intensive chemical control. Resistance to pyrethroids has been reported. Although this insect is of great economic importance, little is known about its genetic properties and population structure. 2. Amplified fragment length polymorphism (AFLP) analysis with the restriction endonuclease combination EcoRI and PstI was performed on 133 samples of groups of three pollen beetles collected during 2001–04 from five different provinces of Sweden. Both susceptible and resistant insects were studied. Using one primer combination, more than 450 polymorphic DNA fragments were obtained and, in total, four primer combinations were used for analysis. A subsample of 59 single beetles was analysed using one primer combination. 3. AFLP profiles were analysed by similarity measures using the Nei and Li coefficient, and Neighbour-joining dendrograms were generated. The dendrogram built using 133 samples showed three distinct groups, each containing beetles representing one generation. Statistical analysis using analysis of molecular variance on single-beetle samples showed no evidence of significant genetic difference between resistant and susceptible beetles. Instead, a clear difference between samples, depending on time of collection and generation, was observed. 4. The expected regional population structure, although statistically significant, explained little of the variation. The levels of genetic variation within populations were very high. There appears to be a high rate of gene flow between pollen beetle populations. The implications of this in the context of insecticide resistance are discussed. [source]
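The Nei and Li coefficient used for the AFLP profiles is the Dice coefficient applied to shared band presence. A minimal sketch, with invented fragment-presence profiles (real AFLP profiles have hundreds of polymorphic bands):

```python
def nei_li(bands_a, bands_b):
    """Nei & Li (Dice) coefficient on sets of scored AFLP fragments:
    2 * shared bands / (bands in A + bands in B)."""
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))

# Illustrative fragment-presence profiles for two beetles.
beetle1 = {"f1", "f2", "f3", "f5"}
beetle2 = {"f1", "f3", "f4", "f5", "f6"}
print(round(nei_li(beetle1, beetle2), 3))
```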


On the relation between the association strength and other similarity measures

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 7 2010
Leo Egghe
A graph in van Eck and Waltman [JASIST, 60(8), 2009, p. 1644], representing the relation between the association strength and the cosine, is partially explained as a sheaf of parabolas, each parabola being the functional relation between these similarity measures on a trajectory along which the number of co-occurrences is held constant. Based on previously obtained relations between the cosine and other similarity measures (e.g., the Jaccard index), we prove new relations between the association strength and these other measures. [source]
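Assuming the standard co-occurrence definitions from van Eck and Waltman, in which the cosine is c/√(s_i·s_j) and the association strength is c/(s_i·s_j) for co-occurrence count c and occurrence counts s_i, s_j, the parabolic relation AS = cos²/c follows whenever c is held fixed. A numerical check of that identity:

```python
import math

def cosine(c, si, sj):           # c = co-occurrences, si/sj = occurrence counts
    return c / math.sqrt(si * sj)

def assoc_strength(c, si, sj):
    return c / (si * sj)

c = 6.0  # hold the number of co-occurrences fixed along the trajectory
for si, sj in [(10, 20), (15, 40), (25, 50)]:
    cos = cosine(c, si, sj)
    a = assoc_strength(c, si, sj)
    assert abs(a - cos ** 2 / c) < 1e-12  # AS = cos^2 / c: a parabola in cos
print("parabola AS = cos^2/c holds on the trajectory c = constant")
```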


New relations between similarity measures for vectors based on vector norms

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 2 2009
Leo Egghe
The well-known similarity measures of Jaccard, Salton's cosine, and Dice, and several related overlap measures for vectors, are compared. While general relations cannot be proved, we study these measures on "trajectories" of the form ||Y|| = a||X||, where a > 0 is a constant and ||·|| denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, thereby explaining a curve experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality in case a = 1. Hence, for equally normed vectors (e.g., for normalized vectors) we essentially only have Jaccard's measure and Salton's cosine measure, since all the other measures are equal to the latter. [source]
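The closing claim, that Dice reduces to Salton's cosine for equally normed vectors (the case a = 1), can be verified directly: with ||X|| = ||Y||, Dice = 2X·Y/(||X||² + ||Y||²) = X·Y/||X||² = cosine. A quick check:

```python
import math

def dot(x, y): return sum(a * b for a, b in zip(x, y))
def cosine(x, y): return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
def dice(x, y): return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

# Two equally normed vectors (||x||^2 = ||y||^2 = 25), i.e. the case a = 1
x = (3, 4, 0)
y = (0, 3, 4)
assert math.isclose(dice(x, y), cosine(x, y))  # Dice reduces to the cosine
print(dice(x, y))  # 12/25 = 0.48
```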


English-Arabic proper-noun transliteration-pairs creation

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2008
Mohamed Abdel Fattah
Proper nouns may be considered the most important query words in information retrieval. If two languages use the same alphabet, the same proper nouns can be found in either language. However, if the two languages use different alphabets, the names must be transliterated. Short vowels are usually not marked on Arabic words in most Arabic documents (except very important documents such as the Muslim and Christian holy books). Moreover, most Arabic words have a syllable consisting of a consonant-vowel combination (CV), which means that most Arabic words contain a short or long vowel between two successive consonant letters. That makes it difficult to create English-Arabic transliteration pairs, since some English letters may not be matched with any romanized Arabic letter. In the present study, we present different approaches for the extraction of transliteration proper-noun pairs from parallel corpora, based on different similarity measures between the English and romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency proper-noun pairs. We evaluate the new approaches using two different English-Arabic parallel corpora. Most of our results outperform previously published results in terms of precision, recall, and F-measure. [source]
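The paper's own similarity measures are not reproduced here; the following is only an illustrative sketch in the same spirit, comparing consonant skeletons (since short vowels are typically unwritten in Arabic) with Python's difflib. The function and example names are hypothetical:

```python
from difflib import SequenceMatcher

VOWELS = set("aeiou")

def skeleton(name):
    """Strip vowels, keeping the consonant skeleton; romanized Arabic
    forms often lack the short vowels present in English spellings."""
    return "".join(ch for ch in name.lower() if ch.isalpha() and ch not in VOWELS)

def similarity(english, romanized_arabic):
    return SequenceMatcher(None, skeleton(english), skeleton(romanized_arabic)).ratio()

print(similarity("Mohamed", "mhmd"))  # 1.0: identical consonant skeletons
```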


Relationships between perceived features and similarity of images: A test of Tversky's contrast model

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2007
Abebe Rorissa
The rapid growth in the number of images and image users, driven by falling costs and rising efficiency in the creation, storage, manipulation, and transmission of images, poses challenges to those who organize and provide access to images. One of these challenges is similarity matching, a key component of current content-based image retrieval systems. Similarity matching is often implemented through similarity measures based on geometric models of similarity, whose metric axioms are not satisfied by human similarity judgment data. This study is significant in that it is among the first known to test Tversky's contrast model, which equates the degree of similarity of two stimuli to a linear combination of their common and distinctive features, in the context of image representation and retrieval. Data were collected from 150 participants who performed an image description and a similarity judgment task. Structural equation modeling, correlation, and regression analyses confirmed the relationships between perceived features and similarity of objects hypothesized by Tversky. The results hold implications for future research that will attempt to further test the contrast model and assist designers of image organization and retrieval systems by pointing toward alternative document representations and similarity measures that more closely match human similarity judgments. [source]
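Tversky's contrast model, as stated above, scores similarity as a weighted difference of common and distinctive features: s(a, b) = θ·f(A∩B) − α·f(A−B) − β·f(B−A). A minimal sketch with set cardinality as the salience function f; the weights and feature sets below are illustrative only:

```python
def tversky_contrast(a_features, b_features, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model: s(a,b) = theta*f(A&B) - alpha*f(A-B) - beta*f(B-A),
    with f taken here as simple set cardinality."""
    common = len(a_features & b_features)
    a_only = len(a_features - b_features)
    b_only = len(b_features - a_features)
    return theta * common - alpha * a_only - beta * b_only

# Hypothetical perceived-feature sets for two images
img1 = {"sky", "tree", "water", "person"}
img2 = {"sky", "tree", "building"}
print(tversky_contrast(img1, img2))  # 1.0*2 - 0.5*2 - 0.5*1 = 0.5
```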


Measuring beta-diversity from taxonomic similarity

JOURNAL OF VEGETATION SCIENCE, Issue 6 2007
Giovanni Bacaro
Abstract Question: The utility of beta (β-)diversity measures that incorporate information about the degree of taxonomic (dis)similarity between species plots is becoming increasingly recognized. In this framework, the question for this study is: can we define an ecologically meaningful index of β-diversity that, besides indicating simple species turnover, is able to account for taxonomic similarity amongst species in plots? Methods: First, the properties of existing taxonomic similarity measures are briefly reviewed. Next, a new measure of plot-to-plot taxonomic similarity is presented that is based on the maximal common subgraph of two taxonomic trees. The proposed measure is computed from species presences and absences and includes information about the degree of higher-level taxonomic similarity between species plots. The performance of the proposed measure with respect to existing coefficients of taxonomic similarity and the Jaccard coefficient is discussed using a small data set of heath plant communities. Finally, a method to quantify β-diversity from taxonomic dissimilarities is discussed. Results: The proposed measure of taxonomic β-diversity incorporates not only species richness, but also information about the degree of higher-order taxonomic structure between species plots. In this view, it comes closer to a modern notion of biological diversity than more traditional measures of β-diversity. Regression analysis between the new coefficient and existing measures of taxonomic similarity shows an evident nonlinearity between the coefficients. This nonlinearity demonstrates that the new coefficient measures similarity in a conceptually different way from previous indices.
Also, in good agreement with the findings of previous authors, the regression between the new index and the Jaccard coefficient of similarity shows that more than 80% of the variance of the former is explained by community structure at the species level, while only the residual variance is explained by differences in the higher-order taxonomic structure of the species plots. This means that a genuinely taxonomic approach to the quantification of plot-to-plot similarity is only needed when we are interested in the residual variation related to the higher-order taxonomic structure of a pair of species plots. [source]
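The authors' maximal-common-subgraph measure is not reproduced here; the sketch below only illustrates the general idea of comparing plots through shared nodes of their taxonomic trees, using a Jaccard ratio over path nodes. The plots and taxa are hypothetical examples:

```python
def taxon_nodes(plot):
    """Collect every node (family, genus, species, ...) on the taxonomic
    path of each species recorded in the plot."""
    return {node for path in plot for node in path}

def taxonomic_similarity(plot_a, plot_b):
    a, b = taxon_nodes(plot_a), taxon_nodes(plot_b)
    return len(a & b) / len(a | b)  # Jaccard ratio over shared tree nodes

# Hypothetical plots: each species given as a (family, genus, species) path
plot1 = [("Ericaceae", "Calluna", "C. vulgaris"),
         ("Ericaceae", "Erica", "E. tetralix")]
plot2 = [("Ericaceae", "Erica", "E. cinerea"),
         ("Poaceae", "Molinia", "M. caerulea")]
print(taxonomic_similarity(plot1, plot2))  # shares Ericaceae and Erica: 2/9
```

Note that the plots share no species, so the plain Jaccard coefficient on species alone would be 0, while the taxonomic measure still registers their shared higher-order structure.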


How well can the accuracy of comparative protein structure models be predicted?

PROTEIN SCIENCE, Issue 11 2008
David Eramian
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71. [source]
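The reported correlation coefficients are ordinary Pearson correlations between predicted and actual model errors. A minimal sketch of that evaluation metric; the error values below are hypothetical, not from the study:

```python
import math

def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual model errors."""
    n = len(predicted)
    mp = sum(predicted) / n
    ma = sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)

# Hypothetical predicted vs. actual RMSD errors (in angstroms) for five models
pred = [1.2, 2.5, 3.1, 4.8, 6.0]
true = [1.0, 2.9, 2.8, 5.1, 6.4]
print(round(pearson_r(pred, true), 3))
```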