Similarity Score (similarity + score)

Selected Abstracts


A Method for Evaluating Outcomes of Restoration When No Reference Sites Exist

RESTORATION ECOLOGY, Issue 1 2009
J. Stephen Brewer
Abstract Ecological restoration typically seeks to shift species composition toward that of existing reference sites. Yet, comparing the assemblages in restored and reference habitats assumes that similarity to the reference habitat is the optimal outcome of restoration and does not provide a perspective on regionally rare off-site species. When no such reference assemblages of species exist, an accurate assessment of the habitat affinities of species is crucial. We present a method for using a species by habitat data matrix generated by biodiversity surveys to evaluate community responses to habitat restoration treatments. Habitats within the region are rated on their community similarity to a hypothetical restored habitat, other habitats of conservation concern, and disturbed habitats. Similarity scores are reinserted into the species by habitat matrix to produce indicator (I) scores for each species in relation to these habitats. We apply this procedure to an open woodland restoration project in north Mississippi (U.S.A.) by evaluating initial plant community responses to restoration. Results showed a substantial increase in open woodland indicators, a modest decrease in generalists historically restricted to floodplain forests, and no significant change in disturbance indicators as a group. These responses can be interpreted as a desirable outcome, regardless of whether species composition approaches that of reference sites. The broader value of this approach is that it provides a flexible and objective means of predicting and evaluating the outcome of restoration projects involving any group of species in any region, provided there is a biodiversity database that includes habitat and location information.


Two novel Mesocestoides vogae fatty acid binding proteins: functional and evolutionary implications

FEBS JOURNAL, Issue 1 2008
Gabriela Alvite
This work describes two new fatty acid binding proteins (FABPs) identified in the parasitic platyhelminth Mesocestoides vogae (syn. corti). The corresponding polypeptide chains share 62% identical residues and 90% overall similarity under ClustalX default conditions. Among Cestoda FABPs, these proteins share the highest similarity score with the Taenia solium protein. M. vogae FABPs are also phylogenetically related to the FABP3/FABP4 mammalian FABP subfamilies. The native proteins were purified by chromatographic procedures, and their apparent molecular mass and isoelectric point were determined. Immunolocalization studies determined where these proteins are expressed in the larval form of the parasite. The genomic exon-intron organization of both genes is also reported, and supports new insights on intron evolution. Consensus motifs involved in splicing were identified.


A novel method for enzyme design

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 2 2009
Xiaolei Zhu
Abstract Rational design of enzymes is a stringent test of our understanding of the relationship between protein structure and function, and it has numerous potential applications. We present a novel method for enzyme design that can find good candidate protein scaffolds in a protein-ligand database based on vector matching of key residues. Residues in the vicinity of the active site were also compared according to a similarity score between the scaffold protein and the target enzyme. Suitable scaffold proteins were selected, and the side chains of residues around the active sites were rebuilt using a previously developed side-chain packing program. Triose phosphate isomerase (TIM) was used as a validation test for enzyme design. Selected scaffold proteins were found to accommodate the enzyme active sites and successfully form a good transition state complex. This method overcomes the limitations of current enzyme design methods, which use a limited number of protein scaffolds and are based on the position of ligands. As there are a large number of protein scaffolds available in the Protein Data Bank, this method should be widely applicable for various types of enzyme design. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2009


MassBank: a public repository for sharing mass spectral data for life sciences

JOURNAL OF MASS SPECTROMETRY (INCORP BIOLOGICAL MASS SPECTROMETRY), Issue 7 2010
Hisayuki Horai
Abstract MassBank is the first public repository of mass spectra of small chemical compounds (<3000 Da) for life sciences. The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS and 9276 electrospray ionization (ESI)-MSn data of 2337 authentic compounds of metabolites, 11 545 EI-MS and 834 other-MS data of 10 286 volatile natural and synthetic compounds, and 3045 ESI-MS2 data of 679 synthetic drugs contributed by 16 research groups (January 2010). ESI-MS2 data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database. Each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which weighting exponents on peak intensity and the mass-to-charge ratio are optimized to the ESI-MS2 data. MassBank also provides a merged spectrum for each compound prepared by merging the analyzed ESI-MS2 data on an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of the identification of a chemical compound by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data. Copyright © 2010 John Wiley & Sons, Ltd.
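
The weighted cosine correlation mentioned in this abstract can be illustrated with a minimal sketch. It is not MassBank's implementation: the exponent values (0.5 on intensity, 2.0 on m/z), the m/z matching tolerance, and the function name `weighted_cosine` are all assumptions chosen for illustration, since the abstract does not state the optimized exponents or how peaks are paired.

```python
import math


def weighted_cosine(query, reference, intensity_exp=0.5, mz_exp=2.0, tol=0.01):
    """Cosine similarity between two peak lists of (mz, intensity) pairs.

    Each peak is weighted as intensity**intensity_exp * mz**mz_exp.
    The exponents and the tolerance are illustrative placeholders,
    not the values optimized by MassBank.
    """
    def weigh(peaks):
        return [(mz, (inten ** intensity_exp) * (mz ** mz_exp)) for mz, inten in peaks]

    q = weigh(query)
    r = weigh(reference)

    # Pair each query peak with the closest unused reference peak within tol.
    dot = 0.0
    used = set()
    for mz_q, w_q in q:
        best = None
        for j, (mz_r, _) in enumerate(r):
            if j in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or abs(mz_q - mz_r) < abs(mz_q - r[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += w_q * r[best][1]

    norm_q = math.sqrt(sum(w * w for _, w in q))
    norm_r = math.sqrt(sum(w * w for _, w in r))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


if __name__ == "__main__":
    spectrum = [(100.0, 50.0), (150.0, 100.0)]
    shifted = [(100.0, 50.0), (200.0, 100.0)]
    print(weighted_cosine(spectrum, spectrum))  # identical spectra score close to 1
    print(weighted_cosine(spectrum, shifted))   # partial overlap scores between 0 and 1
```

Raising the m/z exponent gives more weight to high-mass fragments, which tend to be more compound-specific; in this sketch a search threshold such as the 0.6 quoted above would simply be a cutoff on the returned value.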


Passage detection using text classification

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 4 2009
Saket Mengle
Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of documents. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms the other document-splitting approaches by 12% to 18%, a statistically significant margin (99% confidence), in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
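
The passage-retrieval step described in this abstract (comparing query terms with individual passages to obtain a similarity score) can be sketched as follows. This is a generic term-frequency cosine baseline, assumed here for illustration; the abstract does not specify the scoring function, and the KDP approach itself is not shown. The names `tokenize`, `cosine`, and `rank_passages` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query, passages):
    """Return (score, passage_index) pairs, highest score first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    passages = [
        "protein structure alignment",
        "mass spectra of small compounds",
    ]
    for score, index in rank_passages("mass spectra", passages):
        print(index, round(score, 3))
```

Passage detection, by contrast, would replace the query comparison with a trained classifier that assigns each passage to one of the predetermined categories.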


Database searching by flexible protein structure alignment

PROTEIN SCIENCE, Issue 7 2004
Yuzhen Ye
Abstract We have recently developed a flexible protein structure alignment program (FATCAT) that identifies structural similarity while accounting for the flexibility of protein structures. One of the most important applications of a structure alignment method is to aid in functional annotations by identifying similar structures in large structural databases. However, none of the flexible structure alignment methods had been applied to this task because of a lack of significance estimation for flexible alignments. In this paper, we develop an estimate of the statistical significance of the FATCAT alignment score, allowing us to use it as a database-searching tool. The results reported here show that (1) the distribution of the similarity score of FATCAT alignment between two unrelated protein structures follows the extreme value distribution (EVD), adding one more example to the current collection of EVDs of sequence and structure similarities; (2) introducing flexibility into structure comparison only slightly influences the sensitivity and specificity of identifying similar structures; and (3) the overall performance of FATCAT as a database searching tool is comparable to that of the widely used rigid-body structure comparison programs DALI and CE. Two examples illustrating the advantages of using flexible structure alignments in database searching are also presented. The conformational flexibilities that were detected in the first example may be involved in substrate specificity, and the conformational flexibilities detected in the second example may reflect the evolution of structures by block building.
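
If similarity scores between unrelated structures follow an extreme value distribution, a raw score can be converted into a significance estimate. The sketch below assumes the Gumbel (type I) form of the EVD and fits its location and scale by the method of moments; the abstract does not say how FATCAT estimates its parameters, so the estimator and the function names `fit_gumbel_moments` and `evd_p_value` are assumptions for illustration only.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (sigma) from background scores.

    Uses the moment relations mean = mu + gamma * sigma and
    variance = (pi * sigma)**2 / 6. This is a simple stand-in estimator,
    not the fitting procedure used by the authors.
    """
    sigma = statistics.stdev(scores) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * sigma
    return mu, sigma


def evd_p_value(score, mu, sigma):
    """P(S >= score) for a Gumbel-distributed score S."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (score - mu) / sigma
    if z < -700.0:
        return 1.0  # avoid overflow in exp(-z); the tail probability is 1 here
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny p-values
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical background scores from alignments of unrelated structures.
    background = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9, 9.5, 11.8]
    mu, sigma = fit_gumbel_moments(background)
    print(evd_p_value(20.0, mu, sigma))
```

At `score == mu` the function returns 1 - 1/e (about 0.632), and for scores far above `mu` the p-value decays approximately as `exp(-z)`, which is what makes a fixed score threshold usable in a database search.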


A Regression-based Association Test for Case-control Studies that Uses Inferred Ancestral Haplotype Similarity

ANNALS OF HUMAN GENETICS, Issue 5 2009
Youfang Liu
Summary Association methods based on haplotype similarity (HS) can overcome power and stability issues encountered in standard haplotype analyses. Current HS methods can be generally classified into evolutionary and two-sample approaches. We propose a new regression-based HS association method for case-control studies that incorporates covariate information and combines the advantages of the two classes of approaches by using inferred ancestral haplotypes. We first estimate the ancestral haplotypes of case individuals and then, for each individual, an ancestral-haplotype-based similarity score is computed by comparing that individual's observed genotype with the estimated ancestral haplotypes. Trait values are then regressed on the similarity scores. Covariates can easily be incorporated into this regression framework. To account for the bias in the raw p-values due to the use of case data in constructing ancestral haplotypes, as well as to account for variation in ancestral haplotype estimation, a permutation procedure is adopted to obtain empirical p-values. Compared with the standard haplotype score test and the multilocus T² test, our method improves power when neither the allele frequency nor linkage disequilibrium between the disease locus and its neighboring SNPs is too low and is comparable in other scenarios. We applied our method to the Genetic Analysis Workshop 15 simulated SNP data and successfully pinpointed a stretch of SNPs that covers the fine-scale region where the causal locus is located.
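
The pipeline in this abstract (estimate ancestral haplotypes from cases, score each individual, regress trait on score, permute to obtain an empirical p-value) can be sketched in a heavily simplified form. Everything specific here is an assumption: the ancestral haplotype is replaced by a toy majority-allele consensus over cases, individuals are represented as binary allele vectors, the regression is a simple linear slope with no covariates, and all function names are hypothetical. The one feature taken from the abstract is that the ancestral estimate is recomputed inside every permutation, so the bias from using case data is reflected in the null distribution.

```python
import random


def consensus_haplotype(haplotypes):
    """Toy stand-in for ancestral-haplotype estimation: majority allele per site."""
    n_sites = len(haplotypes[0])
    return [
        1 if 2 * sum(h[s] for h in haplotypes) >= len(haplotypes) else 0
        for s in range(n_sites)
    ]


def similarity_scores(haplotypes, labels):
    """Fraction of sites at which each individual matches the case consensus."""
    cases = [h for h, y in zip(haplotypes, labels) if y == 1]
    ancestral = consensus_haplotype(cases)
    return [
        sum(a == b for a, b in zip(h, ancestral)) / len(ancestral)
        for h in haplotypes
    ]


def slope(x, y):
    """Least-squares slope of y regressed on x (0.0 if x is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return 0.0
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def permutation_p_value(haplotypes, labels, n_perm=999, seed=0):
    """Empirical p-value for the absolute regression slope."""
    rng = random.Random(seed)
    observed = abs(slope(similarity_scores(haplotypes, labels), labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Re-estimate the consensus from the permuted "cases" each time.
        stat = abs(slope(similarity_scores(haplotypes, shuffled), shuffled))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    haps = [[1, 1, 1, 1]] * 5 + [[0, 0, 0, 0]] * 5
    status = [1] * 5 + [0] * 5
    print(permutation_p_value(haps, status))
```

A fuller version would use a logistic model for the binary trait and add covariate columns to the regression, as the abstract indicates is possible.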