Scoring Matrices (scoring + matrix)

Selected Abstracts


Evaluating low level sequence identities: Are Aspergillus QUTA and AROM homologous?

FEBS JOURNAL, Issue 2 2001
A review published several years ago [Hawkins, A.R. & Lamb, H.K. (1995) Eur. J. Biochem. 232, 7-18] proposed that genetic, biochemical and physiological data can override sequence comparison in the determination of homology in instances where structural information is unavailable. Their lead example was the hypothesis that the transcriptional activator protein for quinate catabolism in Aspergillus nidulans, QUTA, is derived from the pentafunctional AROM protein by a gene duplication followed by cleavage [Hawkins, A.R., Lamb, H.K., Moore, J.D. & Roberts, C.F. (1993) Gene 136, 49-54]. We tested this hypothesis by a sensitive combination of position-specific log-odds scoring matrix methods. The position-specific log-odds scoring matrices were derived from a large number of 3-dehydroquinate synthase and 5-enolpyruvylshikimate-3-phosphate synthase domains that were proposed to be the domains from the AROM protein that gave rise to the transcriptional activator protein for quinate metabolism. We show that the degree and pattern of similarity between these position-specific log-odds scoring matrices and the transcriptional activator protein for quinate catabolism in A. nidulans is that expected for random sequences of the same composition. This level of similarity provides no support for the suggested gene duplication and cleavage. The lack of any trace of evidence for homology following a comprehensive sequence analysis indicates that the homology hypothesis is without foundation, underlining the necessity to accept only similarity of sequence and/or structure as evidence of evolutionary relatedness. Further, QUTA is homologous throughout its entire length to an extended family of fungal transcriptional regulatory proteins, rendering the hypothesized QUTA-AROM homology even more problematic.
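The comparison described in this abstract, scoring a query against a position-specific log-odds matrix and asking whether the score exceeds what random sequences of the same composition produce, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the uniform background, the pseudocount of 1, the gap-free toy alignment, and every function name below are assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log2(p/q)} dict per alignment column.

    Assumes gap-free sequences of equal length, standard residues only,
    and a uniform background frequency q (a simplification).
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_window_score(pssm, sequence):
    """Highest ungapped score of the matrix over all windows of the sequence.

    Assumes the sequence is at least as long as the matrix.
    """
    width = len(pssm)
    return max(
        sum(pssm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def shuffle_null_scores(pssm, sequence, n_shuffles=200, seed=0):
    """Scores of shuffled copies of the sequence (same composition)."""
    rng = random.Random(seed)
    residues = list(sequence)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(best_window_score(pssm, "".join(residues)))
    return scores


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as high as the observed score."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


# Toy usage with a hypothetical three-sequence alignment.
pssm = build_log_odds_pssm(["ACDE", "ACDE", "ACDF"])
query = "GGACDEGG"
observed = best_window_score(pssm, query)
p = empirical_p_value(observed, shuffle_null_scores(pssm, query))
```

An empirical p-value near 1 would correspond to the situation the abstract reports, where the observed similarity is no better than that of composition-matched random sequences.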


Comparison of two genotypic algorithms to determine HIV-1 tropism

HIV MEDICINE, Issue 1 2008
C Soulié
Objectives: One or both of two co-receptors, CCR5 (R5) and CXCR4 (X4), are used by HIV-1 to enter host cells. The glycoprotein 120 (gp120) V3 sequence is correlated with the R5 and X4 phenotype. CCR5 inhibitors are specifically active against R5 viruses, suggesting the need to determine tropism before the use of these antagonists. We compared the position-specific scoring matrices (PSSM) and Geno2pheno algorithms, both of which are based on gp120 V3 loop sequences and have previously been described as correlated with the R5 or X4 phenotype. Methods: V3 envelope (env) genes from 83 plasma samples were amplified and sequenced, and 69 sequences were analysed with the PSSM and Geno2pheno algorithms. Results: These two algorithms were concordant in 86.5% of cases. The Geno2pheno algorithm gave a tropism result more frequently than the PSSM algorithm, but R5X4 or X4 viruses were less frequently detected by the Geno2pheno algorithm. R5X4 or X4 tropism was predicted in 29.9% of samples. There was more R5X4 co-receptor use in the antiretroviral-treated group than in the antiretroviral-naïve group. Conclusions: It is advisable to run a validated co-receptor use prediction tool before using co-receptor antagonists. If genotyping methods are considered, the PSSM and Geno2pheno algorithms are complementary and both are necessary. The association between predicted co-receptor use and virological response to co-receptor antagonists needs to be thoroughly evaluated.


A versatile strategy to define the phosphorylation preferences of plant protein kinases and screen for putative substrates

THE PLANT JOURNAL, Issue 1 2008
Florina Vlad
Summary: Most signaling networks are regulated by reversible protein phosphorylation. The specificity of this regulation depends in part on the capacity of protein kinases to recognize and efficiently phosphorylate particular sequence motifs in their substrates. Sequenced plant genomes potentially encode more than 1000 protein kinases, representing 4% of the proteins, twice the proportion found in humans. This plethora of plant kinases requires the development of high-throughput strategies to identify their substrates. In this study, we have implemented a semi-degenerate peptide array screen to define the phosphorylation preferences of four kinases from Arabidopsis thaliana that are representative of the plant calcium-dependent protein kinase and Snf1-related kinase superfamily. We converted these quantitative data into position-specific scoring matrices to identify putative substrates of these kinases in silico in protein sequence databases. Our data show that these kinases display related but nevertheless distinct phosphorylation motif preferences, suggesting that they might share common targets but are likely to have specific substrates. Our analysis also reveals that a conserved motif found in the stress-related dehydrin protein family may be targeted by the SnRK2-10 kinase. Our results indicate that semi-degenerate peptide array screening is a versatile strategy that can be used on numerous plant kinases to facilitate identification of their substrates, and therefore represents a valuable tool to decipher phosphorylation-regulated signaling networks in plants.
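As a rough illustration of the step in which quantitative array signals are converted into a position-specific scoring matrix and then used to scan protein sequences, the Python sketch below may help. The abstract does not give the normalization or thresholds, so the data layout (signals keyed by offset from the phosphoacceptor), the log2 ratio against the per-position mean, the zero threshold, and all function names are assumptions rather than the authors' procedure.

```python
import math


def intensities_to_pssm(intensities, floor=1e-3):
    """Convert {offset: {residue: signal}} into log2(signal / mean signal).

    Offsets are positions relative to the phosphorylated Ser/Thr
    (for example -3 means three residues upstream). Signals are assumed
    to be positive; `floor` guards against log of zero.
    """
    pssm = {}
    for offset, signals in intensities.items():
        mean_signal = sum(signals.values()) / len(signals)
        pssm[offset] = {
            aa: math.log2(max(sig, floor) / mean_signal)
            for aa, sig in signals.items()
        }
    return pssm


def scan_for_sites(pssm, protein, threshold=0.0, default=0.0):
    """Score every Ser/Thr in `protein`; return (index, score), best first.

    Residues missing from a column, and offsets that fall outside the
    protein, contribute `default` (an assumption of this sketch).
    """
    hits = []
    for pos, residue in enumerate(protein):
        if residue not in "ST":
            continue
        score = 0.0
        for offset, column in pssm.items():
            idx = pos + offset
            if 0 <= idx < len(protein):
                score += column.get(protein[idx], default)
        if score > threshold:
            hits.append((pos, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical signals: Arg preferred at -3, Phe preferred at +1.
toy_intensities = {
    -3: {"R": 6.0, "A": 1.0, "L": 1.0},
    1: {"F": 3.0, "A": 1.0},
}
toy_pssm = intensities_to_pssm(toy_intensities)
candidate_sites = scan_for_sites(toy_pssm, "RAASFAAAASAA")
```

In this toy case only the serine that has Arg at -3 and Phe at +1 scores above the threshold; a real screen would use measured intensities for all positions and a threshold calibrated on known substrates.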


GNBSL: A new integrative system to predict the subcellular location for Gram-negative bacteria proteins

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 19 2006
Jian Guo
Abstract: This paper proposes a new integrative system (GNBSL, Gram-negative bacteria subcellular localization) for subcellular localization, specialized for Gram-negative bacterial proteins. First, the system generates a position-specific frequency matrix (PSFM) and a position-specific scoring matrix (PSSM) for each protein sequence by searching the Swiss-Prot database. Then different features are extracted by four modules from the PSFM and the PSSM. The features include whole-sequence amino acid composition, N- and C-terminus amino acid composition, dipeptide composition, and segment composition. Four probabilistic neural network (PNN) classifiers are used to classify these modules. To further improve the performance, two modules trained by support vector machine (SVM) are added to the system. One module extracts the residue-couple distribution from the amino acid sequence, and the other module applies a pairwise profile alignment kernel to measure the local similarity between every two sequences. Finally, an additional SVM is used to fuse the outputs from the six modules. Tests on a benchmark dataset show that the overall success rate of GNBSL is higher than those of PSORT-B, CELLO, and PSLpred. The GNBSL web server is available at http://166.111.24.5/webtools/GNBSL/index.htm.
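To make the feature types named in this abstract more concrete, the following Python sketch computes whole-sequence, terminal, and dipeptide composition. It is a deliberate simplification: GNBSL is described as extracting its features from the PSFM and the PSSM, whereas this sketch works directly on the raw sequence. The 10-residue terminal window and all function names are assumptions.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard residues (non-empty input)."""
    counts = Counter(sequence)
    n = len(sequence)
    return [counts[aa] / n for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Fraction of each of the 400 ordered residue pairs."""
    pairs = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = max(len(sequence) - 1, 1)
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def terminal_composition(sequence, n_residues=10):
    """Composition of the N-terminal and C-terminal segments, concatenated."""
    return composition(sequence[:n_residues]) + composition(sequence[-n_residues:])


def feature_vector(sequence, n_residues=10):
    """Concatenate the three feature groups: 20 + 40 + 400 = 460 values."""
    return (
        composition(sequence)
        + terminal_composition(sequence, n_residues)
        + dipeptide_composition(sequence)
    )


# Toy usage on a hypothetical sequence.
features = feature_vector("MKTLLAVAGLS")
```

A vector of this kind could be passed to any standard classifier; the abstract's PNN and SVM modules and the final SVM fusion step are not reproduced here.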