Structural Alignment (structural + alignment)


Selected Abstracts


Closed loop folding units from structural alignments: Experimental foldons revisited

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 15 2010
Sree V. Chintapalli
Abstract: Nonoverlapping closed loops of around 25–35 amino acids formed via nonlocal interactions at the loop ends have been proposed as an important unit of protein structure. This hypothesis is significant as such short loops can fold quickly and so would not be bound by the Levinthal paradox, giving insight into the possible nature of the funnel in protein folding. Previously, these closed loops have been identified either by sequence analysis (conservation and autocorrelation) or studies of the geometry of individual proteins. Given the potential significance of the closed loop hypothesis, we have explored a new strategy for determining closed loops from the insertions identified by the structural alignment of proteins sharing the same overall fold. We determined the locations of the closed loops in 37 pairs of proteins and obtained excellent agreement with previously published closed loops. The relevance of NMR structures to closed loop determination is briefly discussed. For cytochrome c, cytochrome b562 and triosephosphate isomerase, independent folding units have been determined on the basis of hydrogen exchange experiments and misincorporation proton-alkyl exchange experiments. The correspondence between these experimentally derived foldons and the theoretically derived closed loops indicates that the closed loop hypothesis may provide a useful framework for analyzing such experimental data. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
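
The abstract above derives closed loops from insertions found in a structural alignment but does not spell out the procedure. As a minimal sketch only, the following Python function lists the insertions in one protein relative to another, assuming the alignment is available as two equal-length gapped strings with "-" as the gap character. The function name find_insertions and this input format are assumptions made for illustration, not the authors' method, and how insertions are then mapped to loop boundaries is left open.

```python
from typing import List, Tuple


def find_insertions(aligned_a: str, aligned_b: str, gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start, length) for each run where A has residues and B has gaps.

    start is a 0-based residue index in the ungapped sequence A.
    The gapped-string input format is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    insertions: List[Tuple[int, int]] = []
    pos_a = 0         # index of the current residue in ungapped A
    run_start = None  # residue index where the current insertion began
    run_len = 0

    for res_a, res_b in zip(aligned_a, aligned_b):
        in_insertion = res_a != gap and res_b == gap
        if in_insertion:
            if run_start is None:
                run_start = pos_a
            run_len += 1
        elif run_start is not None:
            # The insertion run has just ended; record it.
            insertions.append((run_start, run_len))
            run_start, run_len = None, 0
        if res_a != gap:
            pos_a += 1

    if run_start is not None:
        insertions.append((run_start, run_len))
    return insertions
```

For example, find_insertions("MKTAYIAKQR", "MK---IAK--") reports one insertion of three residues starting at residue 2 and one of two residues starting at residue 8. A length filter (for instance 25 to 35 residues, the loop size quoted in the abstract) could be applied to the result.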


Metaphor in the Mind: The Cognition of Metaphor

PHILOSOPHY COMPASS (ELECTRONIC), Issue 2 2006
Elisabeth Camp
The most sustained and innovative recent work on metaphor has occurred in cognitive science and psychology. Psycholinguistic investigation suggests that novel, poetic metaphors are processed differently than literal speech, while relatively conventionalized and contextually salient metaphors are processed more like literal speech. This conflicts with the view of "cognitive linguists" like George Lakoff that all or nearly all thought is essentially metaphorical. There are currently four main cognitive models of metaphor comprehension: juxtaposition, category-transfer, feature-matching, and structural alignment. Structural alignment deals best with the widest range of examples; but it still fails to account for the complexity and richness of fairly novel, poetic metaphors.


Automatic generation and evaluation of sparse protein signatures for families of protein structural domains

PROTEIN SCIENCE, Issue 1 2005
Matthew J. Blades
Abstract: We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%–30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
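
The ROC50 score quoted in the abstract above is commonly computed as the area under the ROC curve truncated at the first 50 false positives, normalized so that a perfect ranking scores 1. A minimal Python sketch of that common definition follows. The function name roc_n, the boolean-list input, and the handling of hit lists with fewer than n false positives are assumptions of this sketch and may differ from the evaluation the authors ran.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], total_positives: int, n: int = 50) -> float:
    """Truncated ROC score for a hit list sorted from best to worst score.

    ranked_labels[i] is True if hit i is a true family member.
    total_positives is the number of true members in the whole database.
    """
    if total_positives <= 0 or n <= 0:
        raise ValueError("total_positives and n must be positive")

    true_seen = 0   # true positives ranked so far
    false_seen = 0  # false positives ranked so far
    area = 0        # sum of true positives ranked above each false positive

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption of this sketch: if the list holds fewer than n false
    # positives, the missing ones are treated as ranked below every listed hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)
```

A ranking that places every true member above the first false positive scores 1.0, and a ranking that places n false positives first scores 0.0.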


In search for more accurate alignments in the twilight zone

PROTEIN SCIENCE, Issue 7 2002
Lukasz Jaroszewski
Abstract: A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold-prediction benchmarks. The alignments obtained by sequence–sequence or sequence–structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate the number of significantly different alignments for a given scoring method that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence–sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This distance decreases when the alignment method is improved, but the number is still too large for the brute force enumeration approach. More effective strategies were needed, so we evaluated and compared two well-known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
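
To give a feel for why brute-force enumeration is impractical, the sketch below counts every possible global alignment of two sequences under a simple model in which each alignment column is a match, a gap in the first sequence, or a gap in the second. This count (a Delannoy number) is an illustration only: it is not the authors' simplified enumeration, and it does not correspond to their count of "significantly different" alignments. The function name is an assumption.

```python
def count_global_alignments(m: int, n: int) -> int:
    """Count global alignments of sequences of lengths m and n.

    Each column is one of: residue/residue, residue/gap, gap/residue.
    Uses the recurrence D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1),
    with D(i, 0) = D(0, j) = 1.
    """
    if m < 0 or n < 0:
        raise ValueError("lengths must be non-negative")

    prev = [1] * (n + 1)  # row for i = 0
    for _ in range(m):
        cur = [1]  # D(i, 0) = 1
        for j in range(1, n + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[n]
```

Two sequences of three residues each already admit 63 alignments under this model, and the count grows exponentially with length, which is why methods that search only the space of suboptimal alignments are attractive.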


Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments

PROTEIN SCIENCE, Issue 11 2000
Iddo Friedberg
Abstract: The PSI-BLAST algorithm has been acknowledged as one of the most powerful tools for detecting remote evolutionary relationships by sequence considerations only. This has been demonstrated by its ability to recognize remote structural homologues and by the greatest coverage it enables in annotation of a complete genome. Although recognizing the correct fold of a sequence is of major importance, the accuracy of the alignment is crucial for the success of modeling one sequence by the structure of its remote homologue. Here we assess the accuracy of PSI-BLAST alignments on a stringent database of 123 structurally similar, sequence-dissimilar pairs of proteins, by comparing them to the alignments defined on a structural basis. Each protein sequence is compared to a nonredundant database of the protein sequences by PSI-BLAST. Whenever a pair member detects its pair-mate, the positions that are aligned both in the sequential and structural alignments are determined, and the alignment sensitivity is expressed as the percentage of these positions out of the structural alignment. Fifty-two sequences detected their pair-mates (for 16 pairs the success was bi-directional when either pair member was used as a query). The average percentage of correctly aligned residues per structural alignment was 43.5 ± 2.2%. Other properties of the alignments were also examined, such as the sensitivity vs. specificity and the change in these parameters over consecutive iterations. Notably, there is an improvement in alignment sensitivity over consecutive iterations, reaching an average of 50.9 ± 2.5% within the five iterations tested in the current study.
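
The alignment sensitivity defined in the abstract above (positions aligned identically in the sequence and structural alignments, as a percentage of the structural alignment) can be sketched in a few lines of Python. Each alignment is represented here as a set of (i, j) residue-index pairs, which is an assumption of this sketch. The abstract mentions specificity without defining it, so the sketch assumes the usual counterpart: the same shared positions as a percentage of the sequence alignment.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue index in protein 1, residue index in protein 2)


def alignment_accuracy(seq_pairs: Set[Pair], struct_pairs: Set[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent.

    sensitivity: shared pairs as a share of the structural alignment.
    specificity: shared pairs as a share of the sequence alignment
                 (assumed definition; not stated in the abstract).
    """
    shared = len(seq_pairs & struct_pairs)
    sensitivity = 100.0 * shared / len(struct_pairs) if struct_pairs else 0.0
    specificity = 100.0 * shared / len(seq_pairs) if seq_pairs else 0.0
    return sensitivity, specificity
```

For instance, if the structural alignment pairs four positions and the sequence alignment reproduces two of them while adding one incorrect pair, sensitivity is 50% and specificity is about 66.7%.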


Natural Diversity to Guide Focused Directed Evolution

CHEMBIOCHEM, Issue 13 2010
Dr. Helge Jochens
Abstract: Simultaneous multiple site-saturation mutagenesis was performed at four active-site positions of an esterase from Pseudomonas fluorescens to improve its ability to convert 3-phenylbutyric acid esters (3-PBA) in an enantioselective manner. Based on an appropriate codon choice derived from a structural alignment of 1751 sequences of α/β-hydrolase fold enzymes, only those amino acids were considered for library creation that appeared frequently in structurally equivalent positions. Thus, the number of mutants to be screened could be substantially reduced while the number of functionally intact variants was increased. Whereas the wild-type esterase showed only marginal activity and poor enantioselectivity (Etrue=3.2) towards 3-PBA-ethyl ester, a significant number of hits with improved rates (up to 240-fold) and enantioselectivities (up to Etrue=80) were identified in these "smart" libraries.
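
The screening effort saved by restricting codon choice can be illustrated with simple arithmetic. The Python sketch below computes the number of codon combinations for simultaneous saturation at several positions and the number of clones needed to sample a given fraction of them, using the standard estimate T = -V ln(1 - F) for a library of V equally likely variants. The abstract does not state the codon sets actually used, so the degenerate-codon sizes in the usage note are hypothetical inputs, and the function names are assumptions.

```python
import math


def library_size(codons_per_position: int, positions: int) -> int:
    """Number of distinct codon combinations across all randomized positions."""
    if codons_per_position <= 0 or positions <= 0:
        raise ValueError("arguments must be positive")
    return codons_per_position ** positions


def clones_to_screen(size: int, coverage: float = 0.95) -> int:
    """Clones to sample so that the expected covered fraction reaches `coverage`.

    Assumes every variant is equally likely: T = -V * ln(1 - F).
    """
    if size <= 0 or not 0.0 < coverage < 1.0:
        raise ValueError("size must be positive and coverage in (0, 1)")
    return math.ceil(-size * math.log(1.0 - coverage))
```

With a full NNK codon (32 codons) at four positions, library_size(32, 4) gives 1,048,576 combinations, and roughly three times that many clones are needed for 95% coverage. A hypothetical reduced set of, say, 8 codons per position would give library_size(8, 4) = 4096 combinations, a 256-fold smaller screening target.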


Assessing strategies for improved superfamily recognition

PROTEIN SCIENCE, Issue 7 2005
Ian Sillitoe
Abstract: There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (~13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.

