
Title: Pitfalls of protein sequence analysis
Author: Burkhard Rost & Alfonso Valencia
Quote: Curr. Opin. Biotech., 7, 457-461 (1996)

Introduction to 'Pitfalls of protein sequence analysis'

Imagine you have a protein sequence, either sequenced in your own lab or pulled from genome projects or EST production. You decide to let theoretical biology help you find a priori information about your protein that may be useful for designing and accelerating experiments. You submit your sequence to database search and/or structure prediction services. The possible pitfalls are numerous, including picking a lousy server or misinterpreting the results. We give examples of common pitfalls collected after 80,000 requests to an automatic prediction service (Table).

What can theory predict of protein structure? In general, protein three-dimensional (3D) structure can NOT be predicted from sequence. However, 3D structure can be predicted by homology modelling, i.e., by using a sequence homologue (>25% sequence identity) with an experimentally determined 3D structure. If no sequence homologue is found in PDB, there is still a chance to predict 3D structure by threading, i.e., by remote homology modelling (<25% sequence identity). However, correct 3D models from threading, and even correct detection of remote homology, are rare. But theory can still assist by predicting one-dimensional (1D) aspects of 3D structure, e.g., secondary structure, solvent accessibility, transmembrane helices, binding sites, sequence motifs, and aspects of protein function.
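The decision rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular server: the function names `percent_identity` and `suggest_strategy` are hypothetical, identity is computed over aligned positions where neither sequence has a gap (one of several conventions in use), and a value of exactly 25% is treated as falling below the threshold. The sketch also ignores alignment length, which matters in practice.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggest_strategy(best_pdb_identity: Optional[float], threshold: float = 25.0) -> str:
    """Map identity to the best hit of known structure onto a strategy.

    The 25% threshold is taken from the text; `None` means no hit was found.
    """
    if best_pdb_identity is not None and best_pdb_identity > threshold:
        return "homology modelling"
    # No close homologue of known structure: threading may be tried, but
    # correct models are rare, so 1D predictions are the more reliable output.
    return "threading (rarely correct) plus 1D predictions"


if __name__ == "__main__":
    # Toy aligned pair, invented for illustration only.
    identity = percent_identity("ACDE-FG", "ACDQWFG")
    print(round(identity, 1), suggest_strategy(identity))
```

A high identity over a very short alignment can arise by chance, so a real pipeline would also take the alignment length into account before trusting the homology modelling branch.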

Ease of use brings ease of misuse. Rapidly developing electronic communication (Internet, World Wide Web) makes it easy to distribute prediction methods. Experimental biologists submit sequences; theoretical biologists configure automatic services that return predictions. The advantage is that users need not become experts in sequence analysis tools. However, the ease of offering and accessing predictions creates two problems. (1) Inaccurate or insufficiently validated methods are made available, bypassing selection systems such as referees. (2) Users may misinterpret results because they lack insight into the features of the prediction methods.


