
Title: Protein Structure Prediction by Neural Networks
Author: Burkhard Rost
Quote: In: M. Arbib (ed.) "The Handbook of Brain Theory and Neural Networks" Cambridge, MA: Bradford Books/The MIT Press, 772-775 (1995)

Introduction for 'Protein Structure Prediction by Neural Networks'

What is a protein? The information for life is stored in a four-letter alphabet in the genes (DNA: deoxyribonucleic acid). Proteins are, among other things, the macromolecules that perform all important tasks in organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. Thus, genes are the blueprints and proteins the machinery of life. Proteins are formed by joining amino acids into a long chain, the protein sequence, which is a translation of the genes into a 20-letter alphabet of amino acids. Proteins differ in length (from 30 to 30,000 amino acids) and in the arrangement of amino acids (called residues when joined in proteins). In water, the chain folds up into a unique three-dimensional (3D) structure. The main driving force is the need to pack residues for which contact with water is energetically unfavourable (hydrophobic residues) into the interior of the molecule. A detailed analysis of the underlying chemistry shows that this is only possible if the protein forms regular patterns of a macroscopic substructure called secondary structure (Fig. 1; for an introduction to protein structure see Branden, 1991).

What determines protein function and structure? The 3D structure of a protein determines its function. But what determines the 3D structure? It is well established that the details of the 3D structure (also called the fold) are uniquely determined by the specificity of the sequence. Can the code be deciphered, i.e., can 3D structure be predicted from sequence? In principle, the code could be deciphered by calculating the physico-chemical force fields determining the fold. Unfortunately, the computer time required to calculate the 3D structure from first principles is many orders of magnitude beyond today's possibilities. However, knowing the 3D structure is of practical importance. One reason is rational drug design.

Why not simply look at the 3D structure through a microscope? Unfortunately, the techniques to experimentally determine the 3D structure of a protein are rather complicated. Solving a structure can take from one to several years. Today the sequence is known for some 36,000 proteins, but the 3D structure has been determined by experiment for only 2,000 of them. Large gene sequencing projects increase the sequence-structure gap further. The most accurate way to predict 3D structure from the sequence is homology modelling, i.e., searching for a protein with a similar sequence that has a known 3D structure and then modelling the 3D structure of the unknown protein in analogy to the known one. Such techniques reduce the sequence-structure gap by some 9,000 proteins. However, there are still some 25,000 proteins for which researchers would like to know as much as possible about the structure, and this number is rapidly increasing.
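The arithmetic behind the sequence-structure gap can be written out as a small sketch. The counts are the approximate figures quoted above; the variable and function names are illustrative only.

```python
# Approximate counts quoted in the text (mid-1990s figures).
KNOWN_SEQUENCES = 36_000          # proteins with known sequence
EXPERIMENTAL_STRUCTURES = 2_000   # 3D structure determined by experiment
HOMOLOGY_MODELS = 9_000           # additional proteins reachable by homology modelling


def remaining_gap(sequences, experimental, modelled):
    """Proteins with a known sequence but no experimental or modelled structure."""
    return sequences - experimental - modelled


gap = remaining_gap(KNOWN_SEQUENCES, EXPERIMENTAL_STRUCTURES, HOMOLOGY_MODELS)
print(gap)  # 25000
```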

Why can homology modelling be successful? The exchange of a few residues can already destabilise a protein. This implies that the majority of the 20^N possible sequences of length N form different structures. But has evolution created such an immense variety? Random errors in the DNA sequence lead to a different translation into protein sequences. These 'errors' are the basis of evolution. Mutations resulting in a structural change are not likely to be accepted, since the protein can then no longer perform its task. Furthermore, the universe of stable structures is not continuous, i.e., minor changes at the level of the 3D structure destabilise the structure. The evolutionary pressure to conserve function and the discontinuity of the universe of structures have the result that structure is evolutionarily more conserved than sequence. Evolution has produced pairs of proteins which have the same 3D structure with only 25% identical residues. Therefore, the 3D structure can be predicted rather accurately by homology if a protein with sufficient sequence identity and known 3D structure is found in the data bank.
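Two quantities in this argument are easy to make concrete: the 20^N count of possible sequences and the percentage of identical residues between two proteins. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length without gaps; real homology searches rely on proper alignment methods, and the function names here are hypothetical.

```python
def count_sequences(length, alphabet_size=20):
    """Number of distinct chains of the given length over the amino acid alphabet."""
    return alphabet_size ** length


def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues.

    Assumption: both sequences are pre-aligned and have the same length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


print(count_sequences(2))                        # 400
print(percent_identity("ACDEFGHI", "ACDEYYYY"))  # 50.0
```

Even for a short chain of 30 residues, `count_sequences(30)` is far larger than the number of proteins in any data bank, which is why the observation that structure is conserved down to about 25% identity matters in practice.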

Can the egg be unboiled? When an egg is boiled, the proteins it contains unfold. Can this procedure be reversed in theory? Or, in other words, can the encrypted code of protein folding be deciphered from the sequence? Current tools to predict the 3D structure from the sequence are rather limited (Rost and Sander, 1994b). Therefore, the problem has to be simplified. One extreme simplification is the prediction of one-dimensional (1D) strings of secondary structure assignments. Others are the location of functionally important residues, the classification of proteins into structurally related families, or the prediction of whether or not a particular residue is buried in the core of the protein.
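To illustrate what a 1D string of secondary structure assignments looks like, the sketch below pairs a sequence with one state letter per residue and extracts the contiguous segments. The three-letter alphabet (H for helix, E for strand, L for everything else) is an assumed, commonly used convention, and both example strings are made up for illustration rather than taken from a real protein.

```python
from itertools import groupby

# Hypothetical example: one secondary structure letter per residue.
sequence = "MKTAYIAKQRQI"
structure = "LLHHHHLLEEEL"  # H = helix, E = strand, L = other (assumed alphabet)


def segments(ss_string):
    """Return (state, start, end) for each run of identical states; end is exclusive."""
    result = []
    position = 0
    for state, run in groupby(ss_string):
        length = len(list(run))
        result.append((state, position, position + length))
        position += length
    return result


print(segments(structure))
# [('L', 0, 2), ('H', 2, 6), ('L', 6, 8), ('E', 8, 11), ('L', 11, 12)]
```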

How can neural networks predict protein structure? In practice, the most successful predictions are based on an analysis of common features in the data bank of known 3D structures. Artificial neural networks are well suited for pattern classification. Here, we shall attempt to show how neural networks, used as devices for pattern classification, can be applied to the prediction of protein structure. First, we give examples of how the data bank of known 3D structures can be used to predict secondary structure (1) and other structural features (2). Then, we briefly review attempts to predict entire 3D structures (3). Finally, we give a critical evaluation of the neural network applications by comparing them to alternative approaches, and we outline the prospects of applying neural networks to protein structure prediction.
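As a rough illustration of pattern classification on sequences, the sketch below encodes a window of residues around each position as a one-hot vector and passes it through a small feed-forward network that outputs one of three states. Everything specific here is an assumption made for the example, not a description of any published method: the window width of 13, the five hidden units, the H/E/L output alphabet, and the omission of bias terms. The weights are random and untrained, so the output only shows the data flow, not a meaningful prediction.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
STATES = "HEL"                        # assumed classes: helix, strand, other
WINDOW = 13                           # assumed window width, centred on the residue
UNITS = len(AMINO_ACIDS) + 1          # extra unit marks positions beyond the chain ends


def encode_window(sequence, centre, window=WINDOW):
    """One-hot encode the residues in a window centred on position `centre`."""
    half = window // 2
    vector = []
    for pos in range(centre - half, centre + half + 1):
        units = [0.0] * UNITS
        if 0 <= pos < len(sequence):
            units[AMINO_ACIDS.index(sequence[pos])] = 1.0
        else:
            units[-1] = 1.0  # spacer for positions outside the chain
        vector.extend(units)
    return vector


def random_layer(n_in, n_out, rng):
    """Untrained weight matrix (no bias terms, to keep the sketch short)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid activation of each unit in the layer."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in layer]


def predict(sequence, hidden, output):
    """Assign each residue the state whose output unit is most active."""
    states = []
    for i in range(len(sequence)):
        out = forward(output, forward(hidden, encode_window(sequence, i)))
        states.append(STATES[out.index(max(out))])
    return "".join(states)


rng = random.Random(0)
hidden_layer = random_layer(WINDOW * UNITS, 5, rng)
output_layer = random_layer(5, len(STATES), rng)
example = predict("MKTAYIAKQR", hidden_layer, output_layer)
print(example)  # one state letter per residue; arbitrary because weights are untrained
```

In a real application the weights would be fitted to proteins of known 3D structure, which is where the data bank mentioned above comes in.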


