Protein Evolution

Friday 30th January 2009

Molecular Biotechnology Center [MBC], Torino, Italy

ELLS LearningLAB: Molecular evolution: modern evidence for Darwin's theory

Aidan Budd and Francesca Diella


Introduction: Protein Structure and Function

Primary Structure

Proteins are made up of one or more polypeptide chains (polymers of typically more than 50 amino acids), plus in some cases additional non-peptide co-factors e.g. the haem groups associated with hemoglobins. This PyMOL session file (which can be loaded directly into a locally-installed version of the PyMOL software, and then rotated, zoomed-in on etc.) shows a short section of a polypeptide chain (see image below for a snapshot) with "standard" colouring of the atoms (carbon in green, nitrogen in blue, oxygen in red etc.). The repeated structure of the polypeptide backbone is clear. The amino acid side chains (the region of the structures that differ between amino acids) are shown as lines. The two ends/termini of the peptide are clearly different - on the left side is an NH group (the "N-terminus"), while on the right side is a CO group (the "C-terminus").
3D structure of an extended-conformation peptide
Most amino acid residues in these polypeptides have a side-chain taken from one of the 20 that are genetically specified - as shown in the table below

(taken from Wikimedia Commons). The order in which the different amino acids in a polypeptide are found, beginning from the N-terminus, is known as the "primary structure" of the peptide. For example, the peptide below

has the primary structure "tyrosine-isoleucine-serine-cysteine-threonine" (or, in single-letter code, "YISCT").
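As a small illustration of the link between residue names and the single-letter code, here is a minimal Python sketch. The dictionary and the function name primary_structure are just for illustration (they are not part of PyMOL or any other tool mentioned on this page); the sequence is read from the N-terminus to the C-terminus.

```python
# The 20 genetically specified amino acids and their single-letter codes.
ONE_LETTER = {
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamine": "Q", "glutamate": "E", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}


def primary_structure(residue_names):
    """Convert hyphen-separated residue names (N- to C-terminus)
    into a single-letter sequence string."""
    names = residue_names.split("-")
    return "".join(ONE_LETTER[name.strip().lower()] for name in names)


print(primary_structure("tyrosine-isoleucine-serine-cysteine-threonine"))
# prints: YISCT
```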

Primary Sequence From Structure - Quiz
To introduce you to some of the basics of using PyMOL, and of protein structure, look at the following image, and load this PyMOL file into PyMOL, to decide which of the following sequences accurately describes the primary structure of this peptide

  1. SDNVLIT
  2. SNDIVLT
  3. TLVIDNS
  4. TVLIENS


The 3D structure of the polypeptide backbone can vary greatly between different regions of the chain, due to rotation of amino acid residues with respect to each other - this file shows a different polypeptide which forms a rather different structure, much less extended than that of the previous peptide. Note that this file contains multiple structures inferred via NMR - if you view the structures one after the other quickly using the "play" button, you will also gain an impression of the relative stability/flexibility of different regions of the chain.
Helical peptide
There are several ways of characterising the physical/chemical properties of the twenty different amino acids e.g. there are 5 that can be found charged (KRHDE) and 15 that cannot (ACFGILMNPQSTVWY). The diagram below gives some of the different characteristics used to classify the different side-chains. Note that such classifications are often somewhat ambiguous - sometimes arginine (R) and lysine (K) are classified as hydrophobic (for example, in the scheme below R is included in the hydrophobic category), as the region of the side-chain nearest to the backbone does have hydrophobic character.
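The charged/non-charged split can be written down as two sets of single-letter codes. The sketch below is only an illustration: the function name chargeable_positions and the example sequence "MKDELTR" are made up for this purpose and do not come from any of the files linked on this page.

```python
# Residues that can be found charged, as listed in the text.
CHARGEABLE = set("KRHDE")

# All 20 genetically specified amino acids, in single-letter code.
ALL_TWENTY = set("ACDEFGHIKLMNPQRSTVWY")

# Everything else: the 15 residues that cannot be found charged.
NON_CHARGEABLE = ALL_TWENTY - CHARGEABLE


def chargeable_positions(sequence):
    """Return the 1-based positions (counted from the N-terminus)
    of residues that can carry a charge."""
    return [i for i, residue in enumerate(sequence.upper(), start=1)
            if residue in CHARGEABLE]


print(len(NON_CHARGEABLE))                # prints: 15
print(chargeable_positions("MKDELTR"))    # prints: [2, 3, 4, 7]
```

Other classifications (hydrophobic, polar, small etc.) could be added in the same way as further sets, and a residue may belong to more than one set, which is one reason such schemes are ambiguous.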


Secondary Structure

Two particular types of backbone conformation are found in many different proteins - the "alpha helix" and the "beta sheet". Indeed, you have already seen examples of these two different types of structure - the "YISCT" peptide is a strand of (i.e. part of) a beta sheet, while the much less extended peptide above is predominantly alpha-helical.

These two different types of "secondary structure" have different characteristic patterns of hydrogen bonds formed between backbone CO and NH groups.

Below you can see a region of a beta-sheet - the structure is relatively flat, the aligned backbone CO (red) and NH (blue) groups forming hydrogen bonds with each other. Here is the PyMOL file used to make this figure - you can see that while, from the view below, the structure looks fairly "flat", in reality the sheet region has a twist (this is typical for these kinds of structures). Note that the amino-acid side chains are not shown on these figures to make the pattern of bonding within the main chain easier to see.

In contrast, below are two views of the same alpha-helix, in which the hydrogen bonds (in magenta) form along the axis of the helix - here is the PyMOL file of this structure


The image below shows an example of a protein that contains both alpha helices and beta sheets - a pyruvate decarboxylase from the yeast Saccharomyces cerevisiae (here's the corresponding PyMOL file). The helices are coloured in red, with their characteristic "spiral" representation, while the beta sheets are coloured in yellow, using their characteristic flattened-arrow representation. Other regions of the structure are shown in green.
 

Tertiary Structure

The 3D structure of a full-length polypeptide chain (known as its "tertiary" structure) can include many thousands of individual atoms - representations of the whole structure of the kind shown above often provide too much information, such that it is not possible to identify particular details of interest. Therefore, a range of different methods of representation have been developed to provide simpler views of this kind of data. Taking one wild-type polypeptide of human alpha-chain hemoglobin as an example, it is given in this file using the so-called "cartoon" representation
Human hemoglobin beta chain - cartoon view
and here with a view of the surface of the protein

A vital part of the function of a hemoglobin polypeptide chain is to bind the haem group so that it is able to interact appropriately with the oxygen molecules it transports. The surface view above shows what a precise, close fit the haem group makes with the peptide chain. An important aspect of understanding and interpreting the function of a given protein involves characterising molecular interactions of this kind - for example, changes in the sequence of the hemoglobin polypeptide that prevent the haem group from binding in this way will presumably prevent the molecule (and hence the organism with that version of the gene) from functioning correctly.

Quaternary Structure

A protein may be made up of several polypeptide chains - for example, the biologically active form of human adult hemoglobin is made up of four polypeptide chains (two alpha and two beta chains) as can be seen in this PyMOL file and in the image below, where each of the four chains has been labeled with a different colour.

The 3D organisation of the multiple polypeptides (perhaps also with other co-factors) is known as the "quaternary structure" of the protein

Protein Sequence Alignments

Alignments of protein sequences are widely used in analyses of protein structure (and evolution).

Sequences are aligned to each other (i.e. gaps are inserted within the sequences) so that similar residues in the different sequences are placed in the same column. The image below shows a multiple sequence alignment (so called as it involves more than two sequences) - columns where most of the sequences have the same residue are coloured blue.
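The idea of a column where most sequences share the same residue can be sketched in a few lines of Python. This is only a toy illustration: the three short sequences, the function name conserved_columns and the "more than half" threshold are assumptions made here, and are not the sequences or the colouring rule used for the image.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.5):
    """Return the 0-based indices of columns in which a single residue
    (not a gap) is found in more than `threshold` of the sequences."""
    n_sequences = len(alignment)
    conserved = []
    # zip(*alignment) walks through the alignment one column at a time.
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(residue for residue in column if residue != "-")
        if counts and counts.most_common(1)[0][1] / n_sequences > threshold:
            conserved.append(index)
    return conserved


# A made-up alignment of three sequences; "-" marks a gap.
toy_alignment = [
    "MKV-LT",
    "MKVALT",
    "MRV-IT",
]

print(conserved_columns(toy_alignment))
# prints: [0, 1, 2, 4, 5]
```

Column 3 is not reported because only one of the three sequences has a residue there; the other two have gaps.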

If you have the JalView sequence alignment editor software available, you can load the above alignment (or load this FASTA format file into some other alignment editor) and then try inserting gaps in different positions to see how this affects the alignment. You should see that most gaps you introduce will cause one or several sequences to no longer have their residues aligned to similar residues in other sequences. This makes the new alignment you've created "worse" than the original one you loaded into the software - in the original alignment, more residues were aligned so that similar residues were in the same column.
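The effect of a badly placed gap can also be shown with a very crude score. The sketch below simply counts, column by column, how many pairs of sequences have an identical residue; this is an assumption made for illustration, not the scoring scheme of JalView or of any alignment program (real programs also reward similar, not just identical, residues). The sequences and the function names identity_score and insert_gap are made up.

```python
from itertools import combinations


def identity_score(alignment):
    """Count the pairs of identical, non-gap residues that share a column."""
    score = 0
    for column in zip(*alignment):
        score += sum(1 for a, b in combinations(column, 2)
                     if a == b and a != "-")
    return score


def insert_gap(alignment, sequence_index, position):
    """Insert a gap into one sequence at `position`; pad the other
    sequences with a trailing gap so all rows keep the same length."""
    edited = []
    for index, sequence in enumerate(alignment):
        if index == sequence_index:
            edited.append(sequence[:position] + "-" + sequence[position:])
        else:
            edited.append(sequence + "-")
    return edited


original = ["MKVLT", "MKVLT", "MRVIT"]
shifted = insert_gap(original, 0, 1)   # first sequence becomes "M-KVLT"

print(identity_score(original))   # prints: 11
print(identity_score(shifted))    # prints: 5
```

The gap pushes most of the first sequence one column to the right, so its residues no longer line up with the matching residues in the other two sequences and the score drops.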

There are many different software packages available that aim to produce an as-good-as-possible alignment of a set of protein (or nucleotide) sequences e.g. here at the EBI

Note - this page (and others on this site) occasionally link to Wikipedia pages describing certain topics and concepts. Due to the community-annotation of these pages, they are liable to change, and may contain malicious or inadvertent errors
