Protein Evolution
Friday 30th January 2009
Aidan Budd and Francesca Diella
Introduction: Protein Structure and Function
Primary Structure
Proteins are made up of one or more polypeptide chains (polymers of
typically more than 50 amino acids), plus, in some cases, additional
non-peptide co-factors, e.g. the haem groups associated with hemoglobins.
This PyMOL session file (which can be loaded directly into a
locally-installed copy of the PyMOL software, and then rotated, zoomed in
on, etc.) shows a short section of a polypeptide chain (see the image
below for a snapshot) with "standard" colouring of the atoms (carbon in
green, nitrogen in blue, oxygen in red, etc.). The repeated structure of
the polypeptide backbone is clear. The amino acid side chains (the parts
of the structure that differ between amino acids) are shown as lines. The
two ends (termini) of the peptide are clearly different: on the left side
is an NH group (the "N-terminus"), while on the right side is a CO group
(the "C-terminus").

Most amino acid residues in these polypeptides have a side chain
belonging to one of the 20 genetically encoded amino acids, as shown in
the table below

(taken from Wikimedia Commons). The order in which the different amino
acids in a polypeptide occur, beginning from the N-terminus, is known as
the "primary structure" of the peptide. For example, the peptide below

has the primary structure
"tyrosine-isoleucine-serine-cysteine-threonine" (or, in single-letter
code, "YISCT").
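Converting between full residue names and the single-letter code is a simple lookup. The sketch below assumes that names are written in lower case and separated by hyphens, as in the example above. The dictionary and function names are hypothetical.

```python
# Hypothetical lookup table: full amino acid names -> standard one-letter codes
ONE_LETTER = {
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamine": "Q", "glutamate": "E", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}

def to_one_letter(primary_structure: str) -> str:
    """Convert a hyphen-separated, N- to C-terminal list of residue names to one-letter code."""
    return "".join(ONE_LETTER[name.strip().lower()]
                   for name in primary_structure.split("-"))

print(to_one_letter("tyrosine-isoleucine-serine-cysteine-threonine"))  # YISCT
```

Because the primary structure is always read from the N-terminus, the order of the names in the input matters. Reversing them gives a different sequence.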
Primary Sequence From Structure - Quiz
To introduce you to some of the basics of using PyMOL and of protein
structure, look at the following image and load this PyMOL file into
PyMOL to decide which of the following sequences accurately describes the
primary structure of this peptide:
- SDNVLIT
- SNDIVLT
- TLVIDNS
- TVLIENS

The 3D structure of the polypeptide backbone can vary greatly between
different regions of the chain, due to rotation of adjacent amino acid
residues with respect to each other about the bonds of the backbone. This
file shows a different polypeptide which forms a rather different
structure, much less extended than that of the previous peptide. Note that
this file contains multiple structures inferred via NMR. If you view the
structures one after the other in quick succession using the "play"
button, you will also gain an impression of the relative
stability/flexibility of different regions of the chain.
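The amount of rotation about a backbone bond is usually described as a torsion (dihedral) angle defined by four consecutive atoms. For example, the phi angle of a residue is defined by the atoms C (of the previous residue), N, CA and C. The sketch below computes such an angle from four sets of 3D coordinates, using the common convention that a planar cis arrangement is 0 degrees and trans is ±180 degrees. The function name and the toy coordinates are hypothetical and were chosen only so that the expected angles are easy to check by eye. Real coordinates would come from a structure file such as those loaded into PyMOL.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, -180..180) about the p1-p2 bond, for four 3D points."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]  # unit vector along the central bond
    # Project both outer bonds onto the plane perpendicular to the central bond
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (made up): the central bond lies along the y axis
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)), 1))  # 0.0 (cis)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```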

There are several ways of characterising the physical/chemical properties
of the twenty different amino acids. For example, there are 5 that can be
found charged (KRHDE) and 15 that are not (ACFGILMNPQSTVWY). The diagram
below gives some of the different characteristics used to classify the
different side chains. Note that such classifications are often somewhat
ambiguous. Arginine (R) and lysine (K) are sometimes classified as
hydrophobic because the region of the side chain nearest to the backbone
does have hydrophobic character; for example, in the scheme below R is
included in the hydrophobic category.
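The sketch below shows one simple way to apply such a classification. It counts charged and uncharged residues in a sequence given in single-letter code, using only the two-way split described above. The function name is hypothetical. As noted, real classification schemes differ at the margins, and whether a given residue is actually charged also depends on its environment (e.g. pH).

```python
# Residue classes as listed in the text (one-letter codes)
CHARGED = set("KRHDE")
UNCHARGED = set("ACFGILMNPQSTVWY")

def charge_profile(sequence: str) -> dict:
    """Count residues in each class; any other letter is reported as 'unknown'."""
    counts = {"charged": 0, "uncharged": 0, "unknown": 0}
    for residue in sequence.upper():
        if residue in CHARGED:
            counts["charged"] += 1
        elif residue in UNCHARGED:
            counts["uncharged"] += 1
        else:
            counts["unknown"] += 1
    return counts

# Made-up sequence; X is not one of the 20 standard codes
print(charge_profile("MKDEYWX"))  # {'charged': 3, 'uncharged': 3, 'unknown': 1}
```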

Secondary Structure
Two particular types of backbone conformation are found in many
different proteins: the "alpha helix" and
the "beta sheet".
Indeed, you have already seen examples of these two different types of
structure: the "YISCT" peptide is a strand of (i.e. part of) a beta
sheet, while the much less extended peptide above is predominantly
alpha-helical.
These two different types of "secondary structure" have different
characteristic patterns of hydrogen bonds formed between backbone CO
and NH groups.
Below you can see a region of a beta sheet. The structure is relatively
flat, with the aligned backbone CO (red) and NH (blue) groups forming
hydrogen bonds with each other. Here is the PyMOL file used to make this
figure. If you load it, you can see that although the structure looks
fairly "flat" from the view below, in reality the sheet has a twist (this
is typical for these kinds of structures). Note that the amino acid side
chains are not shown in these figures, to make the pattern of bonding
within the main chain easier to see.

In contrast, below are two views of the same alpha helix, in which the
hydrogen bonds (in magenta) run along the axis of the helix. Here is the
PyMOL file of this structure.
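In a regular alpha helix, the CO group of each residue i forms a hydrogen bond with the NH group of residue i + 4. Each bond therefore spans roughly one turn and runs approximately parallel to the helix axis. The sketch below enumerates these pairs for an idealised helix of a given length, with residues numbered from 1 at the N-terminal end. The function name is hypothetical, and real helices, particularly at their ends, often deviate from this ideal pattern.

```python
def helix_hbond_pairs(n_residues: int, offset: int = 4):
    """Return (CO residue, NH residue) pairs for an idealised helix.

    offset=4 gives the alpha-helix pattern: CO of residue i bonds to NH of residue i+4.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

for co_res, nh_res in helix_hbond_pairs(8):
    print(f"CO of residue {co_res} -> NH of residue {nh_res}")
```

The `offset` parameter only makes the i to i + 4 spacing explicit. For comparison, the rarer 3-10 helix follows an i to i + 3 pattern.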


The image below shows an example of a protein that
contains both alpha helices and beta sheets - a pyruvate decarboxylase
from the yeast Saccharomyces cerevisiae (here's
the corresponding PyMOL file). The helices are coloured in red,
with their characteristic "spiral" representation, while the
beta sheets are coloured in yellow, using their characteristic
flattened-arrow representation. Other regions of the structure are
shown in green.
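Per-residue secondary structure assignments are often written as a string with one character per residue. For instance, the DSSP program uses H for alpha helix and E for beta strand, among other codes. Assuming such a string is available, the fraction of residues in helices and strands can be summarised as shown below. The example string is made up and is not taken from the pyruvate decarboxylase structure. The function name is hypothetical, and other helix types (e.g. DSSP's G and I) are simply counted as "other" here.

```python
from collections import Counter

def ss_composition(ss_string: str) -> dict:
    """Fraction of residues assigned to alpha helix ('H'), beta strand ('E') or anything else."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss_string)  # missing keys count as 0
    total = len(ss_string)
    other = total - counts["H"] - counts["E"]
    return {"helix": counts["H"] / total,
            "strand": counts["E"] / total,
            "other": other / total}

# Made-up assignment string (one character per residue, '-' = neither H nor E)
print(ss_composition("--HHHHHHHH--EEEE--EEEE--"))
```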
Tertiary Structure
The 3D structure of a full-length polypeptide chain (known as its
"tertiary" structure) can include many thousands of individual atoms.
Representations of the whole structure of the kind shown above often
provide too much information, making it difficult to identify particular
details of interest. Therefore, a range of different methods of
representation has been developed to provide simpler views of this kind
of data. Taking a single wild-type human hemoglobin alpha chain as an
example, it is shown in this file using the so-called "cartoon"
representation

and here
with a view of the surface of the protein

A vital part of the function of a hemoglobin polypeptide chain is to
bind the haem group so that it is able to interact appropriately with
the oxygen molecules it transports. The surface view above shows what a
precise, close fit the haem group makes with the peptide chain. An
important aspect of understanding and interpreting the function of a
given protein involves characterising molecular interactions of this
kind. For example, changes in the sequence of the hemoglobin polypeptide
that prevent the haem group from binding in this way will presumably
prevent the molecule (and hence the organism with that version of the
gene) from functioning correctly.
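One simple way of characterising such an interaction is to list the residues that have at least one atom within a chosen distance of the bound group (here, the haem). The sketch below does this by brute force over atom coordinates. The 4.0 Å cutoff, the function name and the toy coordinates are all assumptions for illustration. Real coordinates would be read from the structure file.

```python
import math

def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted residue IDs with any atom within `cutoff` (Å) of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)); ligand_atoms: list of (x, y, z).
    """
    contacts = set()
    for residue_id, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(residue_id)
    return sorted(contacts)

# Toy coordinates (made up): a single "ligand" atom at the origin
ligand = [(0.0, 0.0, 0.0)]
protein = [(12, (2.0, 1.0, 0.0)), (5, (0.0, 3.5, 0.0)), (30, (9.0, 0.0, 0.0))]
print(contact_residues(protein, ligand))  # [5, 12]
```

Brute force compares every protein atom with every ligand atom. This is fine for a single chain and one haem group, but larger structures would normally use a spatial index.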
Quaternary Structure
A protein may be made up of several polypeptide chains. For example,
the biologically active form of human adult hemoglobin is made up of
four polypeptide chains (two alpha and two beta chains), as can be seen
in this
PyMOL file and in the image below, where each of the four chains is
shown in a different colour.

The 3D organisation of the multiple polypeptides (perhaps also with
other co-factors) is known as the "quaternary structure" of the protein.
Protein Sequence Alignments
Alignments of protein sequences are widely used in analyses of protein
structure (and evolution). Sequences are aligned to each other (i.e. gaps
are inserted within the sequences) so that similar residues in the
different sequences are placed in the same column. The image below shows
a multiple sequence alignment (so called because it involves more than
two sequences). Columns where most residues are the same across all
sequences are coloured blue.
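A colouring rule like this can be approximated in a few lines. For each column of an alignment, find the most common residue and the fraction of sequences that share it. The sketch below reads a small, made-up aligned FASTA text (gaps written as '-') and reports that fraction per column. The function names and the 0.5 threshold used to mark a column are assumptions. This is not necessarily the rule used for the image above, and alignment editors offer several colouring schemes.

```python
from collections import Counter

def read_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}; sequence lines may be wrapped."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records

def column_identity(sequences) -> list:
    """Fraction of sequences sharing the most common non-gap residue in each column."""
    seqs = list(sequences)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    fractions = []
    for column in zip(*seqs):
        residues = [r for r in column if r != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        fractions.append(top / len(seqs))
    return fractions

alignment = read_fasta(""">seq1
MKV-LT
>seq2
MKVALS
>seq3
MRV-LT
""")
for i, f in enumerate(column_identity(alignment.values()), start=1):
    print(i, round(f, 2), "coloured" if f > 0.5 else "")
```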

If you have the Jalview sequence alignment editor software available,
you can load the
above alignment (or load this FASTA
format file into some other alignment editor) and then try inserting
gaps in different positions to see how this affects the alignment. You
should see that most gaps you introduce cause one or several sequences
to no longer have their residues aligned to similar residues in the
other sequences. This makes the new alignment you have created "worse"
than the original one you loaded into the software: in the original
alignment, more residues were aligned so that similar residues were in
the same column.
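One way to put a number on "worse" is a sum-of-pairs score. For every column, compare every pair of sequences and add a point for each pair of identical residues. The sketch below uses deliberately crude scoring (identity only, and gaps never score), whereas real tools use substitution matrices and gap penalties. Applied to a made-up three-sequence alignment, it shows how inserting a single gap into one sequence lowers the score. The function name is hypothetical.

```python
from itertools import combinations

def sum_of_pairs(alignment):
    """Count identical, non-gap residue pairs in every column (crude identity-only score)."""
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == b and a != "-":
                score += 1
    return score

original = ["MKVLT", "MKVLS", "MRVLT"]
# Insert a gap after the second residue of the first sequence, and pad the
# others at the end so that every row keeps the same length.
edited = ["MK-VLT", "MKVLS-", "MRVLT-"]
print(sum_of_pairs(original), sum_of_pairs(edited))  # 11 6
```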
There are many different software packages available that aim to
produce the best possible alignment of a set of protein (or nucleotide)
sequences, e.g. here at the EBI.
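Many of these tools build on dynamic programming. The Needleman-Wunsch method below is a minimal sketch of that idea, not the algorithm used by any particular EBI service. It finds a highest-scoring global alignment of two sequences under a simple scheme of +1 for a match, -1 for a mismatch and -2 per gap position. These scores are arbitrary choices for illustration. Multiple-alignment programs typically combine pairwise steps of this kind with further heuristics.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("MKVLT", "MKLT"))  # (2, 'MKVLT', 'MK-LT')
```

When several alignments share the best score, the traceback returns just one of them. Here it prefers diagonal moves.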
Note: this page (and others on this site) occasionally links to Wikipedia
pages describing certain topics and concepts. Due to the community
annotation of these pages, they are liable to change, and may contain
malicious or inadvertent errors.