Introduction: Protein Structure and Function
Primary Structure
Proteins are made up of one or more polypeptide chains (polymers of
typically more than 50 amino acids), plus in some cases additional
non-peptide co-factors, e.g. the haem groups associated with
hemoglobins. This PyMOL session file (which can be loaded directly into
a locally-installed version of the PyMOL software, and then rotated,
zoomed in on etc.) shows a short section of a polypeptide chain (the
image below is taken from this PyMOL file) with "standard" colouring of
the atoms (carbon in green, nitrogen in blue, oxygen in red etc.). The
repeated structure of the polypeptide backbone is clear - the atoms
along the backbone alternate blue (nitrogen), green (carbon), green
(carbon) etc. The amino acid side chains (the part of the amino acid
structure that makes one amino acid, e.g. alanine, where the side chain
is a methyl group, different from another, e.g. tyrosine, where the
side chain is a phenol ring) are shown as lines. The two ends/termini
of the peptide are clearly different - on the left side is an NH group
(the "N-terminus"), while on the right side is a CO group (the
"C-terminus").

Most amino acid residues found in proteins have one of the 20
side-chains shown in the table below:

(The image is taken from Wikimedia Commons.) The order in which the
amino acids occur within a polypeptide, beginning from the N-terminus,
is known as the "primary structure" of the peptide. For example, the
peptide below:

has the primary structure
"tyrosine-isoleucine-serine-cysteine-threonine" (or, in single-letter
code, "YISCT").
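As a rough illustration of how a primary structure written out as
residue names maps onto single-letter code, here is a minimal Python
sketch. The function name to_one_letter and the hyphen-separated input
format are assumptions made for this example only; the letter codes
themselves are the standard ones for the 20 common amino acids (with
the acidic residues named "aspartate" and "glutamate").

```python
# Standard one-letter codes for the 20 common amino acids.
ONE_LETTER = {
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamate": "E", "glutamine": "Q", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}


def to_one_letter(primary_structure):
    """Convert hyphen-separated residue names (N- to C-terminus)
    into a single-letter sequence string.

    Hypothetical helper for illustration; raises KeyError for any
    name that is not in the table above.
    """
    names = primary_structure.lower().split("-")
    return "".join(ONE_LETTER[name.strip()] for name in names)


print(to_one_letter("tyrosine-isoleucine-serine-cysteine-threonine"))
```

Because the sequence is always read from the N-terminus, reversing the
order of the names would describe a different peptide.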
Primary Sequence From Structure - Quiz
To introduce you to some of the basics of using PyMOL, and of protein
structure, look at the following image and load this PyMOL file
into PyMOL, then decide which of the following sequences accurately
describes the primary structure of this peptide:
- SDNVLIT
- SNDIVLT
- TLVIDNS
- TVLIENS

For the answer to this quiz, follow this
link.
The 3D structure of the polypeptide backbone can vary greatly between
different regions of the chain, due to rotation of amino acid residues
with respect to each other along the backbone - this file
shows the 3D structure of yet another polypeptide, different in both
primary structure/sequence and 3D structure from the two shown above.
In particular, the structure of this peptide is much less extended
than that of the two previous peptides, with the peptide backbone
forming a clear twist/helix. (Note that this PyMOL file contains
multiple structures inferred from NMR experiments - if you view
the structures one after the other quickly using the "play" button you
can get an impression of the relative stability/flexibility of
different regions of the chain.)

There are several ways of characterising the physical/chemical
properties of the twenty different amino acids, e.g. there are 5 that
are typically charged under physiological conditions (KRHDE) and 15
that aren't (ACFGILMNPQSTVWY) - the
diagram below shows some of the different characteristics used to
classify the different side-chains. Note that such classifications are
somewhat ambiguous - for example, sometimes tyrosine (Y) is
classified as polar, other times as non-polar (in the diagram below it
is included in both categories) - in this case, the ambiguity is due
to the hydrophobic character of the region of the tyrosine side-chain
nearest to the backbone, contrasted with the more polar nature of the
alcohol/OH at the terminal region of the side-chain.
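
The charged/non-charged split described above can be sketched in a few
lines of Python. This is only an illustration: the names CHARGED,
UNCHARGED and charged_positions are made up for this example, and the
non-charged set is simply taken to be the 20 standard residues minus
KRHDE. As noted above, real classifications are less clear-cut than a
fixed lookup suggests.

```python
# The 20 standard residues in single-letter code.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Residues treated as typically charged under physiological conditions.
CHARGED = set("KRHDE")

# Everything else is treated as non-charged (an assumption of this sketch).
UNCHARGED = STANDARD - CHARGED


def charged_positions(sequence):
    """Return (1-based position, residue) pairs for charged residues,
    counting from the N-terminus."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in CHARGED
    ]


print(len(UNCHARGED))
print(charged_positions("YISCT"))
print(charged_positions("SDNVLIT"))
```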

Secondary Structure
Two particular types of backbone conformation are found in many
different proteins - the "alpha helix" and
the "beta sheet".
Indeed, you have already seen examples of these two different types of
structure - the "YISCT" peptide is a strand (i.e. part of a beta
sheet), while the much less extended peptide above is predominantly
alpha-helical.
These two different types of "secondary structure" have different
characteristic patterns of hydrogen bonds formed between backbone CO
and NH groups.
Below is a region of a beta-sheet - the structure is
relatively flat, with backbone CO (red) and NH (blue) groups
forming hydrogen bonds with each other. Here
is the PyMOL file used to make this figure - you can see that
while, from the view below, the structure looks fairly "flat", in
reality the sheet region has a twist (this is typical for these kinds
of structures). Note that the amino-acid side chains are not shown on
these figures to make the pattern of bonding within the main chain
easier to see.

Here, in contrast, are two views of the same alpha-helix - the
hydrogen bonds (in magenta) form along the axis of the helix. Here is
the PyMOL file of this structure.
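
In an idealised alpha helix, the backbone CO of residue i is usually
described as hydrogen-bonding to the backbone NH of residue i+4, which
is why the bonds run roughly along the helix axis. The sketch below
simply lists those residue pairs for a sequence. The function name
helix_hbond_pairs and the poly-alanine test sequence are hypothetical,
and real helices (especially at their ends) deviate from this pattern.

```python
def helix_hbond_pairs(sequence, offset=4):
    """List (CO residue, NH residue) labels for an idealised alpha helix.

    The backbone CO of residue i is paired with the backbone NH of
    residue i + offset. Labels use 1-based numbering from the
    N-terminus, e.g. "A1".
    """
    pairs = []
    for i in range(len(sequence) - offset):
        co_residue = f"{sequence[i]}{i + 1}"
        nh_residue = f"{sequence[i + offset]}{i + offset + 1}"
        pairs.append((co_residue, nh_residue))
    return pairs


# Hypothetical 8-residue poly-alanine helix.
for co_residue, nh_residue in helix_hbond_pairs("AAAAAAAA"):
    print(f"CO of {co_residue} ... NH of {nh_residue}")
```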


The image below shows an example of a protein that
contains both alpha helices and beta sheets - a pyruvate decarboxylase
from the yeast Saccharomyces cerevisiae (here's
the corresponding PyMOL file). The helices are coloured in red,
with their characteristic "spiral" representation, while the
beta sheets are coloured in yellow, using their characteristic
flattened-arrow representation. Other regions of the structure are
shown in green.
Tertiary Structure
The 3D structure of a full-length polypeptide chain (known as its
"tertiary" structure) can include many thousands of individual atoms -
representations of the whole structure of the kind used in most of the
images shown above
often provide so much information that it is not possible to
identify particular details of interest. Therefore, a range of
different kinds of representation have been developed to provide
simpler views of this kind of data. The image below, for example, shows
a single polypeptide chain, the wild-type alpha chain of human
hemoglobin, in the so-called "cartoon" representation - here is the
PyMOL file used to make this image.

The image below shows the same protein, but this time giving a
representation of the surface of the protein - here is the
PyMOL file used to make this image.

A vital part of the function of a hemoglobin polypeptide chain is to
bind the haem group so that it is able to interact appropriately with
the oxygen molecules it transports. The surface view above shows what a
precise, close fit the haem group makes with the peptide chain.
An important aspect of understanding and interpreting the function of a
given protein involves characterising molecular interactions of this
kind - for example, changes in the sequence of the hemoglobin
polypeptide that prevent the haem group from binding in this way will
presumably prevent the molecule (and hence the organism with that
version of the gene) from functioning correctly.
Quaternary Structure
A protein may be made up of several polypeptide chains - for example,
the biologically active form of human adult hemoglobin is made up of
four polypeptide chains (two alpha and two beta chains) as can be seen
in this
PyMOL file and in the image below, where each of the four chains
has been labeled with a different colour.

The 3D organisation of the multiple polypeptides (perhaps also with
other co-factors) is known as the "quaternary structure" of the protein.
Protein Sequence Alignments
Alignments of protein sequences are widely used in analyses of protein
structure (and evolution).
Sequences are aligned to each other (i.e. gaps are inserted within the
sequences) so that similar residues in the different sequences are
placed in the same column. The image below shows a multiple
sequence alignment (as it involves more than two sequences) - columns
in which most of the sequences have the same residue are coloured blue.
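
The idea of a column in which most sequences share the same residue can
be sketched as follows. This is a minimal illustration, not the rule
any particular alignment viewer uses for its colouring: the function
name conserved_columns, the "more than half" threshold and the toy
four-sequence alignment are all assumptions of this example.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.5, gap="-"):
    """Return 0-based indices of columns where one residue (not a gap)
    occurs in more than `threshold` of the aligned sequences."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(residue for residue in column if residue != gap)
        if counts:
            top_count = counts.most_common(1)[0][1]
            if top_count / len(column) > threshold:
                conserved.append(index)
    return conserved


# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = [
    "SNDIVLT",
    "SNDI-LT",
    "TQEIVLS",
    "AQDIVMS",
]
print(conserved_columns(toy_alignment))
```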

If you have the JalView sequence alignment editor software available,
you can load the above alignment (or load this FASTA format file into
some other alignment editor) and then try inserting gaps in different
positions to see how this affects the alignment - you should see that
most gaps you introduce will cause one or several sequences to no
longer have their residues aligned to similar residues in other
sequences. This makes the new alignment you've created "worse" than the
original one you loaded into the software, in which more residues were
aligned so that similar residues were in the same column.
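
The effect of a badly placed gap can also be put into numbers. The
sketch below scores an alignment by counting, column by column, the
pairs of sequences that have an identical residue, and then compares
the score before and after inserting a gap. It is a deliberate
simplification: real alignment software scores similar (not just
identical) residues using substitution matrices and gap penalties, and
the function names and the toy alignment here are hypothetical.

```python
from itertools import combinations


def identity_score(alignment, gap="-"):
    """Count identical, non-gap residue pairs summed over all columns."""
    score = 0
    for column in zip(*alignment):
        for first, second in combinations(column, 2):
            if first == second and first != gap:
                score += 1
    return score


def insert_gap(alignment, seq_index, position, gap="-"):
    """Insert a gap into one sequence at `position`, and pad every other
    sequence with a trailing gap so all rows keep the same length."""
    edited = []
    for index, sequence in enumerate(alignment):
        if index == seq_index:
            edited.append(sequence[:position] + gap + sequence[position:])
        else:
            edited.append(sequence + gap)
    return edited


# Hypothetical toy alignment of three closely related peptides.
original = ["SNDIVLT", "SNDIVLT", "SNEIVLT"]
shifted = insert_gap(original, seq_index=0, position=2)

print(shifted)
print(identity_score(original), identity_score(shifted))
```

Shifting one sequence by a single position moves most of its residues
out of register with the others, so the score of the edited alignment
drops.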
There are many different software packages available that aim to
produce an as-good-as-possible alignment of a set of protein (or
nucleotide) sequences, e.g. here at the EBI.
Note - this page (and others on this site)
occasionally link to Wikipedia pages describing certain topics and
concepts. Due to the community-annotation of these pages, they are
liable to change, and may contain malicious or inadvertent errors.