Protein Structure and Function

Exercises and Demonstrations 

Viewing and manipulating 3D protein structures

Teaching Objectives
The aim of this section is to provide you with initial experience using the PyMOL 3D molecular structure viewing software, providing you with basic skills needed to load 3D biomolecular structures, adjust their representations (i.e. the way they are shown on the monitor screen), and save the results of these manipulations for later use.
When we examine the 3D structure of a biomolecular system at the atomic level (e.g. the human haemoglobin protein), what we're really doing is looking at a description of the 3D locations of the atoms and bonds that make up the system. The coordinates describing these locations are typically determined via experiments, mostly using X-ray crystallography or NMR. The results of these experiments allow the researcher to devise a model describing the positions of the atoms and bonds that agrees well with the experimental data.

Atoms that are part of the same molecule are usually described as being in the same "chain". For example, this structure of a human haemoglobin A protein, which consists of four polypeptides (two haemoglobin alpha and two haemoglobin beta polypeptides), includes four chains, labelled chain A, chain B, chain C, and chain D - the PDBsum record for the structure provides further information about it. (Note that non-peptide groups and molecules associated with one particular polypeptide may also be labelled as part of the same chain as that polypeptide - for example, the haem group associated with the haemoglobin chain D in this structure is also labelled as part of chain D.)
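To make the idea of coordinates and chains more concrete, the sketch below reads a few atom records in the fixed-column PDB text format and groups them by chain identifier. It is a minimal illustration rather than a full parser: the four records and their coordinates are made up for this example (they are not taken from the haemoglobin structure linked above), and the function names are our own.

```python
from collections import defaultdict

# Illustrative records in PDB fixed-column format. The coordinates are
# made up for this example and are NOT taken from a real structure file.
SAMPLE_RECORDS = [
    "ATOM      1  N   VAL A   1       6.204  16.869   4.854",
    "ATOM      2  CA  VAL A   1       6.913  17.759   4.607",
    "ATOM   1070  N   VAL B   1       9.223 -20.614   1.365",
    "HETATM 4388 FE   HEM D 148      -1.727   3.950  -9.170",
]


def parse_atom_record(line):
    """Extract the main fields of an ATOM/HETATM record by column position."""
    return {
        "record": line[0:6].strip(),     # columns 1-6: record type
        "serial": int(line[6:11]),       # columns 7-11: atom serial number
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_num": int(line[22:26]),     # columns 23-26: residue number
        "xyz": (
            float(line[30:38]),          # columns 31-38: x (angstroms)
            float(line[38:46]),          # columns 39-46: y
            float(line[46:54]),          # columns 47-54: z
        ),
    }


def group_by_chain(lines):
    """Return a dict mapping each chain identifier to its list of atoms."""
    chains = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atom = parse_atom_record(line)
            chains[atom["chain"]].append(atom)
    return dict(chains)


chains = group_by_chain(SAMPLE_RECORDS)
for chain_id in sorted(chains):
    labels = [f'{a["residue"]}{a["res_num"]}:{a["atom"]}' for a in chains[chain_id]]
    print(chain_id, labels)
```

Note that the made-up HETATM record (a haem iron atom) carries the chain identifier D, mirroring the way a non-peptide group can be labelled as part of the same chain as the polypeptide it is associated with.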

Interactively examining and viewing examples of 3D protein (and other biomolecular) structures can help highlight a range of different characteristics and principles of protein structure and function. However, before we begin using PyMOL to examine structures in this way, we will focus on using the PyMOL interface - this will make it easier to carry out the later exercises described in this page, which focus more on illustrating and understanding principles of protein structure and function.
Demonstration -
PLEASE don't try to copy what we do during the demonstration - the exercises immediately after this demonstration will give you plenty of opportunity to carry out these tasks yourself, using the instructions provided in these pages.

With this file we will demonstrate how to:
Exercise 1 - Loading structure files, changing viewpoint, and saving images
Load the following file into PyMOL - this is the same file as used in the demonstration, containing human haemoglobin made up of the two wildtype alpha and beta haemoglobin chains.

Links to relevant PyMOL instructions:
By changing the viewpoint of the structure, save an image of the structure that looks as much as possible like the one shown below.

Links to relevant PyMOL instructions:

This link is to the PyMOL session file used to prepare the above image
Exercise 2 - Showing amino acid sequence, changing representation and colours of specific residues
Use PyMOL to save an image of the structure that looks as much as possible like the one below:
The following hints should help:
Links to relevant PyMOL instructions:
Exercise 3 - More practice changing representation and colours of specific residues
Here is a more difficult example for you to try out, if you found the previous one too easy - the following hints should help:
Links to relevant PyMOL instructions:

This link is to the PyMOL session file used to create the above image

Primary Structure

Teaching Objectives
To provide experience examining the 3D structures of short regions of polypeptide chains so that you are better able to identify:
Proteins are made up of one or more polypeptide chains (polymers of typically more than 50 amino acids), plus in some cases additional non-peptide co-factors e.g. the haem groups associated with haemoglobins.

The sequence of amino acids that makes up a polypeptide chain is known as the "primary structure" - the sequence is usually (i.e. almost always) written beginning with the N-terminus and ending with the C-terminus. Different amino acid residues differ in the structure of their side-chains e.g. alanine, where the side chain is a methyl group, and tyrosine, where the side chain is a phenol ring.

Most amino acid residues found in proteins have one of the 20 side-chains shown in the table below - the table includes both the three-letter and the single-letter codes for the different amino acids e.g. for arginine the three-letter code is Arg and the single-letter code is R:

(the above image is taken from Wikimedia Commons).
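The two sets of codes can also be written down as a simple lookup table. The sketch below converts a peptide given as three-letter codes into the single-letter form, written from the N-terminus to the C-terminus; the function name and the example peptide are our own choices for illustration.

```python
# Three-letter to single-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_single_letter(residues):
    """Convert a list of three-letter codes (N- to C-terminus) to a string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in residues)


# An arbitrary example peptide, listed from N-terminus to C-terminus.
peptide = ["Val", "Leu", "Ser", "Pro", "Ala", "Asp", "Lys"]
print(to_single_letter(peptide))
```

A residue name that is not one of the 20 standard ones raises a KeyError here, which is a deliberate choice for a sketch: it makes unexpected input obvious rather than silently skipping it.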

The huge diversity of protein structure and function is driven by the structure of the polypeptide backbone and the chemistry of the different side chains. There are many different ways of classifying the physical/chemical features of the different side chains - the diagram below shows several of these at the same time. Perhaps the most important, however, is the difference between polar and non-polar/hydrophobic side chains, as the exclusion of polar side chains from the protein core is one of the major factors that determines the fold of the peptide chain. This exclusion serves to maximise the number of relatively high-energy polar/hydrogen-bond interactions made by the protein.

As is implied by the diagram, these categories are somewhat ambiguous - this is inevitable when attempting to discretely categorise complex, continuous, context-dependent features of the different amino acids. For example, tyrosine (Y) is sometimes classified as polar and at other times as non-polar (in the diagram below it is included in both categories). Part of the reason for this ambiguity is the hydrophobic character of the region of the tyrosine side-chain nearest to the backbone, which contrasts with the more polar nature of the alcohol/OH group at the terminal region of the side-chain.
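One way to avoid forcing each amino acid into a single category is to use a continuous scale instead. The sketch below uses the Kyte-Doolittle hydropathy scale, which is our choice for illustration and is not the scheme used in the diagram; many other scales exist and they do not all agree. Higher values indicate a more hydrophobic side chain, and the function name is our own.

```python
# Kyte-Doolittle hydropathy values: positive = more hydrophobic,
# negative = more hydrophilic. Used here only as one example of a
# continuous scale; it is not the classification shown in the diagram.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a sequence given in single-letter code."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


# List the residues from most hydrophobic to most hydrophilic.
ranked = sorted(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get, reverse=True)
print("".join(ranked))
print(round(mean_hydropathy("VLSPADK"), 2))
```

On this particular scale tyrosine (Y) scores -1.3, well away from both the strongly hydrophobic residues such as isoleucine (4.5) and the strongly hydrophilic ones such as arginine (-4.5), which is consistent with it being hard to place in a single category.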

Examining this 3D structure of a short region of a polypeptide highlights the following characteristics of protein structure - atoms are coloured using the default PyMOL colouring scheme (carbon in green, nitrogen in blue, oxygen in red etc.):
3D structure of an extended-conformation peptide
1. To help introduce you to the single-letter amino acid code, and to give you practice identifying different chemical groups in 3D, look at the following image, and load this PyMOL file into PyMOL, to decide which of the following sequences accurately describes the primary structure of this peptide


For the answer to this quiz, follow this link.

2. Using a print-out of the image shown below, label on it the following features:
It will help to download the PyMOL session file used to create the image and examine this in PyMOL

Secondary Structure

Teaching Objectives
To highlight the features of different kinds of protein secondary structure - in particular with respect to patterns of hydrogen bonding

Additionally, this section aims to highlight the importance of representation/visualisation when it comes to viewing bioinformatic data (and structural data in particular) - what constitutes a "right" (or "good") way of displaying a 3D structure always depends on the point/idea/concept we are trying to demonstrate/highlight with a particular image/representation.
The 3D structure of the polypeptide backbone can vary greatly between different regions of the chain, due to rotation of amino acid residues with respect to each other along the backbone.

However, while a huge range of different structures are possible, in globular proteins the majority of the peptide chain is organised in a regular way - predominantly into alpha-helices or beta strands (multiple beta strands are joined together into beta sheets). These are so-called "secondary structure elements". There are other kinds of secondary structure elements, and the different elements are distinguished on the basis of their different patterns of peptide backbone hydrogen-bonding - Wikipedia provides a good description of the different kinds of secondary structure elements.

Regions between secondary structure elements are commonly referred to as "loops".

Secondary structure allows the peptide chain to satisfy (much of) the hydrogen-bonding potential of the peptide backbone CO and NH groups. Satisfying this bonding potential is crucial for allowing proteins to fold properly - otherwise the presence of unbonded polar backbone CO and NH groups would disrupt the formation of clusters of non-polar amino acid side chains in the core of the protein, which (as already mentioned) is a key influence on protein folding.

It is important to be aware that proteins are highly dynamic - the structures we have looked at so far provide only a snapshot of a protein chain in one of the many different (although similar) states it is able to assume.
Demonstration - Alpha Helices
1. The PyMOL files we used above to explore protein primary structure all showed regions of beta-strand secondary structure - chosen, rather than regions of alpha helices or loops, as they have a more extended structure, making the periodicity of the polypeptide chain easier to visualise.

This PyMOL file shows, by contrast:
Helical peptide

2. Shown below are two different views of the same alpha-helix, with the side-chains hidden to highlight the structure of the main chain, and with the main chain hydrogen bonds, which form along the axis of the helix, shown in magenta - here is the PyMOL file of this structure.

The orientation of the side chains is illustrated (from the same file) in the image below - they all stick out radially from the helix axis.

The axis of the helix is completely filled by atoms from the main chain - shown in the image below by spheres with the appropriate van der Waals radii

3. As mentioned above, different kinds of secondary structure can be distinguished on the basis of their patterns of inter-backbone hydrogen-bonding.

To visualise these patterns of hydrogen-bonding, it helps to show only the backbone of the chain, and to use the proximity of polar groups (the backbone CO and NH groups) to estimate the locations of potential hydrogen bonds - which we'll do using this PDB file for the human haemoglobin alpha/beta chain structure. To obtain the desired view, we will use PyMOL to:
We can now focus on a particular region of the chain, and identify the residues between which putative hydrogen bonds have been formed

This should yield a PyMOL file that looks like the image shown below - which was generated from this PyMOL session file.
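The idea of estimating hydrogen bonds from proximity can be sketched in a few lines. The example below is a distance-only approximation under our own assumptions: a 3.5 angstrom cutoff between backbone O and N atoms, a rule skipping residues that are neighbours in the sequence, and invented coordinates. PyMOL's own polar-contact search may apply additional criteria, so this is not a reimplementation of it.

```python
import math

# (residue number, backbone atom name, (x, y, z) in angstroms).
# Coordinates are invented purely to exercise the function.
BACKBONE_ATOMS = [
    (1, "O", (0.0, 0.0, 0.0)),
    (5, "N", (0.0, 0.0, 2.9)),
    (2, "O", (10.0, 0.0, 0.0)),
    (6, "N", (10.0, 0.0, 3.0)),
    (3, "O", (20.0, 0.0, 0.0)),
    (7, "N", (20.0, 0.0, 6.0)),
]


def putative_backbone_contacts(atoms, cutoff=3.5, min_separation=2):
    """Return (O residue, N residue, distance) for O...N pairs within cutoff.

    Pairs closer than min_separation in the sequence are skipped, since
    covalently linked neighbours are always close in space.
    """
    oxygens = [(res, xyz) for res, name, xyz in atoms if name == "O"]
    nitrogens = [(res, xyz) for res, name, xyz in atoms if name == "N"]
    contacts = []
    for o_res, o_xyz in oxygens:
        for n_res, n_xyz in nitrogens:
            if abs(n_res - o_res) < min_separation:
                continue
            distance = math.dist(o_xyz, n_xyz)  # requires Python 3.8+
            if distance <= cutoff:
                contacts.append((o_res, n_res, round(distance, 2)))
    return contacts


contacts = putative_backbone_contacts(BACKBONE_ATOMS)
for o_res, n_res, distance in contacts:
    print(f"CO of residue {o_res} ... NH of residue {n_res}: {distance} A")
```

With the invented coordinates, the third pair (residues 3 and 7) lies 6 angstroms apart and is therefore not reported, which illustrates that a "putative" contact is only as reliable as the cutoff chosen.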

The backbone-only view provides a good example of how important it is to be able to change the representation of a molecule to illustrate/examine different features of the molecule - representing the molecule with all the side chains shown, or (even worse) showing all atoms as spheres, makes it much more difficult (indeed, more or less impossible) to make out features of the polar contacts/hydrogen bonding, as you can see below:
Exercises and Questions - Alpha Helices
1. Load this PyMOL session file into PyMOL. It is, again, the structure of human haemoglobin alpha/beta chains, showing only the main chain, but this time coloured so that different kinds of secondary structure element are coloured differently - helices in red, loops in green, strands in yellow (there are no strands in this structure) - it should look something like this:

Working with the first helix, and focusing only on the polar contacts that are directed along the axis of the helix, note, for the backbone CO group of each residue, the residue number of the NH group it is in contact with.

The first few are filled in for you in the table below - in each case the NH group belongs to the residue 4 positions C-terminal of the residue providing the CO group. Check to see if that is indeed the case for the rest of this helix, and perhaps for some other helices in the structure.
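If you note your observations down as pairs of residue numbers, the "4 residues C-terminal" rule is easy to check automatically. The sketch below does this for a hypothetical list of pairs; the residue numbers are invented and are not the answers to this exercise, and the function name is our own.

```python
def follows_helix_pattern(co_residue, nh_residue, offset=4):
    """True if the NH group is `offset` residues C-terminal of the CO group."""
    return nh_residue - co_residue == offset


# Hypothetical (CO residue, NH residue) pairs, as you might note them down.
observed = [(4, 8), (5, 9), (6, 10), (7, 12)]

for co_residue, nh_residue in observed:
    verdict = "fits" if follows_helix_pattern(co_residue, nh_residue) else "does not fit"
    print(f"CO {co_residue} -> NH {nh_residue}: {verdict}")

# Pairs that break the pattern are worth a second look in PyMOL.
exceptions = [pair for pair in observed if not follows_helix_pattern(*pair)]
```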

2. Looking at the structures of many different alpha helices, it was found that the distribution of the different amino acids is not random i.e. different amino acids occur at different frequencies at the N-terminus, C-terminus, and in the "middle" of the helix. For example:
(Goliaei B and Minuchehr Z, FEBS Lett. 2003, 537(1-3):121-7, PMID: 12606043 is an example of an article that discusses these issues.)

Which of the three explanations given below for features of this non-random amino acid distribution is invalid?
  1. The main chain hydrogen bonds in an alpha helix are all pointed in the same direction, providing a dipole moment along the axis of the helix, which can destabilise the structure - acidic residues at the N-terminus can act to cancel out this charge imbalance, increasing stability
  2. Proline has a bulky side chain - thus, as a result of steric hindrance with the side chains of other amino acid residues, it can only fit at the N-terminus, where there is space at the end of the helix for the bulky proline side chain
  3. Proline, an imino rather than an amino acid, does not have an NH group available to act as a hydrogen-bond donor - thus, anywhere other than at the N-terminus, its presence in a helix would leave unfulfilled hydrogen-bonding potential in residues N-terminal to it
Demonstration - Beta Sheets
Below is a region of a beta-sheet - the structure is relatively flat, with backbone CO (red) and NH (blue) groups forming hydrogen bonds with each other. Here is the PyMOL file used to make this figure - you can see that while, from the view below, the structure looks fairly "flat", in reality the sheet region has a twist (this is typical for these kinds of structures). Note that the amino-acid side chains are not shown on these figures to make the pattern of bonding within the main chain easier to see.

Tertiary Structure

The 3D structure of a full-length polypeptide chain (known as its "tertiary" structure) can include many thousands of individual atoms - representations of the whole structure of the kind used in most of the images shown above often provide so much information that it is not possible to identify particular details of interest. Therefore, a range of different kinds of representation have been developed to provide simpler views of this kind of data.

The image below shows an example of a protein that contains both alpha helices and beta sheets using the so-called "cartoon" representation - the protein is a pyruvate decarboxylase from the yeast Saccharomyces cerevisiae (here's the corresponding PyMOL file). The helices are coloured in red, with their characteristic "spiral" representation, while the beta sheets are coloured in yellow, using their characteristic flattened-arrow representation. Other regions of the structure (the loops) are shown in green.


To return to the molecule we have taken most of our previous examples from, the image below shows one wild-type polypeptide chain of the human alpha-chain haemoglobin, in cartoon representation - here is the PyMOL file used to make this image

Human hemoglobin beta chain - cartoon view

The image below shows the same protein, but this time giving a representation of the surface of the protein - here is the PyMOL file used to make this image

A vital part of the function of a haemoglobin polypeptide chain is to bind the haem group so that it is able to interact appropriately with the oxygen molecules it transports. The surface view above shows what a precise, close fit the haem group makes with the peptide chain. An important aspect of understanding and interpreting the function of a given protein involves characterising molecular interactions of this kind - for example, changes in the sequence of the haemoglobin polypeptide that prevent the haem group from binding in this way will presumably prevent the molecule (and hence the organism with that version of the gene) from functioning correctly.

Quaternary Structure

A protein may be made up of several polypeptide chains - for example, the biologically active form of human adult haemoglobin is made up of four polypeptide chains (two alpha and two beta chains) as can be seen in this PyMOL file and in the image below, where each of the four chains has been labeled with a different colour.

The 3D organisation of the multiple polypeptides (perhaps also with other co-factors) is known as the "quaternary structure" of the protein

Protein Sequence Alignments

Alignments of protein sequences are widely used in analyses of protein structure (and evolution).

Sequences are aligned to each other (i.e. gaps are inserted within the sequences) so that similar residues in the different sequences are placed in the same column. The image below shows a multiple sequence alignment (so called because it involves more than two sequences) - columns in which most of the sequences have the same residue are coloured blue.
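The notion of a column in which "most" sequences share a residue can be made precise with a small calculation. The sketch below uses a made-up four-sequence alignment and flags columns where more than half of the sequences have the same residue; both the alignment and the 50% threshold are assumptions for illustration, and alignment editors such as JalView offer their own colouring schemes that may use different rules.

```python
from collections import Counter

# A made-up alignment; "-" marks a gap. All rows have the same length.
ALIGNMENT = [
    "MVLSPADK",
    "MVLSAGDK",
    "MVHTPEEK",
    "M-LTGADR",
]


def column_conservation(alignment, gap="-"):
    """For each column, return (most common residue, fraction of sequences).

    Gaps never count as the shared residue, but they do count towards the
    total number of sequences, so a gappy column scores lower.
    """
    n_sequences = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            result.append((gap, 0.0))
            continue
        residue, count = counts.most_common(1)[0]
        result.append((residue, count / n_sequences))
    return result


conservation = column_conservation(ALIGNMENT)
conserved_columns = [
    index for index, (_, fraction) in enumerate(conservation) if fraction > 0.5
]
print(conserved_columns)
```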

If you have the JalView sequence alignment editor software available, you can load the above alignment (or load this FASTA format file into some other alignment editor) and then try inserting gaps in different positions to see how this affects the alignment. You should see that most gaps you introduce will cause one or more sequences to no longer have their residues aligned to similar residues in the other sequences. This makes the new alignment "worse" than the original one you loaded, in which more residues were aligned so that similar residues were in the same column.
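The effect of a badly placed gap can also be seen numerically. The sketch below scores an alignment by counting, in every column, the pairs of sequences that have an identical residue, and then compares the score before and after a gap is inserted into one sequence. This simple identity count is our own stand-in for "better" or "worse"; real alignment programs use substitution matrices and gap penalties instead, and the sequences here are made up.

```python
from itertools import combinations

ORIGINAL = [
    "MVLSPADK",
    "MVLSPADK",
    "MVLSAADK",
]


def identity_score(alignment, gap="-"):
    """Count identical residue pairs, column by column (gaps never match)."""
    score = 0
    for column in zip(*alignment):
        for first, second in combinations(column, 2):
            if first == second and first != gap:
                score += 1
    return score


def insert_gap(alignment, seq_index, position, gap="-"):
    """Insert a gap into one sequence; pad the others at the end so that
    all rows keep the same length."""
    edited = []
    for index, sequence in enumerate(alignment):
        if index == seq_index:
            edited.append(sequence[:position] + gap + sequence[position:])
        else:
            edited.append(sequence + gap)
    return edited


shifted = insert_gap(ORIGINAL, seq_index=2, position=2)
print(identity_score(ORIGINAL), identity_score(shifted))
```

Inserting the gap shifts every later residue of the third sequence out of register with the other two, so most columns lose their matching pairs and the score drops.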

There are many different software packages available that aim to produce an as-good-as-possible alignment of a set of protein (or nucleotide) sequences e.g. here at the EBI

Note - this page (and others on this site) occasionally link to Wikipedia pages describing certain topics and concepts. Due to the community-annotation of these pages, they are liable to change, and may contain malicious or inadvertent errors
