Introduction: Protein Structure and Function

Primary Structure

Proteins are made up of one or more polypeptide chains (polymers of typically more than 50 amino acids), in some cases together with additional non-peptide co-factors, e.g. the haem groups associated with hemoglobins. This PyMOL session file (which can be loaded directly into a locally installed copy of PyMOL, and then rotated, zoomed etc.) shows a short section of a polypeptide chain with "standard" colouring of the atoms (carbon in green, nitrogen in blue, oxygen in red etc.); the image below is taken from this file. The repeating structure of the polypeptide backbone is clear: the atoms along the backbone alternate blue (nitrogen), green (carbon), green (carbon), and so on. The amino acid side chains are shown as lines; these are the parts of the structure that make one amino acid (e.g. alanine, where the side chain is a methyl group) different from another (e.g. tyrosine, where the side chain contains a phenol ring). The two ends (termini) of the peptide are clearly different: on the left is an NH group (the "N-terminus"), while on the right is a CO group (the "C-terminus").

3D structure of an extended-conformation peptide

Most amino acid residues found in proteins have one of the 20 side-chains shown in the table below:

(the image is taken from Wikimedia Commons). The order in which the amino acids occur within a polypeptide, beginning from the N-terminus, is known as the "primary structure" of the peptide. For example, the peptide below:

has the primary structure "tyrosine-isoleucine-serine-cysteine-threonine" (or, in single-letter code, "YISCT").
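As a minimal sketch of how the single-letter code relates to the full residue names, the Python snippet below converts a hyphen-separated primary structure (written from the N-terminus) into single-letter code. The function name `to_single_letter` and the dictionary `ONE_LETTER` are illustrative names chosen here, not part of PyMOL or any library; the letter assignments themselves are the standard ones.

```python
# Standard single-letter codes for the 20 common amino acids.
ONE_LETTER = {
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamine": "Q", "glutamate": "E", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}


def to_single_letter(primary_structure):
    """Convert 'tyrosine-isoleucine-...' (N- to C-terminus) to single-letter code."""
    names = primary_structure.lower().split("-")
    # A KeyError here means a residue name is not one of the 20 listed above.
    return "".join(ONE_LETTER[name.strip()] for name in names)


print(to_single_letter("tyrosine-isoleucine-serine-cysteine-threonine"))  # YISCT
```

The lookup fails loudly on an unrecognised name rather than guessing, which is usually the safer choice when reading a sequence by eye from a structure.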

Primary Sequence From Structure - Quiz
To introduce you to some of the basics of using PyMOL, and of protein structure, look at the following image and load this PyMOL file into PyMOL, then decide which of the following sequences accurately describes the primary structure of this peptide.


For the answer to this quiz, follow this link.

The 3D structure of the polypeptide backbone can vary greatly between different regions of the chain, due to rotation of amino acid residues with respect to each other along the backbone. This file shows the 3D structure of yet another polypeptide, different in both primary structure/sequence and 3D structure from the two shown above. In particular, this peptide is much less extended than the two previous peptides, with the peptide backbone forming a clear twist/helix. (Note that this PyMOL file contains multiple structures inferred from NMR experiments - if you view the structures one after the other quickly using the "play" button, you can get an impression of the relative stability/flexibility of different regions of the chain.)

Helical peptide

There are several ways of characterising the physical/chemical properties of the twenty different amino acids, e.g. there are 5 that are typically charged under physiological conditions (KRHDE) and 15 that aren't (ACFGILMNPQSTVWY) - the diagram below shows some of the different characteristics used to classify the different side-chains. Note that such classifications are somewhat ambiguous. For example, tyrosine (Y) is sometimes classified as polar and other times as non-polar (in the diagram below it is included in both categories). In this case, the ambiguity is due to the hydrophobic character of the region of the tyrosine side-chain nearest to the backbone, contrasted with the more polar nature of the alcohol/OH group at the end of the side-chain.
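The charged/non-charged grouping above can be applied to any sequence written in single-letter code. The sketch below is one simple way to do this in Python; the function name `charge_summary` is illustrative. It further splits the charged group into basic (K, R, H) and acidic (D, E) side-chains, and it treats histidine as charged, as the grouping above does, although histidine is only partly protonated near neutral pH.

```python
POSITIVE = set("KRH")  # basic side-chains (H is only partly protonated near pH 7)
NEGATIVE = set("DE")   # acidic side-chains


def charge_summary(sequence):
    """Count basic, acidic and uncharged residues in a single-letter sequence."""
    seq = sequence.upper()
    positive = sum(res in POSITIVE for res in seq)
    negative = sum(res in NEGATIVE for res in seq)
    return {
        "positive": positive,
        "negative": negative,
        "uncharged": len(seq) - positive - negative,
    }


print(charge_summary("YISCT"))  # {'positive': 0, 'negative': 0, 'uncharged': 5}
print(charge_summary("KDEL"))   # {'positive': 1, 'negative': 2, 'uncharged': 1}
```

A count like this is only a rough guide, since the actual charge of a side-chain depends on its local environment within the folded protein.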

Secondary Structure

Two particular types of backbone conformation are found in many different proteins - the "alpha helix" and the "beta sheet". Indeed, you have already seen examples of these two different types of structure - the "YISCT" peptide is a strand (i.e. part of a beta sheet), while the much less extended peptide above is predominantly alpha-helical.

These two different types of "secondary structure" have different characteristic patterns of hydrogen bonds formed between backbone CO and NH groups.

Below is a region of a beta-sheet - the structure is relatively flat, with backbone CO (red) and NH (blue) groups forming hydrogen bonds with each other. Here is the PyMOL file used to make this figure - you can see that while, from the view below, the structure looks fairly "flat", in reality the sheet region has a twist (this is typical for these kinds of structures). Note that the amino-acid side chains are not shown on these figures to make the pattern of bonding within the main chain easier to see.

Here, in contrast, are two views of the same alpha-helix. In this case the hydrogen bonds (in magenta) run roughly parallel to the axis of the helix. Here is the PyMOL file of this structure.
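The text does not spell out the helical bonding pattern in detail. In a regular alpha helix, the backbone CO group of residue i accepts a hydrogen bond from the backbone NH group of residue i+4, which is why the bonds line up along the helix axis. The Python sketch below lists these pairs for an idealised helix; the function name `alpha_helix_hbonds` is illustrative, and real helices will deviate from this pattern, particularly at their ends.

```python
def alpha_helix_hbonds(n_residues, offset=4):
    """List (CO residue, NH residue) pairs for an idealised helix, numbered from 1."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


for co, nh in alpha_helix_hbonds(8):
    print(f"C=O of residue {co} ... H-N of residue {nh}")
```

For an eight-residue helix this gives four hydrogen bonds (1 to 5, 2 to 6, 3 to 7, 4 to 8); the first four NH groups and last four CO groups have no partner within the helix.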

The image below shows an example of a protein that contains both alpha helices and beta sheets - a pyruvate decarboxylase from the yeast Saccharomyces cerevisiae (here's the corresponding PyMOL file). The helices are coloured in red, with their characteristic "spiral" representation, while the beta sheets are coloured in yellow, using their characteristic flattened-arrow representation. Other regions of the structure are shown in green.


Tertiary Structure

The 3D structure of a full-length polypeptide chain (known as its "tertiary" structure) can include many thousands of individual atoms. Representations of the whole structure of the kind used in most of the images shown above often provide so much information that it is not possible to identify particular details of interest. Therefore, a range of different kinds of representation have been developed to provide simpler views of this kind of data. The image below, for example, shows one wild-type polypeptide chain of human hemoglobin (the alpha chain) in the so-called "cartoon" representation - here is the PyMOL file used to make this image.

Human hemoglobin beta chain - cartoon view

The image below shows the same protein, but this time giving a representation of the surface of the protein - here is the PyMOL file used to make this image.

A vital part of the function of a hemoglobin polypeptide chain is to bind the haem group so that it is able to interact appropriately with the oxygen molecules it transports. The surface view above shows what a precise, close fit the haem group makes with the peptide chain. An important aspect of understanding and interpreting the function of a given protein involves characterising molecular interactions of this kind. For example, changes in the sequence of the hemoglobin polypeptide that prevent the haem group from binding in this way will presumably prevent the molecule (and hence the organism with that version of the gene) from functioning correctly.

Quaternary Structure

A protein may be made up of several polypeptide chains - for example, the biologically active form of human adult hemoglobin is made up of four polypeptide chains (two alpha and two beta chains) as can be seen in this PyMOL file and in the image below, where each of the four chains has been labeled with a different colour.

The 3D organisation of the multiple polypeptides (perhaps also with other co-factors) is known as the "quaternary structure" of the protein.

Protein Sequence Alignments

Alignments of protein sequences are widely used in analyses of protein structure (and evolution).

Sequences are aligned to each other (i.e. gaps are inserted within the sequences) so that similar residues in the different sequences are placed in the same column. The image below shows a multiple sequence alignment (as it involves more than two sequences) - columns in which most of the sequences have the same residue are coloured blue.
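The idea of highlighting columns where most residues agree can be sketched in a few lines of Python. This is not how any particular alignment viewer decides its colouring; the function name `conserved_columns`, the 50% threshold and the three short sequences are all hypothetical, chosen only to illustrate the principle.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.5):
    """Return 0-based indices of columns where a single residue (not a gap)
    occurs in more than `threshold` of the sequences."""
    columns = []
    for idx, column in enumerate(zip(*alignment)):
        counts = Counter(res for res in column if res != "-")
        if counts and counts.most_common(1)[0][1] / len(column) > threshold:
            columns.append(idx)
    return columns


# Hypothetical aligned sequences of equal length; "-" marks a gap.
alignment = [
    "MKV-LSA",
    "MKVTLSG",
    "MRV-LTA",
]
print(conserved_columns(alignment))  # [0, 1, 2, 4, 5, 6]
```

Column 3 is not reported because two of the three sequences have a gap there, so no residue is shared by most sequences.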

If you have the JalView sequence alignment editor available, you can load the above alignment into it (or load this FASTA format file into some other alignment editor) and then try inserting gaps in different positions to see how this affects the alignment. You should see that most gaps you introduce cause one or more sequences to no longer have their residues aligned to similar residues in the other sequences. This makes the new alignment "worse" than the original: in the original alignment, more similar residues were placed in the same column.

There are many different software packages available that aim to produce the best possible alignment of a set of protein (or nucleotide) sequences, e.g. here at the EBI.

Note - this page (and others on this site) occasionally links to Wikipedia pages describing certain topics and concepts. Because these pages are edited by the community, they are liable to change, and may contain malicious or inadvertent errors.
