After you have decided which protein or which domain(s) of a protein you would like to clone and express, you have to think about which expression system you would like to use. At present there are the following expression systems available:

Protein Expression Systems
Comparison of Expression Systems

Here we limit ourselves to the following four systems, since they are the most suited for large-scale production of proteins:

  • Escherichia coli: The expression of proteins in E. coli is the easiest, quickest and cheapest method. There are many commercial and non-commercial expression vectors available with different N- and C-terminal tags and many different strains which are optimized for special applications (for local users: see strain database).
  • Yeast: Yeast is a eukaryotic organism and has some advantages and disadvantages compared with E. coli. One of the major advantages is that yeast cultures can be grown to very high densities, which makes them especially useful for the production of isotope-labeled protein for NMR. The two most commonly used yeast species are Saccharomyces cerevisiae and the methylotrophic yeast Pichia pastoris.
  • Baculovirus-infected insect cells: Insect cells are a higher eukaryotic system than yeast and are able to carry out more complex post-translational modifications than E. coli or yeast (see Comparison of Expression Systems). Compared with those two systems, they also have better machinery for the folding of mammalian proteins and, therefore, give you a better chance of obtaining soluble protein when you want to express a protein of mammalian origin. The disadvantages of insect cells are the higher costs and the longer time before you obtain protein (usually 2 weeks).
  • Mammalian cells: Most labs use HEK (human embryonic kidney) or CHO (Chinese hamster ovary) cell lines for preparative expression of more complex proteins that also need proper post-translational modifications. Both cell lines can be used for transient expression or for stable cell line expression. The latter is more time-consuming because stable cell lines must first be generated, but it offers higher productivity and less variation when long-term production of a target protein is required. While these cells usually have a high capacity for producing secreted protein (tens to hundreds of mg per liter, often several grams per liter in cell lines for commercial proteins), their expression levels for intracellular proteins are usually much lower.

To determine which system is the best choice, ask yourself the following questions:

1. What type of protein do I want to express?

When you would like to express a protein of prokaryotic origin, the obvious choice is to use E. coli. The method is quick and cheap and the organism has all the machinery necessary for folding and post-translational modifications.

If the protein is from a eukaryotic source, the method of choice will depend on more factors (see below).

2. Do I get soluble protein when I express in E. coli?

E. coli is normally also the first choice for expressing eukaryotic proteins, for the reasons mentioned above. However, many eukaryotic proteins do not fold properly in E. coli and form insoluble aggregates (inclusion bodies). Sometimes it is possible to resolubilize the protein from the inclusion bodies, or to improve solubility by expressing the protein at a lower temperature. Expressing your target protein as a fusion with a highly soluble partner such as glutathione S-transferase (GST), maltose-binding protein (MBP), or DsbA can also improve its solubility. Often, however, it is better to change to a eukaryotic expression system, because it is better equipped to fold proteins from a eukaryotic source. Thus, instead of trying out 10 different E. coli constructs, it is often better to switch expression systems.

3. Does my protein need post-translational modifications for structure/activity?

Many proteins need to be modified following translation in order to become active and/or adopt the proper structure. The simplest of these modifications is the removal of the N-terminal methionine residue, which can occur in all organisms. More complex modifications, such as N- and O-glycosylation and phosphorylation, are carried out almost exclusively by eukaryotic cells. Keep in mind that not all eukaryotic cells carry out the same modifications. Check table 1 to find out which expression system carries out the post-translational modification(s) you are looking for.

4. What is the codon usage in my protein?

Not all of the 61 mRNA codons that encode amino acids are used equally. The so-called major codons are those that occur in highly expressed genes, whereas the minor or rare codons tend to be found in genes expressed at a low level. Which of the 61 codons are the rare ones depends strongly on the organism. The codon usage per organism can be found in the Codon Usage Database. For more information on the low-usage codons per organism, see table 2 and table 3.

Usually, the frequency with which a codon is used reflects the abundance of its cognate tRNA. Therefore, when the codon usage of the gene you would like to express differs significantly from the average codon usage of the expression host, this can cause problems during expression. The following problems are often encountered:

  • Interrupted translation, which leads to a variety of truncated protein products.
  • Frame shifting.
  • Misincorporation of amino acids, for instance lysine for arginine as a result of the AGA codon. This can be detected by mass spectrometry, since each such substitution decreases the molecular mass of the protein by 28 Da.
  • Inhibition of protein synthesis and cell growth.
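
The 28 Da figure for a lysine-for-arginine misincorporation can be checked with a short calculation. The sketch below is only an illustration: it assumes standard average residue masses (the mass of each amino acid within a peptide chain, after loss of water), and the function and table names are made up for this example.

```python
# Average residue masses in Da (amino acid minus water, as found in a chain).
# These are standard textbook values, used here as an assumption.
RESIDUE_MASS_AVG = {
    "R": 156.1875,  # arginine, C6H12N4O
    "K": 128.1741,  # lysine,   C6H12N2O
}


def substitution_mass_shift(expected, observed):
    """Mass change in Da when `observed` is incorporated instead of `expected`."""
    return RESIDUE_MASS_AVG[observed] - RESIDUE_MASS_AVG[expected]


shift = substitution_mass_shift("R", "K")
print(f"Arg -> Lys: {shift:+.1f} Da per substitution")
```

The two residues differ by exactly two nitrogen atoms, which is where the shift of about 28 Da comes from. A protein with several misread arginine codons would show a ladder of species spaced by this amount.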

As a consequence, the observed levels of expression are often low, or there is no expression at all. Expression levels are low and truncated protein products are found especially in cases where rare codons are present at the 5'-end of the mRNA or where consecutive rare codons occur.
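
A quick way to judge whether a gene is at risk is to scan its coding sequence for rare codons, paying particular attention to the 5'-end and to consecutive rare codons. The following is a minimal sketch, not a validated tool. It assumes that the rare-codon set for E. coli consists of the codons supplemented by the commercial strains named in this document; the function names, the 10-codon window, and the example sequence are hypothetical.

```python
# Assumed rare-codon set for E. coli: the codons supplemented by the
# commercial strains named in this document (DNA alphabet, T instead of U).
RARE_ECOLI = {"AGA", "AGG", "CGG", "ATA", "CTA", "CCC", "GGA"}


def split_codons(cds):
    """Split an in-frame coding sequence (DNA or RNA) into DNA-alphabet codons."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def rare_codon_report(cds, rare=RARE_ECOLI, five_prime_window=10):
    """Summarize rare codons: positions, fraction, 5' hits, consecutive runs."""
    codons = split_codons(cds)
    positions = [i for i, c in enumerate(codons) if c in rare]  # 0-based

    # Collect runs of two or more consecutive rare codons as (start, length).
    runs, start = [], None
    for i in range(len(codons) + 1):
        is_rare = i < len(codons) and codons[i] in rare
        if is_rare and start is None:
            start = i
        elif not is_rare and start is not None:
            if i - start >= 2:
                runs.append((start, i - start))
            start = None

    return {
        "n_codons": len(codons),
        "rare_positions": positions,
        "rare_fraction": len(positions) / len(codons) if codons else 0.0,
        "rare_in_5prime_window": [p for p in positions if p < five_prime_window],
        "consecutive_runs": runs,
    }


# Made-up sequence for illustration: ATG AGA AGG CTG CCC AAA GGA TAA
report = rare_codon_report("ATGAGAAGGCTGCCCAAAGGATAA")
print(report)
```

For a real gene, the rare-codon set should be taken from the codon usage table of the actual expression host rather than from this assumed list.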

To increase the expression levels of proteins containing rare codons in E. coli, two main methods are available:

  • Site-directed mutagenesis to replace the rare codons with more commonly used codons for the same residue, e.g. replacing the rare arginine codons AGA and AGG with the E. coli preferred CGC codon.
  • Co-expression of the genes that encode the rare tRNAs. Several commercial E. coli strains are available that carry tRNA genes for a number of the rare codons:
      • BL21 (DE3) CodonPlus-RIL: AGG/AGA (arginine), AUA (isoleucine) and CUA (leucine)
      • BL21 (DE3) CodonPlus-RP: AGG/AGA (arginine) and CCC (proline)
      • Rosetta or Rosetta (DE3): AGG/AGA (arginine), CGG (arginine), AUA (isoleucine), CUA (leucine), CCC (proline), and GGA (glycine)
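
Both approaches listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a design tool: the strain table is transcribed from the list above (with U written as T so that DNA sequences can be used directly), the helper names are hypothetical, and the example sequence is made up.

```python
# Codons supplemented by each strain, transcribed from the list above
# (DNA alphabet: U written as T).
STRAIN_CODONS = {
    "BL21 (DE3) CodonPlus-RIL": {"AGG", "AGA", "ATA", "CTA"},
    "BL21 (DE3) CodonPlus-RP": {"AGG", "AGA", "CCC"},
    "Rosetta or Rosetta (DE3)": {"AGG", "AGA", "CGG", "ATA", "CTA", "CCC", "GGA"},
}
# Assumption: treat every codon supplemented by any strain as "rare".
ALL_RARE = set().union(*STRAIN_CODONS.values())


def split_codons(cds):
    """Split an in-frame coding sequence (DNA or RNA) into DNA-alphabet codons."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def recode_rare_arginine(cds, preferred="CGC"):
    """Method 1: replace in-frame AGA/AGG codons with a preferred Arg codon."""
    return "".join(
        preferred if codon in {"AGA", "AGG"} else codon
        for codon in split_codons(cds)
    )


def uncovered_codons(cds):
    """Method 2: per strain, rare codons in the gene that it does not supplement."""
    present = set(split_codons(cds)) & ALL_RARE
    return {
        strain: sorted(present - supplied)
        for strain, supplied in STRAIN_CODONS.items()
    }


# Made-up sequence for illustration: ATG AGA CCC ATA AGG TAA
gene = "ATGAGACCCATAAGGTAA"
recoded = recode_rare_arginine(gene)
coverage = uncovered_codons(gene)
print(recoded)
for strain, missing in coverage.items():
    print(f"{strain}: not supplemented -> {missing or 'none'}")
```

A strain with an empty "not supplemented" list covers every rare codon found in the gene. The recoding function only swaps codons in the reading frame, so the encoded protein sequence stays the same.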

Often you will obtain a mixture of full-length protein and truncated species. Providing the protein with a C-terminal tag (e.g. a His6-tag) helps you purify only the full-length protein by affinity chromatography, because prematurely terminated products lack the tag.

When both of the above-mentioned methods fail to increase expression levels, it is time to change expression system and try to express your protein in yeast or insect cells. In cases where the gene contains many codons that are rare in E. coli, it is probably better to start with a eukaryotic system immediately.


