

Exploring Linear Motifs and Phosphorylation

Learning Objectives

The aims of this session are:

  • Raising awareness of characteristics of linear motifs in proteins in terms of:
    • Function
    • Three-dimensional structure
    • Primary amino acid sequence
  • Providing an overview of the algorithmic/theoretical basis of some of the methods used to investigate ELMs in proteins
  • Providing hands-on experience of bioinformatic tools that can be used to investigate ELMs
  • Raising awareness of situations in which it can be useful to identify ELMs (i.e. typical use cases)
    • Predicting a set of proteins that might interact with your sequence of interest (by identifying a potential ELM in your sequence, and knowing which set of domains such ELMs are known to interact with)

After this session, you should:

  • Be aware of common features and characteristics of ELMs
  • Have at least a basic understanding of the algorithms/theory used to investigate ELMs
  • Know which servers and tools are available to investigate ELMs in your protein sequences
  • Know of situations in which it could be useful for you to apply these techniques





Getting familiar with the Phospho.ELM server

We will use IRS-1 (Insulin Receptor Substrate 1), which has many reported phosphorylation sites.

Type "insulin" into Phospho.ELM and then click on the "insulin receptor substrate 1" link. The available sites are returned.


  • How many species are annotated? Which has the most sites? Which has the best annotated insulin receptor sites?
  • Which of pTyr, pThr, pSer are most common? Which class is the insulin receptor?
  • What do LTP and HTP mean? Does HTP outnumber LTP?
  • Is there a link to the literature on the site?
  • What is NetworKIN?
  • Click on the link in the substrate column. What do you get? What is MINT?
    • Click on a link in the Interaction Network column. What do you get?

Getting familiar with the ELM server

We will use Epsin-1 as it has several motifs annotated in ELM, so you can click around the results and find things out. Epsin is a cytosolic protein involved in ubiquitin-mediated recycling of endocytosis proteins. Probably because of the recurring structure of clathrin around vesicles, many endocytosis motifs are repeated, making them easy to spot.

Load the ELM server page, type in EPN1_HUMAN, set compartment to cytosol and submit. After a few moments the results will appear.


  • Why is cell compartment important?
  • Examine the whole page - is there text output or just graphical? Why are matches sorted into different tables?
  • Almost everything in the output graphic has mouseover and is clickable - explore!
  • What is wrong with the cyclin box match?
  • Are the EH-binding motifs real?
  • Can you find the two UIMs and what do they do?
  • Are there more Clathrin Boxes than Ap2alpha binders?
  • Drill down through the LIG_AP2alpha_2 annotation
    • Which protein has the most repeats?
    • Find the experimental methods used to support the existence of a motif instance.
  • What is the meaning of the 4 prefix classes: CLV, LIG, MOD, TRG?

Motifs and P-sites in P53

P53 should need little introduction. Get results for P53_HUMAN in ELM and Phospho.ELM (compartments: nucleus and cytoplasm).


  • How many known motifs are listed by ELM?
  • Is P53 sumoylated?
  • How many reported phosphorylation sites?
  • How many kinases are reported to phosphorylate P53?
  • Is P53 phosphorylated by checkpoint control kinases? Do they phosphorylate SP sites?
  • Do any of the phosphorylation sites map to a candidate FHA-binding site (pThr only)?
  • Which NES match is believed to be real? Why is the other one implausible?
  • Why is the MDM2-binding motif a drug target?

Cell cycle degradation motifs in the Aurora B kinase

Aurora B kinase is important in cell cycle progression. Many cell cycle proteins have destruction motifs to eliminate them after their function is complete. Run AURKB_HUMAN through ELM (nuclear compartment).


  • Are there any D-box or KEN box motif matches?
    • Are any supported by experiment? (Click through the annotation)
  • Are the matches inside or outside the kinase domain?

There is probably not a lot of reliable evidence that these destruction motifs ever occur within globular domains. Might it be that the ELM annotation needs to be reviewed? This exercise can be continued in the structure practical - you can examine the structural context of a motif in the kinase domain and see if it is buried or exposed.

Phosphorylation and candidate phosphopeptide-binding interactions

Srrm2 is a protein present in mRNA splicing complexes. Run Phospho.ELM and ELM (nuclear compartment) using the entry name SRRM2_HUMAN.

By cross reference between the two results, find out:

  • Are the reported phosphosites mostly LTP or HTP or both?
    • How many are CDK sites? Are there also cyclin-binding motifs? Do you think there is cell cycle regulation going on?
    • Are any of them candidate FHA-binding motifs? (Note: phosphoThr sites only.)
    • Are the sites in native disorder (e.g. IUPred)?
    • Might any of these sites be MAPK sites? (Remember: a docking motif is also required.)
    • Could S2067 be a GSK3 site?
  • Is there an NES candidate? If so is it in an ordered or disordered region?
  • Will it be straightforward to include Srrm2 in kinase cascades?
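The cross-referencing step above can also be done programmatically once you have copied the positions out of the two result pages. The following is a minimal sketch, not part of either server: the function name, the motif names and all positions are made up for illustration, and you would substitute the values you read from Phospho.ELM and ELM.

```python
def sites_in_motifs(phosphosites, motif_matches):
    """Return, for each motif match, the phosphosite positions that fall inside it.

    phosphosites:  list of 1-based residue positions
    motif_matches: list of (name, start, end) with 1-based inclusive coordinates
    """
    overlaps = {}
    for name, start, end in motif_matches:
        inside = [pos for pos in phosphosites if start <= pos <= end]
        if inside:
            overlaps[(name, start, end)] = inside
    return overlaps


# Hypothetical values, NOT real annotations for any protein.
phosphosites = [12, 45, 46, 200]
motif_matches = [
    ("MOTIF_A", 10, 15),    # placeholder names, not ELM identifiers
    ("MOTIF_B", 44, 47),
    ("MOTIF_C", 100, 110),
]

for match, positions in sites_in_motifs(phosphosites, motif_matches).items():
    print(match, positions)
```

A positional overlap only flags a candidate. Whether the residue type fits (for example pThr for FHA binding) still has to be checked by hand.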

Using keyword association to provide surrogate significance

The EH1 motif is found in some transcription factors where it provides a repressive function by binding to the Groucho/TLE1 repressor. Copley reported new EH1 motifs which he justified on the grounds that they were enriched in certain transcriptional keywords (PMID:16309560).

Using our SIRW server with the Lig_EH1 regular expression from ELM, we can attempt to repeat his findings. SIRW allows combined keyword + RegExp searches. However, this may overtax SIRW if everybody does it at once so we have left it till last...

The probability of an association can be estimated with the Fisher Exact Test or the related hypergeometric distribution. We have put up a page for the Fisher test on SIRW; the hypergeometric distribution is conveniently available in Excel, and both should be in any statistical package (such as R or Mathematica). See the Wikipedia entries for more information on these tests.

Open SIRW, select Uniprot_human and then pagequery. Type "human" in the Species field and then copy and paste the EH1 motif into the Pattern field. Click Do Query, then be patient while the result loads. This search gives you two numbers: the total number of sequences and the subset matching the motif.

Now type "HOX" in the Link field and repeat the search. This gives you two more numbers: the number of sequences known to have a HOX domain and the subset of those that match the motif.


  • Is EH1 more or less frequent in HOX proteins? Is the difference significant? If you have Excel to hand, you can put these 4 numbers into the HYPGEOMDIST function and get the probability. To do the Fisher Exact Test on the SIRW page, you must first convert the SIRW numbers into the 4 exclusive cells of the 2x2 contingency table (the four combinations of with/without keyword and with/without motif).
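If you have neither Excel nor the SIRW page to hand, the conversion and the test can be sketched in a few lines of Python using only the standard library. This is a sketch under stated assumptions: the function names are ours, the counts in the example are invented rather than real SIRW results, and the p-value shown is the one-sided (enrichment) Fisher test, computed as the upper tail of the hypergeometric distribution. Note that Excel's HYPGEOMDIST gives the probability of exactly the observed count, so a tail probability is a sum over several calls.

```python
from math import comb


def contingency_from_counts(total, total_with_motif, keyword_total, keyword_with_motif):
    """Convert the four search counts into the four exclusive 2x2 cells."""
    a = keyword_with_motif                     # keyword and motif
    b = keyword_total - keyword_with_motif     # keyword, no motif
    c = total_with_motif - keyword_with_motif  # motif, no keyword
    d = total - keyword_total - c              # neither keyword nor motif
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent")
    return a, b, c, d


def hypergeom_pmf(k, N, K, n):
    """P(exactly k motif sequences among n drawn from N, of which K carry the motif)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher exact test: P(X >= a) under the hypergeometric null."""
    N = a + b + c + d  # all sequences
    K = a + c          # sequences with the motif
    n = a + b          # sequences with the keyword
    return sum(hypergeom_pmf(k, N, K, n) for k in range(a, min(K, n) + 1))


# Invented counts for illustration only, NOT real SIRW output:
# 20 sequences in total, 5 match the motif, 4 have the keyword, 3 have both.
cells = contingency_from_counts(20, 5, 4, 3)
print("2x2 cells (a, b, c, d):", cells)
print("one-sided p-value:", enrichment_p_value(*cells))
```

To ask whether a class disfavours the motif, sum the lower tail (k from 0 to a) instead of the upper tail.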

Repeat with different domain class keywords: BZIP, TBOX, ZnF_C2H2, PAS.


  • Is EH1 found more, less or at the expected frequency in these transcription factor classes?
  • How would you explain it, if any classes disfavour EH1?
  • Do you think keyword associations can indicate that most of these motifs are functional?

A second example: KEN box enrichment in cell cycle proteins

We recently used a similar approach to Copley, finding that a cell cycle destruction motif, the KEN box, is significantly enriched with cell cycle keywords (Michael et al., 2008; PMID: 18184688). If time permits you can try the motif (simply the letters KEN) and GO terms such as mitotic - microtubule - spindle to see if there is significant enrichment.
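Because the KEN motif is just the three letters KEN, the pattern search itself is easy to reproduce offline. The sketch below is an illustration only: the helper name and the toy sequence are made up, and it does not reproduce how SIRW or ELM run their searches.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, matched_text) for each match, 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(pattern, sequence)]


# Made-up toy sequence, not a real protein.
toy_sequence = "MSKENLAAPRKENQ"

for start, end, text in find_motif(toy_sequence, "KEN"):
    print(f"{text} at {start}-{end}")
```

Counting how many sequences in a set have at least one match, with and without a keyword, gives the four numbers needed for the enrichment test.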

In our work we also checked for motif conservation vis-a-vis similar control motifs. We think there is the basis for a computational discovery pipeline for linear motifs and that, as the protein sequence annotation improves, keyword enrichment will be one of the key ways of supporting computational discovery of motifs.

Sequences that are useful for the exercises

EPN1_HUMAN epsin-1. Apply in ELM only.

IRS1_HUMAN IRS-1. Apply in Phospho.ELM only.

P53_HUMAN P53 oncoprotein. Apply in ELM and Phospho.ELM. (Many motifs. Does it share phosphosites with p73?)

ATN1_HUMAN atrophin. Apply in Phospho.ELM. (Does it have mainly LTP or HTP sites?)

AURKB_HUMAN Aurora kinase B. Apply in ELM only. (Which destruction motif is within the kinase domain?)

JUN_HUMAN c-Jun. Apply in ELM and Phospho.ELM. (Cross-check servers for MAPK results.)

SRRM2_HUMAN Srrm2. Apply in Phospho.ELM and ELM. (Fun with phosphorylation.)

Other useful servers:

Fisher's Exact Test page. Explanation of the test.

Hypergeometric test calculator. Explanation of the test.

-- Main.AidanBudd - 25 May 2007 -- Main.NiallHaslam - 14 Jun 2007

Page last modified on January 29, 2008, at 05:32 PM CET