Exploring Linear Motifs and Phosphorylation
The aim of this session is:
After completing this section, you should be able to:
Getting familiar with the Phospho.ELM server
We will use IRS-1 (Insulin Receptor Substrate 1) which has many reported phosphorylation sites.
Type "insulin" into Phospho.ELM and then click on the "insulin receptor substrate 1" link. The available sites are returned.
Getting familiar with the ELM server
We will use Epsin-1 as it has several motifs annotated in ELM, so you can click around the results and find things out. Epsin is a cytosolic protein involved in ubiquitin-mediated recycling of endocytosis proteins. Probably because of the recurring structure of clathrin around vesicles, many endocytosis motifs are repeated, making them easy to spot.
Load the ELM server page, type in EPN1_HUMAN, set compartment to cytosol and submit. After a few moments the results will appear.
Motifs and P-sites in P53
P53 should need little introduction. Get results for P53_HUMAN in ELM and Phospho.ELM (the relevant compartments are nucleus and cytoplasm).
Cell cycle degradation motifs in the Aurora B kinase
Aurora Kinase is important in cell cycle progression. Many cell cycle proteins have destruction motifs to eliminate them after their function is complete. Run AURKB_HUMAN through ELM (nuclear compartment).
There is little reliable evidence that these destruction motifs ever occur within globular domains. Might it be that the ELM annotation needs to be reviewed? This exercise can be continued in the structure practical, where you can examine the structural context of a motif in the kinase domain and see whether it is buried or exposed.
Phosphorylation and candidate phosphopeptide-binding interactions
Srrm2 is a protein present in mRNA splicing complexes. Run Phospho.ELM and ELM (nuclear compartment) using the entry name SRRM2_HUMAN.
By cross-referencing the two results, find out:
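If you copy the positions out of both result pages, the cross-referencing can also be sketched in a few lines of Python. This is only an illustration: the function name, the input layout and every number below are hypothetical placeholders, not real SRRM2_HUMAN results and not output formats of either server.

```python
def sites_in_motifs(phosphosites, motif_matches):
    """Group phosphosite positions by the motif match that contains them.

    phosphosites:  iterable of 1-based residue positions (from Phospho.ELM).
    motif_matches: iterable of (name, start, end) with 1-based inclusive
                   coordinates (from ELM).
    Returns a dict mapping (name, start, end) to a sorted list of positions.
    """
    overlaps = {}
    for name, start, end in motif_matches:
        inside = sorted(p for p in phosphosites if start <= p <= end)
        if inside:  # keep only motif matches that contain at least one site
            overlaps[(name, start, end)] = inside
    return overlaps


# Hypothetical, made-up positions used purely to show the call.
example_sites = [12, 45, 47, 130]
example_motifs = [("MOTIF_A", 10, 15), ("MOTIF_B", 44, 48), ("MOTIF_C", 200, 206)]

for motif, sites in sites_in_motifs(example_sites, example_motifs).items():
    print(motif, sites)
```

Motif matches that contain no reported phosphosite are dropped from the result, which keeps the output focused on candidate phosphopeptide-binding sites.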
Using keyword association to provide surrogate significance
The EH1 motif is found in some transcription factors where it provides a repressive function by binding to the Groucho/TLE1 repressor. Copley reported new EH1 motifs which he justified on the grounds that they were enriched in certain transcriptional keywords (PMID:16309560).
Using our SIRW server with the Lig_EH1 regular expression from ELM, we can attempt to repeat his findings. SIRW allows combined keyword + RegExp searches. However, this may overtax SIRW if everybody does it at once so we have left it till last...
The probability of an association can be estimated with the Fisher exact test or the related hypergeometric distribution. We have put up a page for the Fisher test on SIRW; the hypergeometric distribution is conveniently available in Excel, and both should be in any statistical package (such as R or Mathematica). See the Wikipedia entries for more information on these tests.
Open SIRW (http://sirw.embl.de/index.html), select Uniprot_human and then pagequery. Type "human" in the Species field and then cut and paste the EH1 motif into the Pattern field. Click Do Query, then be patient while the result loads. This search gives you two numbers: the total number of sequences and the subset matching the motif.
Now type "HOX" into the Link field and repeat the search. This gives you two more numbers: the number of sequences known to have a HOX domain and the subset of those that match the motif.
Repeat with different domain class keywords BZIP, TBOX, ZnF_C2H2, PAS.
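The four numbers from the two searches are enough to compute an enrichment probability by hand. The sketch below uses the one-sided hypergeometric tail, which is equivalent to a one-sided Fisher exact test on the corresponding 2x2 table. The function name is our own, and the counts in the example call are hypothetical placeholders, not real SIRW output.

```python
from math import comb


def hypergeom_enrichment_p(total, motif_total, keyword_total, keyword_motif):
    """One-sided probability of seeing at least `keyword_motif` motif matches
    among `keyword_total` keyword sequences, when `motif_total` of the
    `total` sequences match the motif (hypergeometric upper tail).
    """
    upper = min(motif_total, keyword_total)
    denominator = comb(total, keyword_total)
    numerator = sum(
        comb(motif_total, k) * comb(total - motif_total, keyword_total - k)
        for k in range(keyword_motif, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: replace them with the numbers you read off SIRW.
p_value = hypergeom_enrichment_p(
    total=20000,        # all sequences searched
    motif_total=400,    # sequences matching the motif
    keyword_total=250,  # sequences carrying the keyword (e.g. HOX)
    keyword_motif=30,   # keyword sequences that also match the motif
)
print(f"P(at least this many matches by chance) = {p_value:.3g}")
```

When you repeat the search for several keywords, remember that you are running several tests, so a single small probability is weaker evidence than it looks.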
A second example: KEN box enrichment in cell cycle proteins
We recently used a similar approach to Copley, finding that a cell cycle destruction motif, the KEN box, is significantly enriched with cell cycle keywords (Michael et al., 2008; PMID: 18184688). If time permits you can try the motif (simply the letters KEN) and GO terms such as mitotic - microtubule - spindle to see if there is significant enrichment.
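To see what a pattern search does, the KEN motif (simply the letters KEN, as stated above) can be scanned for locally with a regular expression. This is a minimal sketch, not how SIRW or ELM work internally; the function name and the example sequence are made up for illustration.

```python
import re

# The KEN box as described in the text: the three residues K, E, N.
KEN_BOX = re.compile(r"KEN")


def find_motif(pattern, sequence):
    """Return (start, end, matched_text) for each match, using 1-based
    inclusive residue positions as in protein annotation."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical sequence, not a real protein.
example_sequence = "MSTKENLLPAAGRKENSPQ"

for start, end, text in find_motif(KEN_BOX, example_sequence):
    print(f"{text} at residues {start}-{end}")
```

Note that `finditer` reports non-overlapping matches only. That is harmless for KEN, but for longer patterns that can overlap themselves you would need a lookahead-based search.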
In our work we also checked for motif conservation vis-a-vis similar control motifs. We think there is the basis for a computational discovery pipeline for linear motifs and that, as the protein sequence annotation improves, keyword enrichment will be one of the key ways of supporting computational discovery of motifs.
Sequences that are useful for the exercises
EPN1_HUMAN epsin-1. Apply in ELM only.
IRS1_HUMAN IRS-1. Apply in Phospho.ELM only.
P53_HUMAN p53 oncoprotein. Apply in ELM and Phospho.ELM. (Many motifs. Does it share phosphosites with p73?)
ATN1_HUMAN atrophin. Apply in Phospho.ELM. (Does it have mainly LTP or HTP sites?)
AURKB_HUMAN Aurora kinase B. Apply in ELM only. (Which destruction motif is within the kinase domain?)
JUN_HUMAN c-Jun. Apply in ELM and Phospho.ELM. (Cross-check the servers for MAPK results.)
SRRM2_HUMAN Srrm2. Apply in Phospho.ELM and ELM. (Fun with phosphorylation.)
Other useful servers:
-- Main.AidanBudd - 25 May 2007 -- Main.NiallHaslam - 14 Jun 2007