Exercises for Predicting Globular domains
Save the output of each run so that you can compare the results of each type of
prediction later, i.e. keep the browser window or tab with the results open.
Data for exercises - only use this if you don't have your own protein to work on.
Some people may wish to start from the bottom and work their way backwards
so that not everyone is trying the same servers simultaneously.
- Using the target sequence, run the secondary structure
prediction algorithms to determine the likely position of any secondary
structure elements for your protein. While you are waiting for these
you can move on to the next question.
o ExPASy (List of tools)
- How well do any two methods agree with each other?
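One rough way to put a number on how well two methods agree is to count the residues that both assign to the same state. The sketch below assumes that each server's output has been reduced by hand to a three-state string (H = helix, E = strand, C = coil) of the same length as the sequence; the function name and the two example strings are hypothetical and do not come from any of the servers.

```python
def agreement(pred_a: str, pred_b: str) -> float:
    """Fraction of residues given the same state by both predictions."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    if not pred_a:
        raise ValueError("predictions are empty")
    matches = sum(a == b for a, b in zip(pred_a, pred_b))
    return matches / len(pred_a)


# Hypothetical three-state strings, one character per residue.
method_1 = "CCHHHHHHCCEEEECC"
method_2 = "CCCHHHHHCCEEEECC"
print(f"Agreement: {agreement(method_1, method_2):.2f}")
```

A single overall fraction hides where the methods disagree, so it is still worth looking at which elements (typically the ends of helices and strands) differ.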
Protein composition prediction
- Using the charge prediction algorithm from EMBOSS, plot the charge distribution of the protein.
o Charge Prediction
- Use the protein characterisation tools from EMBOSS and SAPS to characterise your protein.
o Pfam (Click sequence search)
- Does the secondary structure prediction align with the predicted/known domains?
- Does the charge distribution correspond to the position of known domains?
- Using what you know of the structure and properties of globular domains, explain the distribution of charged residues.
- Run GlobPlot on your protein.
o GlobPlot
- Compare the results of GlobPlot to those of the SMART domains. How well does the prediction line up with the actual domains?
- For those domains that are not predicted well, use the information available about them from Pfam, CD-Search and SMART to explain why GlobPlot would not predict them well. Example: mdm2
- Are there likely to be domains that are missing from one site but
present in another? If so, what is the best strategy for ensuring that you get
the best prediction?
- Look at the multiple sequence alignment for the Pfam prediction of the domain of the sequence - how much similarity is there between the sequences in the alignment?
- What does the profile tell you about the type of similarity between the sequences that all have the domain?
- Look at the HMM definition for the Pfam-B set - what can you say about the type of similarity that the HMM uses to define the domain?
- Run your sequence through the BLAST
server (see the previous day's exercises). Are any of the sequences
present in both the domain alignment and the BLAST output?
Try this for a few sequences.
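The charge distribution exercise earlier in this list can also be approximated offline as a sanity check on the server plot. The sketch below averages an assumed per-residue charge over a sliding window. The charge values (K and R = +1, D and E = -1, H = +0.5) and the default window length are assumptions made for illustration; this is not the EMBOSS implementation, and the example peptide is hypothetical.

```python
# Assumed side-chain charges at roughly neutral pH (illustrative values only).
CHARGE = {"K": 1.0, "R": 1.0, "H": 0.5, "D": -1.0, "E": -1.0}


def window_charge(seq: str, window: int = 5) -> list[float]:
    """Mean charge of every full window of `window` residues along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # Residues not listed in CHARGE are treated as uncharged.
    per_residue = [CHARGE.get(aa, 0.0) for aa in seq]
    return [
        sum(per_residue[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical peptide, not one of the course proteins.
profile = window_charge("MKKRDEEDAAGH", window=4)
for start, value in enumerate(profile, start=1):
    print(start, f"{value:+.3f}")
```

Each printed line gives the 1-based start of a window and its mean charge, so runs of strongly positive or negative windows can be compared with the domain boundaries reported by Pfam or SMART.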
Extras - just in case.
If we have time at the end of the day we can come back to these questions.
- If you have a PDB structure of the target protein or a homologue, use the surface exposure prediction algorithm from EMBOSS (Pepinfo) to plot the positions of residues likely to be on the surface of the protein.
- How does the surface exposure of the PDB file of your sequence
or a related sequence relate to the arrangement of secondary structure?
- If you have time - use other tools from the EMBOSS
suite to characterise the protein and see how the results
coincide with the properties of globular domains or disordered regions.
- What are the main differences
between the definition in the Multiple Sequence Alignment and the one
used in the HMM? i.e. what information is added by the HMM?
- Other proteins to use: look at the proteins in the data for the exercises.
How do the secondary structure prediction algorithms deal with the
proteins predicted as natively disordered? For example, look at the
protein epsin1. List of examples.
- Other prediction algorithms:
o Garnier prediction EMBOSS
- Run the structure through WHATIF or PROCHECK and check the surface exposure of the conserved residues of your sequence and other related sequences.
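For the surface exposure questions in this list, a sliding-window hydropathy profile gives a crude sequence-only indication of which stretches are more likely to be exposed: strongly negative (hydrophilic) windows are more often on the surface. The sketch below uses the published Kyte-Doolittle scale. The window length, the function name and the example peptide are assumptions, and this is not claimed to be the calculation that Pepinfo performs.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """Return (centre position, mean hydropathy) for every full window.

    Positions are 1-based; an odd window keeps the centre on a residue.
    """
    seq = seq.upper()
    if window < 1 or window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    half = window // 2
    return [
        (i + half + 1, sum(values[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical peptide: a charged stretch followed by a hydrophobic one.
for position, score in hydropathy_profile("KKDEEKRDSLLVIIAVLG", window=5):
    label = "likely exposed" if score < 0 else "likely buried"
    print(position, f"{score:+.2f}", label)
```

The zero cut-off used for the labels is arbitrary; where a PDB structure is available, the exposure calculated from the structure should take precedence over this estimate.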
An example of a protein from an older version of Ensembl (release 42) in which the Pfam domain PF09270
is not annotated, even though this domain was already present in Pfam
at the time of release 42. Note that the current version of Ensembl
does have this domain annotated: http://dec2006.archive.ensembl.org/Homo_sapiens/protview?db=core;peptide=ENSP00000354528
List of links for this part of the course: