PredictingGlobularDomains

Exercises for Predicting Globular domains

Keep the output of each run so that you can compare the different types of prediction later, e.g. by leaving the browser window or tab with the results open. Data for exercises: only use this if you don't have your own protein to work on. Some people may wish to start from the bottom and work their way backwards, so that not everyone is trying the same servers simultaneously.

Data and sequences for exercises

Secondary Structure

  • Using the target sequence, run the secondary structure prediction algorithms to determine the likely position of any secondary structure elements for your protein. While you are waiting for these you can move on to the next question.
          o PsiPred
          o Jpred
          o ExPASy (List of tools)
  • How well do any two methods agree with each other?
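
One rough way to put a number on the agreement question above is to compare the two predictions residue by residue. The sketch below assumes both predictions have been copied out as equal-length strings of per-residue states; the function name and the example strings are hypothetical, not real server output. Servers may use different symbols for coil, so normalise them before comparing.

```python
def agreement(pred_a: str, pred_b: str) -> float:
    """Fraction of positions where two per-residue predictions assign the same state."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    if not pred_a:
        raise ValueError("predictions are empty")
    matches = sum(a == b for a, b in zip(pred_a, pred_b))
    return matches / len(pred_a)


# Hypothetical three-state strings (H = helix, E = strand, C = coil).
method_1 = "CCHHHHHHCCEEEECC"
method_2 = "CCCHHHHHCCEEEECC"
print(f"agreement: {agreement(method_1, method_2):.2f}")
```

A single percentage hides where the methods disagree; the ends of helices and strands are worth inspecting by eye.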

Protein composition prediction

  • Using the charge prediction algorithm from EMBOSS, plot the charge distribution of the protein.
          o Charge Prediction
  • Use the protein characterisation tools from EMBOSS and SAPS to characterise your protein.
          o EMBOSS
          o SAPS
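
To see what a charge plot is showing, it can help to compute a simplified version by hand. The sketch below is not the EMBOSS calculation: it assumes K and R count as +1, D and E as -1, and every other residue (including histidine) as 0, and it averages over a sliding window whose size is an arbitrary choice. The function name and example sequence are hypothetical.

```python
# Simplified residue charges (assumption: histidine and termini are ignored).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def window_charge(seq: str, window: int = 5) -> list[float]:
    """Mean charge of each sliding window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    charges = [CHARGE.get(aa, 0.0) for aa in seq]
    return [
        sum(charges[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequence; each value belongs to the window starting at that position.
for start, value in enumerate(window_charge("MKKRDEEDLLAAGG"), start=1):
    print(start, f"{value:+.2f}")
```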

Domain Prediction

  • Run your sequence through the domain prediction servers:
          o SMART
          o Pfam (Click sequence search)
          o CD-Search
  • Does the secondary structure prediction align with the predicted/known domains?
  • Does the charge distribution correspond to the position of known domains?
  • Using what you know of the structure and properties of globular domains, explain the distribution of charged residues.
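
For the question of whether the secondary structure prediction lines up with the domains, one simple check is the fraction of residues inside each region that are predicted as helix or strand. This is a minimal sketch assuming a three-state prediction string (H, E, C) and 1-based inclusive domain boundaries read off the server output; the names and coordinates below are hypothetical.

```python
def structured_fraction(ss: str, start: int, end: int) -> float:
    """Fraction of residues in start..end (1-based, inclusive) predicted as H or E."""
    if not 1 <= start <= end <= len(ss):
        raise ValueError("region must lie within the prediction")
    region = ss[start - 1:end]
    return sum(state in "HE" for state in region) / len(region)


# Hypothetical prediction and regions, for illustration only.
prediction = "CCCCHHHHHHEEEECCCCCC"
regions = {"domain_1": (5, 14), "linker": (15, 20)}
for name, (start, end) in regions.items():
    print(name, f"{structured_fraction(prediction, start, end):.2f}")
```

A globular domain would usually be expected to score well above the linker regions around it.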

Globularity Prediction

  • Run GlobPlot on your protein. GlobPlot
  • Compare the results of GlobPlot to those of the SMART domains. How well does the prediction line up with the actual domains?
  • For those domains that are not predicted well, use the information available about them from Pfam, CD-Search and SMART to explain why GlobPlot would not predict them well. Example: mdm2
  • Are there likely to be domains missing from one site present in another? If so, what is the best strategy for ensuring that you get the best prediction?
  • Look at the multiple sequence alignment for the Pfam prediction of the domain of the sequence - how much similarity is there between the sequences in the alignment?
  • What does the profile tell you about the type of similarity between the sequences that all have the domain?
  • Look at the HMM definition for the Pfam B set - what can you say about the type of similarity that the HMM uses to define the domain?
  • Run your sequence through the BLAST server (see the previous day's exercises). Are any sequences present in both the domain alignment and the BLAST output? Check a few sequences.
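
When comparing GlobPlot output with the SMART domains, "how well does it line up" can be expressed as the fraction of each domain that falls inside a predicted globular segment. The sketch below assumes you have typed in the segment and domain boundaries yourself as 1-based inclusive residue ranges; the function name and all coordinates are hypothetical.

```python
def coverage(domain: tuple[int, int], segments: list[tuple[int, int]]) -> float:
    """Fraction of domain residues that lie inside at least one predicted segment."""
    d_start, d_end = domain
    if d_start > d_end:
        raise ValueError("domain start must not exceed domain end")
    covered: set[int] = set()
    for s_start, s_end in segments:
        lo, hi = max(d_start, s_start), min(d_end, s_end)
        covered.update(range(lo, hi + 1))  # empty range when there is no overlap
    return len(covered) / (d_end - d_start + 1)


# Hypothetical coordinates, for illustration only.
predicted_globular = [(1, 59), (90, 120)]
print(f"{coverage((10, 109), predicted_globular):.2f}")
```

Domains with low coverage are the ones worth looking up in Pfam, CD-Search and SMART.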

Extras - just in case. If we have time at the end of the day we can come back to these questions.

  • If you have a PDB structure of the target protein or a homologue, use the surface exposure prediction algorithm from EMBOSS (Pepinfo) to plot the positions of residues likely to be on the surface of the protein.
  • How does the surface exposure in the PDB file of your sequence or a related sequence relate to the arrangement of secondary structure elements?
  • If you have time - use the EMBOSS suite to characterise the protein and see how the results coincide with the properties of globular domains or disordered regions.
  • What are the main differences between the definition in the Multiple Sequence Alignment and the ones used in the HMM? i.e. What information is added by the HMM?
  • Other proteins to use: Look at the proteins in the data for the exercises - how do the secondary structure prediction algorithms deal with the proteins predicted as natively disordered? For example, look at the protein epsin1. List of examples.
  • Other prediction algorithms:
          o Garnier prediction EMBOSS 
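
As a rough stand-in for a surface exposure plot, a sliding-window hydropathy profile can hint at which stretches are more likely to be exposed. The sketch below is not the Pepinfo algorithm: it uses the Kyte-Doolittle hydropathy scale, treats unrecognised residues as 0, and the window size is an arbitrary choice. The function name and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean Kyte-Doolittle hydropathy of each sliding window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # Assumption: residues outside the 20 standard amino acids score 0.
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequence; strongly negative windows are candidates for exposed regions.
for start, value in enumerate(hydropathy_profile("MKKRDEEDLLIVAAGGSTKE"), start=1):
    print(start, f"{value:+.2f}")
```

Hydropathy is only a proxy; the comparison with the actual PDB structure is what the exercise is after.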

Homework:

  • Run the structure through WHATIF or PROCHECK and check the surface exposure of the conserved residues of your sequence and other related sequences.

An example of a protein from an older version of Ensembl (release 42) in which the Pfam domain PF09270 is not annotated, even though the domain was already present in Pfam at the time of release 42. Note that the current version of Ensembl does have this domain annotated: http://dec2006.archive.ensembl.org/Homo_sapiens/protview?db=core;peptide=ENSP00000354528

List of links for this part of the course: Course Links

Page last modified on January 31, 2008, at 10:54 AM CET