Introduction to Bioinformatics
Aidan Budd and Venkata Satagopam
We begin by exploring some common features of bioinformatics tools and analyses - this will help provide a framework for understanding and criticising your analyses.
1. What does "Bioinformatics" mean?
2. Bioinformatics involves many different types of data
3. ALL bioinformatic applications depend on identifying similarity between datasets
4. ALL bioinformatic tools are designed to address one or both of these questions:
Exercises - Recognising features of bioinformatic tools
In discussion with your neighbours, consider a situation (or several situations) where you have used bioinformatic tools in the past. While using these tools:
If you have only limited experience working with bioinformatic tools, then select one (or more) of the tools described in the publications listed below and use them in this exercise.
ProtTest: selection of best-fit models of protein evolution. Abascal et al. 2005 Bioinformatics 21(9) 2104-5 PMID: 15647292
JIGSAW: integration of multiple sources of evidence for gene prediction. Allen and Salzberg 2005 Bioinformatics 21(18) 3596-603 PMID: 16076884
IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Dosztanyi et al. 2005 Bioinformatics 21(16) 3433-4 PMID: 15955779
4DXpress: a database for cross-species expression pattern comparisons. Haudry et al. 2007 Nucleic Acids Research 36(Database issue):D847-53 PMID: 17916571
5. Bioinformatics databases
Being aware of some of the typical features of bioinformatic database records/files can help us find what we need in this jungle of data.
A. Primary accession numbers and cross-references
Primary accession numbers aim to provide a string/number that unambiguously identifies just one record within a given database, distinguishing it from all other records in that database.
Demonstration
We'll show you the difference between trying to find the UniProt human Src kinase record without and then with knowledge of the accession number.
Exercise
To illustrate the importance of accession numbers in navigating through biological databases, attempt to find records in the following databases associated either with human hemoglobin subunit beta, or with your own protein of interest, without using accession numbers (if using the example of human hemoglobin beta, we've provided links to the relevant accession numbers for you to compare with the results of your own searches).
Note that in some cases there may be several accession numbers referring to the same entity e.g. 1A00 and 1BBB are both PDB accession numbers for 3D structures of hemoglobin.
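The difference between searching by free text and retrieving by accession number can be sketched with a toy in-memory database. This is only an illustration: the accession numbers, descriptions and the helper names `search_by_text` and `get_by_accession` are invented placeholders, not real UniProt records or a real database API.

```python
# Toy "database": all accession numbers and records are made-up placeholders.
RECORDS = {
    "ACC00001": {"description": "Tyrosine kinase Src", "organism": "Homo sapiens"},
    "ACC00002": {"description": "Tyrosine kinase Src", "organism": "Mus musculus"},
    "ACC00003": {"description": "Src substrate protein", "organism": "Homo sapiens"},
    "ACC00004": {"description": "Hemoglobin subunit beta", "organism": "Homo sapiens"},
}


def search_by_text(records, text):
    """Return the accessions of all records whose description mentions `text`."""
    needle = text.lower()
    return sorted(
        accession
        for accession, record in records.items()
        if needle in record["description"].lower()
    )


def get_by_accession(records, accession):
    """Return exactly one record: an accession is unique within its database."""
    return records[accession]


# A free-text search is ambiguous: several records mention "src".
text_hits = search_by_text(RECORDS, "src")
print(text_hits)

# An accession number identifies a single record directly.
one_record = get_by_accession(RECORDS, "ACC00001")
print(one_record["organism"])
```

In this sketch the text search returns three candidate records that still have to be inspected by hand, whereas the accession lookup returns the one intended record.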
B. Overlapping bioinformatic resources
Demonstration
Another common situation you'll find when searching bioinformatic resources is that, in many cases, several different resources provide the same/similar kinds of information. Often you will find that there is some, but not complete, overlap between the information provided by these different sites. For example PhosphoSite, Human Protein Reference Database, Phospho.ELM, and UniProt all provide information about experimentally-demonstrated phosphorylation sites e.g. for the human epsin-1 sequence (UniProt name EPN1_HUMAN). For this reason, if you have the time and are very interested in finding evidence of particular features of your protein, it makes sense to examine more than just one resource for information of this kind.
Exercise
Using the resources above, determine the extent of overlap in the set of sites described to be phosphorylated in the protein CBP_HUMAN (or in your own protein of interest).
C. User-friendly data visualisation
Exercise
To give an example of the difference between the raw data, and more user-friendly representations, contrast the information found in the following links, all of which present the UniProt record for CBP_HUMAN:
For each of these sites, try and find the parts of the record that refer to:
Is one of the versions of the record easier to use than the others? Are there particular features of the representations that make them easier/more difficult for you to navigate through?
6. SRS
Exercise 1
Q1: Which genes, proteins, domains, functions, pathways and structures are related to Alzheimer’s and Huntington’s diseases?
This exercise aims to demonstrate:
• How to use the results overview in a quick search of SRS
Exercise 2
Q2a: Find all ‘Huntington’s’ disease related proteins.
For Q2b and Q2c, try to find out the difference between using each of the three Boolean operators: AND (&), OR (|) and BUT NOT (!).
Q2b: Find the human-specific ‘Huntington’s’ disease related proteins.
Q2c: Find the mouse-specific ‘Huntington’s’ disease related proteins.
Comparing the results from Q2a with those from Q2b or Q2c shows how to narrow down the search results to a very particular kind of data.
Q2d: Select all the results from Q2c and find out whether any macromolecule-related information is associated with them.
Q2e: Are any of the above results associated with pathways? If so, which pathways?
Now we will look at very specific information related to three proteins: TMEDA_HUMAN, HD_HUMAN and NR1H2. For each of these proteins answer the following questions:
Q2f: What are the different gene names that the proteins are known by?
Q2g: Get the FASTA sequence.
Q2h: Which papers/evidence have been associated with these proteins?
Q2i: Has the 3D structure of any of these proteins been solved?
Interpreting the result of mutational analysis is best done in the context of 3D structural information about the protein being mutated - just one of many reasons why it can be useful to learn whether the 3D structure of a protein has already been solved.
Q2j: Find out which biological processes the proteins are involved in by following the links to GO.
A useful summary of the function of a protein can be obtained from examining its description using the Gene Ontology (GO). Finding other proteins described using the same GO terms is one way of obtaining a set of functionally-related proteins.
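Grouping proteins by shared GO terms can be sketched as an inverted index from term to proteins. This is a minimal illustration only: the protein names, the term labels and the function names `build_term_index` and `proteins_sharing_terms` are all invented placeholders, not real GO annotations or an existing library.

```python
from collections import defaultdict

# Hypothetical annotations: protein names and term labels are placeholders,
# not real GO identifiers or real annotation data.
ANNOTATIONS = {
    "PROT_A": {"GO:term_A", "GO:term_B"},
    "PROT_B": {"GO:term_B"},
    "PROT_C": {"GO:term_C"},
    "PROT_D": {"GO:term_A", "GO:term_C"},
}


def build_term_index(annotations):
    """Invert protein -> terms into term -> set of proteins."""
    index = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            index[term].add(protein)
    return index


def proteins_sharing_terms(annotations, query):
    """Return every other protein annotated with at least one of the query's terms."""
    index = build_term_index(annotations)
    related = set()
    for term in annotations[query]:
        related |= index[term]
    related.discard(query)  # the query protein is not "related" to itself
    return related


related_to_a = proteins_sharing_terms(ANNOTATIONS, "PROT_A")
print(sorted(related_to_a))
```

Here PROT_B and PROT_D are reported because each shares one term with PROT_A, while PROT_C shares none. Requiring more than one shared term would give a stricter, smaller set.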
This exercise aims to demonstrate:
• How to use query builder and Boolean operators.
• How to get the related information.
• That UniProt is a very useful resource in the way it can provide an easily-overviewed summary of a huge amount of data collected about a particular protein.
• Examples of some of the different kinds of information contained in a UniProt entry.
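The effect of the three Boolean operators used in Q2b and Q2c can be pictured as set operations on lists of hits. The sketch below uses Python sets with invented placeholder identifiers; it does not call SRS, and note that Python writes BUT NOT as `-` where the SRS query language uses `!`.

```python
# Hypothetical result sets: the record identifiers are placeholders.
huntington_hits = {"REC1", "REC2", "REC3", "REC4"}
human_hits = {"REC1", "REC2", "REC5"}
mouse_hits = {"REC3", "REC5", "REC6"}

# AND: records found by both queries (narrows the search).
human_huntington = huntington_hits & human_hits

# OR: records found by either query (widens the search).
human_or_mouse = human_hits | mouse_hits

# BUT NOT: records found by the first query, excluding those found by the second.
huntington_not_human = huntington_hits - human_hits

print(sorted(human_huntington))
print(sorted(human_or_mouse))
print(sorted(huntington_not_human))
```

AND can only shrink a result set and OR can only grow it, which is why combining a disease query with an organism query using AND narrows the results to a particular kind of data.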
Exercise 3
Q3a: Get all the human-specific genes associated with ‘Parkinson’s’ disease, then get the proteins that correspond to them, and then find out whether there are any structures associated with them.
Q3b: Start with the protein Q54PA5_DICDI and find all other proteins that contain the same domains.
This exercise aims to demonstrate:
• The linking facility, a unique feature of SRS
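The idea behind linking, as used in Q3a to go from genes to proteins to structures, can be sketched as following cross-references from one table to the next. This is an assumption-laden toy: the tables, identifiers and the `follow_links` helper are invented for illustration and say nothing about how SRS implements linking internally.

```python
# Hypothetical cross-reference tables: all identifiers are placeholders.
GENE_TO_PROTEINS = {
    "GENE1": ["PROT1", "PROT2"],
    "GENE2": ["PROT3"],
}
PROTEIN_TO_STRUCTURES = {
    "PROT1": ["STRUCT1"],
    "PROT3": [],  # a protein with no solved structure
}


def follow_links(identifiers, link_table):
    """Collect, in order and without duplicates, everything the identifiers link to."""
    linked = []
    for identifier in identifiers:
        # Identifiers missing from the table simply have no cross-references.
        for target in link_table.get(identifier, []):
            if target not in linked:
                linked.append(target)
    return linked


# Step 1: genes -> proteins. Step 2: proteins -> structures.
proteins = follow_links(["GENE1", "GENE2"], GENE_TO_PROTEINS)
structures = follow_links(proteins, PROTEIN_TO_STRUCTURES)

print(proteins)
print(structures)
```

Each step takes the full result of the previous query as its input, so a chain of links can cross several databases without any identifiers being copied by hand.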
Learning Objectives