Practical Courses on Basic Sequence Analysis

November 26th-29th, 2001

by Toby Gibson, Chenna Ramu, and Aidan Budd


The courses consist of four half-day modules that you can take individually or together.

A short introduction will be followed by use of sequence analysis programs using typical sequence examples. The course covers many common tasks in sequence analysis including database search and retrieval, multiple sequence alignment and sequence trees. With the human and other eukaryotic genomes becoming available, we also cover some genome investigation tools. If you are comfortable with these core tasks, it should not be difficult to find out how to do other analyses in the future.

In the practicals, you will investigate some protein families and gene sequences as an introduction to sequence analysis tools available at EMBL on UNIX or on the WWW. These will include SRS, Blast 2, Bioccelerator, SMART, ENSEMBL, Genome Browser, Gene2EST, Clustal X, and tree display programs. Students will work in pairs, one pair per Linux PC, to encourage discussion. Practicals will take place in the computer teaching lab, room V124/V125.

Monday 26th November

Tuesday 27th November

Wednesday 28th November

Thursday 29th November


Course 1. Web Tools (SRS, Blast2, Bioccelerator, SMART)

This practical introduces some web servers provided at EMBL. These can be accessed from any computer and are simple to use. Web servers are often the most convenient way to do sequence analysis, but you should be aware that they can be unreliable, need constant care from their providers, and are not suited to every task. Sometimes you have to run programs on local machines too. Sequence database search tools are well suited to web servers, and we will try out the Blast and Bioccelerator servers at EMBL.


Course 2. Online retrieval with SRS and Web of Science

Modern biology is underpinned by the storage of vast amounts of data on computers. To use this information, it has to be retrieved - ideally, both quickly and easily. Providing information retrieval services is a specialised activity, so dedicated services tend to be provided. The major categories of data that biologists need to retrieve are biological database entries, most often nucleic acid sequences or protein structures, and publications from the scientific literature.

Like it or not, database retrieval is one of the most important activities in biological research and in the genome era is becoming indispensable. SRS, the Sequence Retrieval System, is a powerful program for indexing and retrieving data from sequence databases. SRS originated at EMBL and was then maintained at the EBI, but has now been commercialised by LION AG. Keywords are used to retrieve information from databases indexed in SRS, but links between databases allow users to get much more information than the keyword alone could retrieve.

The workhorse for keyword search and retrieval of the biological literature is the PubMed server at NCBI. EMBL now also has access to Web of Science (WOS), a commercial service provided by the Science Citation Index. While WOS may provide a certain amount of pleasure counting EMBL scientists' citations, an important literature retrieval feature should not be overlooked: the ability to collect all papers that cite a key work. This search "forwards in time" allows you to retrieve papers that are relevant but are missed by the keywords that you would use.
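The idea of searching "forwards in time" can be illustrated with a minimal sketch. It assumes a small, made-up table of which papers cite which; the paper names and the `build_cited_by` helper are hypothetical and say nothing about how Web of Science itself is implemented. Inverting the reference lists gives, for any key work, the set of later papers that cite it.

```python
from collections import defaultdict


def build_cited_by(references):
    """Invert a paper -> cited-papers mapping into cited-paper -> citing-papers."""
    cited_by = defaultdict(set)
    for paper, cited in references.items():
        for target in cited:
            cited_by[target].add(paper)
    return cited_by


# Hypothetical reference lists, for illustration only.
toy_references = {
    "paper_A": ["key_work"],
    "paper_B": ["key_work", "paper_A"],
    "paper_C": ["paper_A"],
}

cited_by = build_cited_by(toy_references)

# All papers that cite the key work, regardless of their keywords.
print(sorted(cited_by["key_work"]))
```

Note that paper_C is not returned for key_work: it only cites paper_A, so reaching it would need a second forward step from paper_A.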


Course 3. Genomic sequence with ENSEMBL, Genome Browser and Gene2EST

This course introduces three web servers that provide tools for investigating genes in genomic DNA. Gene2EST is a server developed in our group that accepts >100,000 bases of genomic DNA as input for BLAST searches of EST databases. The results are parsed to provide an alignment of the genomic DNA with matching EST segments and a summary of the results in EMBL format for graphical display in Artemis, a genome sequence display and annotation program from the Sanger Centre. In favourable cases (i.e. lots of matching ESTs), Gene2EST can give a quite precise description of gene structure, including differential splicing. ENSEMBL is the "real time" annotation tool for the human genome project, jointly developed by the EBI and the Sanger Centre. It uses de novo gene prediction, ESTs and protein homology to rapidly annotate genes in newly determined human genome sequence. The human genome browser web site at UC Santa Cruz provides an interface to the human genome project with tools to focus on chromosomes and subchromosomal regions.


Course 4. Making alignments, trees and secondary structure predictions

Multiple Sequence Alignment is perhaps the second most basic sequence analysis activity after sequence similarity searches. We will use Clustal X to make and examine multiple alignments. Alignments have many uses based around picking out conserved residues and blocks: finding functional residues, designing primers and so on. Alignments are also used as input for other methods such as calculating trees and secondary structure prediction. For bench scientists, it can also be useful to produce trees from multiple alignments, even when evolution is only indirectly relevant. For example, it is important to distinguish orthologues from paralogues when extrapolating gene function between organisms. We will calculate a tree with Clustal X and display it with NJplot. The JPred server at the EBI will be used for obtaining a secondary structure prediction.
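As a rough illustration of "picking out conserved residues", the following sketch scans the columns of a small alignment and reports those where one residue dominates. The toy sequences, the `conserved_columns` helper and the `threshold` parameter are assumptions made for this example; Clustal X applies its own, more refined conservation marking.

```python
from collections import Counter


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based column indices where the most common residue reaches
    the given fraction of sequences. Columns containing gaps are skipped."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        if "-" in column:
            continue  # a gap means the position is not shared by all sequences
        _, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            conserved.append(col)
    return conserved


# Made-up aligned sequences ('-' marks a gap), for illustration only.
toy_alignment = [
    "MKV-LSAG",
    "MKIALSAG",
    "MRV-LTAG",
]

print(conserved_columns(toy_alignment))       # fully conserved columns
print(conserved_columns(toy_alignment, 0.6))  # columns with a clear majority
```

Lowering `threshold` trades strictness for sensitivity: fully identical columns point to candidate functional residues, while majority columns are more useful for spotting conserved blocks.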


You can find this page at http://www.embl-heidelberg.de/~seqanal/courses/Nov01/Top.html