Biocomputing Unit
Biocomputing
Sequence Analysis Service
Gibson Group
EMBL

Spring '99 Practical Courses on Sequence Analysis


April 20th and 21st,
May 18th and 19th, 1999

by Toby Gibson, Chenna Ramu, Christine Gemünd and Jose Castresana


These courses are open to everyone at EMBL who is interested. The Eukaryotic Gene Prediction course introduces web servers available on the Internet for identifying genes. The Molecular Phylogeny with Maximum Likelihood course introduces tree calculation and analysis using the maximum likelihood approach. In Gene Analysis with the Artemis and Staden Packages, we will introduce two UNIX packages that have some very useful sequence analysis features. The GCG10 from the iMac course provides a basic introduction to GCG. In addition, you will learn how to install and use MacX 2.0 on a Mac. This software provides the X Window interface needed to run GCG (and the other UNIX software) remotely on Tau from your Mac.

All courses can be taken individually or in combination. Each course consists of an introduction to the topic followed by a hands-on practical. Schedules for the practicals will be provided on Web pages accessible by clicking on the links below.

Students will work in pairs, one pair per X-terminal or iMac. Practicals will take place in the computer teaching lab, room V125.

Tuesday 20th April.

Wednesday 21st April.

Tuesday 18th May.

Wednesday 19th May.


Course 1 Eukaryotic Gene Prediction

This practical introduces some web servers for gene prediction. These can be accessed from any computer and are simple to use. Web servers are often a convenient way to do sequence analysis, although none of the prediction servers we checked could be called outstanding. You should also be aware that they can be unreliable, need constant care from their providers and are not suited to every task: some of the gene prediction servers we tried were not working! Sometimes, therefore, you have to run programs on local machines too.


Course 2 Molecular Phylogeny with Maximum Likelihood

Maximum likelihood is widely acknowledged as one of the best (if very slow) methods for calculating trees from sequence data. If you think of a tree as a best fit to the data, you will realise that it can be incorrect, or at least not statistically significant, when the data do not resolve it well, as is often the case. Maximum likelihood strategies explicitly acknowledge the probabilistic nature of the tree reconstruction problem, and the ML framework provides methods for assessing the reliability of tree branching orders. In the practical we will build and evaluate trees from smallish datasets.
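To give a feel for what "maximum likelihood" means before the practical, here is a minimal sketch of the simplest possible case: estimating the evolutionary distance between just two aligned sequences. It is not the software used in the course. It assumes the Jukes-Cantor substitution model (equal base frequencies, equal substitution rates), and the two sequences and all function names are made up for illustration.

```python
import math


def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def jc69_log_likelihood(d, n_sites, n_diffs):
    """Log-likelihood of distance d (substitutions per site) under Jukes-Cantor."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # probability of one specific identical pair
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # probability of one specific mismatched pair
    return (n_sites - n_diffs) * math.log(p_same) + n_diffs * math.log(p_diff)


def jc69_ml_distance(n_sites, n_diffs):
    """Closed-form maximum likelihood distance under Jukes-Cantor."""
    p = n_diffs / n_sites
    if p >= 0.75:
        raise ValueError("sequences too divergent: the ML distance is unbounded")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment (illustrative data only).
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACTTACGT"

n_sites = len(seq_a)
n_diffs = count_differences(seq_a, seq_b)
d_hat = jc69_ml_distance(n_sites, n_diffs)

# Brute-force check: scan a grid of candidate distances and keep the one
# with the highest log-likelihood. It should land next to the closed form.
grid = [i / 1000.0 for i in range(1, 1001)]
best_on_grid = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diffs))

print("differences:", n_diffs, "of", n_sites, "sites")
print("closed-form ML distance:", round(d_hat, 4))
print("grid-search ML distance:", best_on_grid)
```

The likelihood curve is fairly flat around its maximum for short sequences like these, which is one way to see why a best-fit estimate from little data can be poorly supported. Tree-building programs apply the same idea to whole trees, where the search over topologies and branch lengths is what makes the method slow.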


Course 3 Gene Analysis with the Artemis and Staden Packages

These packages provide somewhat complementary capabilities and include features that are not available in GCG or in the gene prediction servers in Course 1. The Staden Package, developed by Rodger Staden, is a long-established package for manipulating sequences. For some years development has concentrated on GAP4, an advanced sequence assembly program used at the Sanger Centre and elsewhere (but probably not of much use at EMBL). Recently, the sequence analysis programs have been given a new and attractive graphical interface, which allows custom assembly of complex graphic displays by dragging and dropping component graphs! However, some of the analysis functions (especially for eukaryotic gene prediction) need upgrading to modern standards. Currently, the package is most likely to be useful for flexible pairwise sequence comparison in Sip4 and prokaryotic gene prediction in Nip4.

Artemis is a newly released program from the Sanger Centre which displays large regions of genomic sequence and their annotated features (which can be edited) in graphical form. It is very useful for anyone who has to work with large DNA segments encoding complex genetic information.


Course 4 GCG10 from the iMac

At EMBL, it is especially important to be able to use the GCG package on UNIX. It offers many aspects of sequence analysis that are not available on, or are unsuited to, WWW servers. EMBL has licensed MacX 2.0, which allows you to easily run UNIX-based X-windows programs from the Mac. We will first learn how to install MacX on the iMacs. Then we'll do GCG exercises involving one, two or many sequences. Once you have mastered the WPI X-windows interface and the SEQLAB multiple sequence editor, you will be able to run many other applications without much difficulty.


You can find this page at http://www.embl-heidelberg.de/~seqanal/courses/spring99/Top.html