Biocomputing Unit
Sequence Analysis Service
Gibson Group

Predocs00 Course

Exploring gene structure with Artemis and Gene2EST

by Toby Gibson, Chenna Ramu and Christine Gemünd, October 11th-17th 2000

In this practical we will use two software tools that are useful for examining genes. Artemis is a package for graphical display of annotated genomic sequence in EMBL format. The annotations can be edited so anyone who needs to work with (or is sequencing) segments of genomic sequence can custom annotate their sequence. Gene2EST is a new server developed by our group. It allows the user to search the EST (Expressed Sequence Tag) databases with large gene queries and map the results onto the gene sequence. In favourable cases, Gene2EST may give a good overview of gene structure including differential splicing. Artemis is used to display the graphical output from Gene2EST.

Getting started

Getting the gene sequence for the practical:

Exercise 1. Examining the hsak1 gene annotation with Artemis

The EMBL database, like most other biological databases, is distributed in "flat file" format. This means it is simply a text file with a defined structure, so any programmer can easily write software to retrieve specified information. Flat files are inefficient for cross-referencing data, but their portability and accessibility, set against the historically high cost of relational database architectures, have been the overriding factors. Nevertheless, wading through large text files can be time-consuming and dull for users, and biologists like to present gene structure graphically. Artemis is a program developed to display gene features graphically and to edit the features while adhering to correct EMBL format. Artemis is written in the Java programming language, which ensures portability: most users at EMBL would use the Mac or PC version, rather than the UNIX one used here.
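To make the "defined structure" of a flat file concrete, here is a minimal Python sketch of how a program might pull feature locations out of an EMBL-style entry. It is only an illustration under stated assumptions: the entry below is invented and heavily simplified (it is not the hsak1 record), the helper names `ft`, `parse_flat_file` and `feature_locations` are hypothetical, and the parser assumes the usual layout of a two-letter line code, content from column 6, and the feature location from column 22. It is not how Artemis itself reads entries.

```python
import re
from collections import defaultdict


def ft(key, value):
    """Build one feature-table line: 'FT', key padded so the value starts at column 22."""
    return f"FT   {key:<16}{value}"


# Invented, simplified entry for illustration only; not a real database record.
SAMPLE_ENTRY = "\n".join([
    "ID   EXAMPLE1   standard; DNA; HUM; 120 BP.",
    "DE   Hypothetical example gene.",
    ft("exon", "10..40"),
    ft("exon", "71..110"),
    ft("CDS", "join(10..40,71..110)"),
    ft("", '/gene="example"'),   # qualifier line: empty key field
    "//",
])


def parse_flat_file(text):
    """Group the content of each line under its two-letter line code."""
    records = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        records[line[:2]].append(line[5:])
    return records


def feature_locations(ft_lines):
    """Return (feature key, [(start, end), ...]) for each feature line."""
    features = []
    for content in ft_lines:
        key = content[:16].strip()
        if not key:                    # skip qualifier/continuation lines
            continue
        spans = [(int(a), int(b))
                 for a, b in re.findall(r"(\d+)\.\.(\d+)", content[16:])]
        features.append((key, spans))
    return features


records = parse_flat_file(SAMPLE_ENTRY)
features = feature_locations(records["FT"])
for key, spans in features:
    print(key, spans)
```

Because every line announces its own type, a few lines of code are enough to recover the exon coordinates. This simplicity is the portability argument made above. The sketch ignores complement strands, partial locations and multi-line locations, which a real parser would need to handle.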

So far we have only worked within the Artemis windows themselves. Now let's look at what we can do with the pull-down menus.

Exercise 2. Using Gene2EST and Artemis to reveal the hsak1 gene structure as defined by ESTs

ESTs - Expressed Sequence Tags - are randomly cloned and sequenced cDNAs. Because no special analyses are undertaken, economies of scale have enabled huge quantities of ESTs to be sequenced: >2 million for human and >1 million for mouse. Originally, ESTs provided handles for cloning genes belonging to known gene families and for mapping them to chromosomal locations. With the advent of complete genome sequences, ESTs have acquired new functions: revealing hitherto unknown genes and accurately mapping gene structures. Currently, de novo gene prediction for complex eukaryotes is so poor that ESTs are easily the best tool for this task - provided only that genes are expressed highly enough to be well represented in the EST databases. Our group has therefore provided the Gene2EST server, which specialises in exactly this task and produces output mapping ESTs onto the query gene structure.
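The idea of mapping ESTs onto a gene can be illustrated with a small Python sketch. It assumes we already have the start and end coordinates, on the genomic query, of each EST match. The coordinates below are invented, and the function name `merge_intervals` is hypothetical. Overlapping matches are merged into blocks that suggest exons, and the gaps between blocks suggest introns. This is only a toy model of the reasoning, not a description of how Gene2EST works internally.

```python
def merge_intervals(hits):
    """Merge overlapping or adjacent (start, end) matches into blocks."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or abuts) the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


# Invented genomic coordinates of EST matches, for illustration only.
est_hits = [(100, 180), (150, 260), (900, 1010), (950, 1100), (2000, 2150)]

# Blocks covered by at least one EST: candidate exons.
blocks = merge_intervals(est_hits)

# Uncovered gaps between consecutive blocks: candidate introns.
introns = [(prev_end + 1, next_start - 1)
           for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])]

print("candidate exons:  ", blocks)
print("candidate introns:", introns)
```

With real data a gap between blocks may simply reflect missing EST coverage rather than an intron, so the aligned sequences still need to be checked for splice sites at the block boundaries.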

Getting the Query:

Launching Gene2EST:

Using the Gene2EST alignment output:

These exercises should have shown you that the Gene2EST alignment output is very useful. However, you probably still lack an overview of how things fit together. For that we need to use the Artemis graphical representation.

Using the Gene2EST graphical output with Artemis:

The Gene2EST graphical display output complements the BLAST and alignment outputs. Each has its uses: the BLAST output for information on search results and EST entries; the alignment output for examining the matching EST sequences in detail; the graphical display for the gene structure overview.

A more dramatic example: The human COL1A2 Gene:

Take Home Lessons

With our example gene, Gene2EST has allowed us to learn more about the gene structure than is present in the EMBL database entry, which suffers from an incorrect splice annotation and an "abnormal" 3' end. Gene2EST will give a good overview of a gene structure, provided that sufficient ESTs are present, and can reveal alternative splicing. It will be useless if there are no ESTs derived from the query gene. Artemis is an excellent tool for providing a graphical overview of genomic sequence. At the Sanger Centre it is used for annotating small genomes (bacteria etc.). Artemis might be useful for researchers at EMBL for keeping track of gene features during experimental projects.