by Toby Gibson
for the EMBO DNA Sequencing Course 11-21 / 11 / 97
The aims of this practical are two-fold:
Background
The mitochondrial D-loop, or control region, is probably the fastest-mutating sequence region in animal DNA. It is therefore very useful for comparing closely related organisms (but useless for comparing distantly related species). Polymorphisms in the D-loop can be used for intra-species comparisons. In recent years, mtDNA polymorphisms have become popular for tracing the history of human migrations over the time span from about 100,000 years ago to the present. European populations are found to be rather homogeneous, and it appears that a substantial migration of Indo-European speakers spread across Europe, bringing farming technologies with them. A few European populations, such as the Basques and Finns (who do not speak Indo-European languages), appear to be distinguishable from the farmers, but all Europeans are still closely grouped. The most consistently divergent groups of people, as judged by frequencies of polymorphisms, are in Africa. This is consistent with an origin of modern humans in Africa and a founder effect in all other present-day humans: migration of some smallish populations out of Africa was followed by a large expansion in the population size of these non-African groups.
D-loop Reference Alignment
We have taken a selection of 20 D-loop sequences representing diverse sets of humans from the collection of Vigilant et al. (1991) Science, 253, p. 1503. Most of the sequences have a large (artificial) deletion between the two most mutable segments of the D-loop. Trees made from this data set show the greater divergence of the African sequences and the much tighter grouping of the European and Asian sequences. Since most of the students on the course are of European origin, we would expect the new sequences to group closely with the reference Europeans. It will be interesting to see if there are any exceptions to this.
Materials for the practical
Each PC is provided with a directory containing:
In addition, you need your own sequence in FASTA format (or another format compatible with Clustal X).
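For reference, a FASTA record is a header line beginning with ">" followed by the sequence itself on one or more lines. The short Python sketch below is only an illustration and is not part of the course materials: the helper name to_fasta, the record name and the sequence fragment are all made-up placeholders.

```python
def to_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then the sequence wrapped to `width`."""
    # Keep letters and gap symbols only; this drops spaces, digits and line breaks
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha() or ch == "-")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical name and placeholder sequence, for illustration only
record = to_fasta("my_dloop", "ttctttcatg gggaagcaga tttgggtacc", width=20)
print(record)
```

Saving the returned text to a plain text file should give something that Clustal X can load, assuming the file contains nothing else.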
Part 1. Aligning the new sequence to the reference alignment by Profile Alignment
Clustal X is a widely used, portable alignment program that allows the user to approach alignment in a flexible manner. Clustal X allows you to reuse an old alignment and add new sequences to it, or even to merge two alignments together. In Clustal X, this is known as profile alignment. It is useful in any ongoing project where new sequences are being generated and alignments need updating. Adding new sequences to an old alignment has two advantages. First, it is a lot faster than redoing the alignment from scratch each time. Second, the original sequence alignment is kept intact, which is especially useful if the alignment needed hand editing.
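To make the idea concrete, here is a toy Python sketch of adding one sequence to an existing alignment. It is not the algorithm used by Clustal X, which has sequence weighting and more elaborate gap penalties. It is a plain Needleman-Wunsch alignment of the new sequence against the columns of the old alignment, with assumed scores (match, mismatch, gap) and a hypothetical function name align_to_profile. The point it illustrates is that the old rows are never realigned against each other: at most they receive whole columns of gaps.

```python
def align_to_profile(profile, new_seq, match=1.0, mismatch=-1.0, gap=-2.0):
    """Add new_seq to an existing alignment (a list of equal-length strings)."""
    ncol, n = len(profile[0]), len(new_seq)
    columns = ["".join(row[j] for row in profile) for j in range(ncol)]

    def col_score(col, ch):
        # Average score of ch against every symbol in the column
        # (a gap in the column simply counts as a mismatch here)
        return sum(match if c == ch else mismatch for c in col) / len(col)

    # score[i][j]: best score for the first i columns vs the first j residues
    score = [[0.0] * (n + 1) for _ in range(ncol + 1)]
    for i in range(1, ncol + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, ncol + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + col_score(columns[i - 1], new_seq[j - 1]),
                score[i - 1][j] + gap,  # gap in the new sequence
                score[i][j - 1] + gap,  # whole gap column in the old alignment
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_cols, out_new = [], []
    i, j = ncol, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + col_score(columns[i - 1], new_seq[j - 1])
        ):
            out_cols.append(columns[i - 1])
            out_new.append(new_seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_cols.append(columns[i - 1])
            out_new.append("-")
            i -= 1
        else:
            out_cols.append("-" * len(profile))
            out_new.append(new_seq[j - 1])
            j -= 1
    out_cols.reverse()
    out_new.reverse()
    rows = ["".join(col[k] for col in out_cols) for k in range(len(profile))]
    return rows, "".join(out_new)


# Made-up toy data: a two-sequence "reference alignment" and one new sequence
old_alignment = ["ACGTACGT", "ACGTTCGT"]
rows, aligned_new = align_to_profile(old_alignment, "ACGACGT")
for line in rows + [aligned_new]:
    print(line)
```

In this toy case the new sequence is one base shorter than the old alignment, so it receives a gap while the two old rows are left unchanged.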
Profile Alignment in Clustal X:
Questions
Part 2. Calculating a tree in Clustal X and displaying it with NJplot
Clustal X can calculate a basic tree using the Neighbour-Joining method, but it cannot display it. There are a number of tree display packages available. In the Clustal X package, we distribute a simple but useful (and portable) tree display program, NJplot, provided by Manolo Gouy (Lyon). NJplot displays the tree as a dendrogram and allows you to do some basic manipulations, most importantly re-rooting the tree. For more serious phylogenetic work, Manolo Gouy also provides the PhyloWin/SeaView package, which can do much more than we cover in the practical today. There are, of course, many other tree packages.
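For those curious about what a Neighbour-Joining calculation involves, the following Python sketch builds a distance matrix from a toy alignment and joins neighbours until an unrooted tree remains, written in the parenthesis (Newick) notation that tree display programs generally read. It is a minimal illustration under stated assumptions, not the Clustal X implementation: distances are the uncorrected fraction of differing sites, the function names are hypothetical, and the four short sequences are invented.

```python
def p_distance(a, b):
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbour_joining(names, dist):
    """Return an unrooted Newick tree from a symmetric distance matrix (needs >= 3 taxa)."""
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Join the pair that minimises Q(i, j) = (n - 2) * d_ij - r_i - r_j
        i, j = min(
            ((i, j) for i in range(n) for j in range(i + 1, n)),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to i and to j
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        label = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        # Distances from the new node to every remaining node
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, x in zip(d, new_row):
            row.append(x)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [label]
    # Three nodes left: connect them at a single central point
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Invented toy alignment, for illustration only
alignment = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTC",
    "seqC": "ACGAACGGAC",
    "seqD": "ACGAACGGTC",
}
names = list(alignment)
matrix = [[p_distance(alignment[x], alignment[y]) for y in names] for x in names]
print(neighbour_joining(names, matrix))
```

The resulting tree is unrooted, which is why re-rooting in a display program is an important step before interpreting the branching order.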
Calculating a tree with Clustal X:
Displaying the tree with NJplot:
Questions
Finally, we will print the trees (probably using another program, TreeView), so that we can compare and discuss the results. See one of the demonstrators to get the printing done.