Biocomputing Unit
Sequence Analysis Service
Gibson Group

Practical Course on Sequence Analysis

Generating trees from aligned sequences

by Jose Castresana and Toby Gibson, November 15th, 2000


In this practical we will mainly use Clustal X, a program for multiple alignment and tree reconstruction. We will run it on UNIX, but it is also available for Macintosh and PCs. Clustal X makes a multiple alignment from a set of unaligned sequences and colors the resulting alignment according to the conserved features of each column, which helps to highlight structurally or functionally important positions. The program can also make a phylogenetic tree from the multiple alignment by the neighbor-joining method, which is very fast. We will use the following programs to make a tree:

During the practical you can also read about the following topics in the supplementary information provided:

Exercise 1: Phylogenetic tree of Eucarya, Bacteria and Archaea with EF-Tu sequences

The elongation factor (called EF-Tu in Bacteria, and EF-1alpha in eukaryotes and archaebacteria) is a very well-conserved protein that has been used to study deep phylogenetic relationships. We will make alignments and phylogenetic trees from several well-known species that are widely used in research and belong to very divergent groups, and we will examine some phylogenetic concepts as they become necessary to understand the successive steps of the practical. Here we will mainly use the term EF-Tu to refer to this protein family.

Extracting EF-Tu sequences

Aligning the sequences with Clustal X

Evaluating the alignment

Making and displaying trees
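
To give a feel for what the neighbor-joining method mentioned in the introduction does, here is a minimal Python sketch of the algorithm applied to a small distance matrix. It is not the Clustal X implementation: the function name `neighbor_joining`, the five taxa and their distances are all made up for illustration, and the tree is returned as a Newick string with an arbitrary central node.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree by neighbor joining; return a Newick string.

    names:  list of at least three unique taxon labels
    matrix: symmetric list of lists of pairwise distances, in the same order
    """
    d = {frozenset((names[i], names[j])): matrix[i][j]
         for i, j in combinations(range(len(names)), 2)}
    nodes = list(names)

    def dist(x, y):
        return d[frozenset((x, y))]

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {x: sum(dist(x, k) for k in nodes if k != x) for x in nodes}
        # Join the pair that minimises the criterion Q = (n-2)*d - r_a - r_b.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        d_ab = dist(a, b)
        # Branch lengths from the two joined nodes to their new parent.
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (dist(a, k) + dist(b, k) - d_ab)
        nodes = [k for k in nodes if k not in (a, b)] + [new]

    # The last three nodes meet at one central node.
    x, y, z = nodes
    dxy, dxz, dyz = dist(x, y), dist(x, z), dist(y, z)
    lx = (dxy + dxz - dyz) / 2
    ly = (dxy + dyz - dxz) / 2
    lz = (dxz + dyz - dxy) / 2
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Hypothetical distances between five taxa (not real sequence data).
toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(toy_names, toy_matrix))
```

The method needs only the matrix of pairwise distances, and the number of nodes shrinks by one at every step, which is why it is fast enough for large alignments.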


Protein                   Evolutionary rate
                          (amino acid substitutions per site per year)

Fibrinopeptides           8.3  x 10^-9
Pancreatic ribonuclease   2.1  x 10^-9
Lysozyme                  2.0  x 10^-9
Hemoglobin alpha          1.2  x 10^-9
Myoglobin                 0.89 x 10^-9
Insulin                   0.44 x 10^-9
Cytochrome c              0.3  x 10^-9
Histone H4                0.01 x 10^-9

(Table taken from Kimura, 1983)
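
The rates in the table can be turned into an expected amount of sequence change. The short Python sketch below assumes that the rates are amino acid substitutions per site per year, and that two lineages that separated T years ago have accumulated changes independently, so that the expected number of substitutions per site between them is K = 2 x rate x T. The divergence time of 100 million years and the function name `expected_substitutions` are chosen only for illustration.

```python
# Rates from the table above, assumed to be substitutions per site per year.
RATES = {
    "Fibrinopeptides": 8.3e-9,
    "Pancreatic ribonuclease": 2.1e-9,
    "Lysozyme": 2.0e-9,
    "Hemoglobin alpha": 1.2e-9,
    "Myoglobin": 0.89e-9,
    "Insulin": 0.44e-9,
    "Cytochrome c": 0.3e-9,
    "Histone H4": 0.01e-9,
}


def expected_substitutions(rate, divergence_time_years):
    """Expected substitutions per site between two lineages: K = 2 * rate * T.

    The factor 2 accounts for change along both lineages since the split.
    """
    return 2 * rate * divergence_time_years


T = 100e6  # hypothetical divergence time, in years
for protein, rate in RATES.items():
    print(f"{protein:24s} {expected_substitutions(rate, T):.3f}")
```

Under these assumptions fibrinopeptides would accumulate more than one substitution per site, so their alignment would be saturated with changes, whereas histone H4 would have changed at about 2 sites in 1000. This is the reason why slowly evolving proteins are preferred for deep phylogenetic relationships and fast-evolving sequences for recent ones.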

Evaluating the tree

Detecting gene duplications in a tree

Exercise 2: Phylogenetic tree of mitochondrial D-loop sequences from human populations

Phylogenetic trees can be used to study some aspects of human evolution. For this purpose we need genes that are highly variable and thus show many differences among individuals. The D loop (or control region) of the mitochondrial DNA is one of the most variable parts of our genome, and it has therefore been used in many studies of human evolution. One of the best-known results of these works is the African origin of modern humans, or Out-of-Africa hypothesis, which postulates that a founder group emigrated from Africa around 100,000 years ago and colonized the rest of the world. In a phylogenetic tree of a sample of human populations, this hypothesis predicts that the non-African sequences branch off early from within the African sequences, and that sequence diversity is higher in African populations than in non-African populations. Let's try to estimate one of these trees.
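
The second prediction (higher sequence diversity in African populations) can be checked directly on an alignment. The following Python sketch computes the mean proportion of differing sites over all pairs of sequences in a group. The sequences are invented ten-nucleotide toy data, not real D-loop sequences, and the function names are assumptions made for this example.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped. The sequences
    are assumed to share at least one ungapped column.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_pairwise_diversity(seqs):
    """Average p-distance over all pairs of sequences in one group."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)


# Invented aligned sequences, for illustration only.
toy_groups = {
    "African": ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTCC", "TCGTACGTAC"],
    "non-African": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"],
}

for group, seqs in toy_groups.items():
    print(f"{group:12s} {mean_pairwise_diversity(seqs):.3f}")
```

With real data, the groups would be built from the aligned D-loop sequences and their population labels.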

Extracting D-loop sequences

HSMTDL001: Western Pygmy
HSMTDL002: Western Pygmy
HSMTDL004: Eastern Pygmy
HSMTDL005: Eastern Pygmy
HSMTDL006: Eastern Pygmy
HSMTDL007: !Kung
HSMTDL008: !Kung
HSMTDL009: !Kung
HSMTDL010: !Kung
HSMTDL011: !Kung
HSMTDL012: !Kung
HSMTDL014: !Kung
HSMTDL015: !Kung
HSMTDL016: !Kung
HSMTDL017: !Kung
HSMTDL018: !Kung
HSMTDL019: !Kung
HSMTDL020: !Kung
HSMTDL021: !Kung
HSMTDL022: !Kung
HSMTDL023: Asian
HSMTDL024: Yoruban
HSMTDL026: Yoruban
HSMTDL030: Eastern Pygmy
HSMTDL031: Eastern Pygmy
HSMTDL032: Eastern Pygmy
HSMTDL034: Herero
HSMTDL037: Western Pygmy
HSMTDL038: Western Pygmy
HSMTDL039: Western Pygmy
HSMTDL040: Western Pygmy
HSMTDL041: Western Pygmy
HSMTDL042: Western Pygmy
HSMTDL043: Western Pygmy
HSMTDL044: Western Pygmy
HSMTDL045: Western Pygmy
HSMTDL046: Western Pygmy
HSMTDL047: Western Pygmy
HSMTDL048: Western Pygmy
HSMTDL050: Papua N.Guinean
HSMTDL052: Herero
HSMTDL053: Herero
HSMTDL054: Herero
HSMTDL055: Herero
HSMTDL056: Herero
HSMTDL058: Asian
HSMTDL059: African American
HSMTDL061: Hadza
HSMTDL062: Hadza
HSMTDL063: African American
HSMTDL064: Hadza
HSMTDL066: Eastern Pygmy
HSMTDL067: Eastern Pygmy
HSMTDL068: Eastern Pygmy
HSMTDL069: Eastern Pygmy
HSMTDL070: Eastern Pygmy
HSMTDL071: Eastern Pygmy
HSMTDL072: Eastern Pygmy
HSMTDL073: Eastern Pygmy
HSMTDL075: Asian
HSMTDL076: Naron
HSMTDL079: Papua N.Guinean
HSMTDL080: Papua N.Guinean
HSMTDL081: Papua N.Guinean
HSMTDL082: Papua N.Guinean
HSMTDL083: Hadza
HSMTDL084: Asian
HSMTDL085: Asian
HSMTDL086: Asian
HSMTDL088: Asian
HSMTDL089: European
HSMTDL090: Asian
HSMTDL092: Asian
HSMTDL093: Asian
HSMTDL094: European
HSMTDL095: Asian
HSMTDL096: European
HSMTDL097: Papua N.Guinean
HSMTDL098: Asian
HSMTDL099: European
HSMTDL101: European
HSMTDL102: European
HSMTDL103: Yoruban
HSMTDL104: European
HSMTDL105: Herero
HSMTDL106: Yoruban
HSMTDL107: Yoruban
HSMTDL108: Papua N.Guinean
HSMTDL109: Papua N.Guinean
HSMTDL110: Papua N.Guinean
HSMTDL111: European
HSMTDL112: Asian
HSMTDL113: Asian
HSMTDL114: European
HSMTDL115: European
HSMTDL116: European
HSMTDL117: European
HSMTDL119: European
HSMTDL120: European
HSMTDL121: Asian
HSMTDL122: Asian
HSMTDL123: Asian
HSMTDL124: Asian
HSMTDL125: Papua N.Guinean
HSMTDL126: Asian
HSMTDL127: Herero
HSMTDL128: Asian
HSMTDL129: Papua N.Guinean
HSMTDL130: Papua N.Guinean
HSMTDL131: Papua N.Guinean
HSMTDL132: Papua N.Guinean
HSMTDL133: Papua N.Guinean
HSMTDL134: Papua N.Guinean
HSMTDL135: Papua N.Guinean
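
A list in this "identifier: population" format is easy to read into a program, for example to label the tips of the tree by population or to count the sequences per population. Here is a minimal Python sketch that uses a five-line excerpt of the list above; the function name `parse_labels` is an assumption made for this example.

```python
from collections import Counter


def parse_labels(text):
    """Map each sequence identifier to its population label."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq_id, population = line.split(":", 1)
        labels[seq_id.strip()] = population.strip()
    return labels


# Excerpt of the list above; the full list would be parsed the same way.
excerpt = """\
HSMTDL001: Western Pygmy
HSMTDL007: !Kung
HSMTDL008: !Kung
HSMTDL023: Asian
HSMTDL089: European
"""

labels = parse_labels(excerpt)
counts = Counter(labels.values())
for population, count in counts.most_common():
    print(f"{population:15s} {count}")
```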

Making the alignment and phylogenetic tree


References on phylogenetic tree reconstruction

References for the EF-Tu exercise

These are some references where EF-Tu sequences have been used to study several phylogenetic issues:

And these are some additional references dealing with the root of the universal tree:

References on human genetic evolution

Programs and internet resources

These are two good, multi-platform programs that can be used to construct phylogenetic trees by different methods:

Here you can find a very thorough list containing most of the available programs for phylogenetic reconstruction:

And here you can find phylogenetic information about many organisms:

Last updated: 16/10/2000