Day 4 - 21st April

Preparing alignments for phylogenetic analysis




We are studying newt homeobox proteins and want to place our new sequence in the context of the orthologous genes from other vertebrates (e.g. human, Xenopus). To do this we need to retrieve sequences with BLAST, align them with ClustalW (through the ClustalX interface), remove poorly aligned regions with GBLOCKS and build the tree.









Submit this newt DLL (Distal-less homeodomain protein) sequence as a query to NCBI BLAST, selecting the SWISSPROT database.


Is an amphibian sequence the closest relative of your query sequence?


Format output for "vertebrata" and ~50 sequences.


Select and retrieve sequences in FASTA format, using the output controls. This takes a few steps in NCBI Entrez.


[NCBI names are rather long. You could edit the unneeded text at this stage.]
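If you would rather not edit the names by hand, the shortening can be scripted. The sketch below assumes the headers are either UniProt-style (`sp|ACCESSION|ENTRY_NAME description`) or start with an identifier followed by a space; the function names and the example accession are made up for illustration, so check a few real headers from your download first.

```python
import re

def shorten_header(header: str) -> str:
    """Return a short label for one FASTA header line."""
    header = header.lstrip(">").strip()
    # UniProt-style header: keep only the entry name, e.g. DLX3_HUMAN.
    match = re.search(r"(?:^|\|)(?:sp|tr)\|[^|]+\|(\S+)", header)
    if match:
        return match.group(1)
    # Otherwise keep the first whitespace-delimited token.
    return header.split()[0] if header else header

def shorten_fasta(text: str) -> str:
    """Shorten every header in a FASTA-formatted string."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(">" + shorten_header(line))
        else:
            out.append(line)
    return "\n".join(out)

# Hypothetical record, used only to show the effect.
example = ">sp|P00000|DLX3_HUMAN Homeobox protein DLX-3\nMSGSFDRK"
print(shorten_fasta(example))
```

Short, unique names also make the final tree much easier to read.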


Load the sequences into ClustalX and do the alignment. Examine the alignment.


Examine alignment

1. Is the sequence uniformly conserved?

2. Given the uniformity (or not) of the conservation, would you expect that your phylogenetic analysis would be improved by incorporation of a model of between-site rate heterogeneity (i.e. gamma and invariant sites)?

3. Is part of the alignment too divergent for tree calculation?

4. Are there any sequences in the alignment that you expect are fragments?
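Question 4 can be cross-checked numerically. The sketch below counts the residues in each aligned sequence and flags those much shorter than the median. The 50% cutoff and the function names are arbitrary choices for illustration, not part of ClustalX, and a flagged sequence should still be inspected by eye.

```python
from statistics import median

GAP_CHARS = "-."

def ungapped_length(seq: str) -> int:
    """Number of residues in an aligned sequence, ignoring gap characters."""
    return sum(1 for c in seq if c not in GAP_CHARS)

def flag_fragments(alignment: dict, min_fraction: float = 0.5) -> list:
    """Return names of sequences shorter than min_fraction of the median length."""
    lengths = {name: ungapped_length(seq) for name, seq in alignment.items()}
    cutoff = min_fraction * median(lengths.values())
    return [name for name, n in lengths.items() if n < cutoff]

# Toy alignment with invented names and sequences.
toy = {
    "SEQ_A": "MKTAYIAK",
    "SEQ_B": "MKTAYIAR",
    "SEQ_C": "--TA----",
}
print(flag_fragments(toy))
```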


Refine alignment

1. Delete all the fragments. Remove gaps and realign.

2. Check for any unusual sequences in the conserved homeobox. If any occur, delete them, remove gaps and realign.

3. When the alignment seems to be ready, save the alignment in PIR format.
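"Remove gaps" simply means returning each sequence to its unaligned form so that the alignment is recomputed from scratch. ClustalX has a menu option for this; if you are working with files instead, a minimal sketch (with hypothetical function names, and assuming `-` and `.` are the only gap characters) looks like this:

```python
def degap(alignment: dict) -> dict:
    """Strip gap characters from every sequence in an alignment."""
    return {
        name: seq.replace("-", "").replace(".", "")
        for name, seq in alignment.items()
    }

def to_fasta(sequences: dict, width: int = 60) -> str:
    """Format sequences as FASTA text, wrapping lines at `width` residues."""
    lines = []
    for name, seq in sequences.items():
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"

# Toy alignment with invented names; drop SEQ_C as a fragment, then degap.
toy = {"SEQ_A": "MK-TAYIAK", "SEQ_B": "MKQTAYIAR", "SEQ_C": "---TA----"}
kept = {name: seq for name, seq in toy.items() if name != "SEQ_C"}
print(to_fasta(degap(kept)))
```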


GBLOCKS processing

1. Load the GBLOCKS server at:

2. Load the PIR format alignment

3. Toggle on smaller blocks and less strict flanking positions. (Why are we doing this?)

4. Get blocks and save as e.g. DLL_gb.pir
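To get a feel for what this kind of filtering does, here is a deliberately simplified column filter. It is not the Gblocks algorithm (Gblocks also considers runs of adjacent columns and flanking positions); it only keeps columns that have few gaps and in which one residue is in the majority. The thresholds and names are assumptions chosen for illustration.

```python
from collections import Counter

GAP_CHARS = "-."

def keep_columns(alignment: dict, max_gap_fraction: float = 0.0,
                 min_majority: float = 0.5):
    """Return (kept column indices, trimmed alignment).

    A column is kept when its gap fraction is at most max_gap_fraction and
    its most common residue makes up at least min_majority of the sequences.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for i in range(length):
        column = [alignment[n][i] for n in names]
        residues = [c for c in column if c not in GAP_CHARS]
        gap_fraction = 1 - len(residues) / len(column)
        if not residues or gap_fraction > max_gap_fraction:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(column) >= min_majority:
            kept.append(i)

    trimmed = {n: "".join(alignment[n][i] for i in kept) for n in names}
    return kept, trimmed

# Toy alignment with invented names and sequences.
toy = {"SEQ_A": "MK-TA", "SEQ_B": "MKQTA", "SEQ_C": "MR-SA"}
columns, trimmed = keep_columns(toy)
print(columns, trimmed)
```

Loosening `max_gap_fraction` or `min_majority` keeps more columns, which is the same trade-off you are making with the server options in step 3.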


Make the tree

1. Load the Gblocked alignment into ClustalX

       Does this alignment have many or few informative positions? Is it suitable for detailed or superficial phylogeny estimation?

2. Toggle on correct for multiple substitutions

3. Make tree

4. Display tree in NJPLOT
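The effect of step 2 can be illustrated with a small calculation. The sketch below computes the observed proportion of differing sites (p-distance) between two aligned sequences and then applies Kimura's empirical correction for proteins, K = -ln(1 - p - p^2/5). That ClustalX uses this formula for protein alignments is stated here from memory of the Clustal documentation and should be treated as an assumption; the function names are illustrative.

```python
import math

GAP_CHARS = "-."

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical correction for multiple substitutions in proteins."""
    inner = 1 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)

# Toy pair of aligned sequences (invented).
p = p_distance("MKTAYIAK", "MKSAYLAK")
print(p, kimura_protein_distance(p))
```

The corrected distance is always at least as large as the observed one, and the gap between the two grows as the sequences diverge, because more substitutions are hidden by later changes at the same site.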


Tree examination:

1. Is there a Xenopus orthologue of our sequence (Box5_notvi)?

2. Is there another DLL from this newt?

3. Are there other amphibian DLLs?

4. Are DLX3_Notvi and DLL3_Xenla orthologues?

5. Is there something odd about Xenopus DLL numbering? What? Is this common in sequence databases?