We are studying newt homeobox proteins and want to place our new sequence in
the context of the orthologous genes from other vertebrates (e.g. human,
Xenopus). To do this we need to retrieve sequences by BLAST, align them with
ClustalW (through its graphical interface, ClustalX), trim the alignment with
GBLOCKS and make the tree.
>BOX5_NOTVI
APHGACQTSGTLRSMSGSMAESLLGSDHSKAAFLEFGTGTHSPQGHYPLHSFHPPTEGPY
GGSGYGGRTLGYPYSPHGHPQHHASPYLPYHQGQHGGSLGHGGSRLDEDTELEKNTVIEN
GEIRINGKGKKIRKPRTIYSSVQLQALNQRFQQTQYLALPERAELAAHLGLTQTQVKIWF
QNKRSKYKKIMKQGSSIQEGEHLHSSASMSPCSPNIPPHWDSPMGTKGGPIGHGSYINNY
GPWYQPHHQDSMPRPQMM
Submit this newt DLL (Distal-less homeodomain protein) sequence as a query to
NCBI BLAST, selecting the SWISSPROT database.
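If you would rather script the search than use the web form, the request can be sketched as below. This is only a sketch: it builds the form parameters for the NCBI BLAST URL interface but does not send them. The parameter names (CMD, PROGRAM, DATABASE, QUERY, HITLIST_SIZE, ENTREZ_QUERY) are the ones we believe that interface accepts, so check them against the NCBI documentation before relying on them. The function name and the "Vertebrata[Organism]" filter are our own assumptions, chosen to match the vertebrate-only, ~50 hit output used in this exercise.

```python
from urllib.parse import urlencode

# Public NCBI BLAST endpoint; the request is only built here, never sent.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(sequence, max_hits=50, taxon="Vertebrata"):
    """Return the URL-encoded form body for a blastp search of SWISSPROT.

    Hypothetical helper: parameter names are assumed from the NCBI BLAST
    URL interface and should be verified before use.
    """
    params = {
        "CMD": "Put",                          # submit a new search
        "PROGRAM": "blastp",                   # protein query vs protein database
        "DATABASE": "swissprot",
        "QUERY": sequence,
        "HITLIST_SIZE": max_hits,              # roughly 50 hits, as in the exercise
        "ENTREZ_QUERY": f"{taxon}[Organism]",  # assumed organism filter syntax
    }
    return urlencode(params)


# First line of the BOX5_NOTVI query only; paste the full sequence in practice.
query = "APHGACQTSGTLRSMSGSMAESLLGSDHSKAAFLEFGTGTHSPQGHYPLHSFHPPTEGPY"
body = build_blast_request(query)
print(BLAST_URL)
print(body)
```

The web form remains the simplest route for a single query; a script like this mainly helps when the same search has to be repeated.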
Is an amphibian sequence the closest relative of your query sequence?
Reformat the output to show only "vertebrata" hits and ~50 sequences.
Select the sequences and retrieve them in FASTA format, using the output
controls. This takes a few steps in NCBI Entrez.
[NCBI names are rather long. You could edit out the unneeded text at this
stage.]
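Editing the long names by hand is fine for a few sequences, but it can also be scripted. The sketch below assumes the header's first word is either a plain identifier or a "|"-separated string whose last field is the short entry name; the example headers are made up for illustration and your downloaded headers may look different.

```python
def shorten_fasta_headers(fasta_text):
    """Reduce each FASTA header to a short name; sequence lines are untouched.

    Assumption: the first whitespace-delimited word of the header is either
    a plain identifier or a '|'-separated string ending in the entry name.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                # Empty header: leave it alone rather than guess a name.
                out.append(line)
                continue
            fields = [f for f in tokens[0].split("|") if f]
            out.append(">" + fields[-1])
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up headers, for illustration only.
example = (
    ">gi|1|sp|P00000|DEMO1_HUMAN RecName: Full=Demo protein\n"
    "MKV\n"
    ">sp|P00001.1|DEMO2_XENLA Another long name\n"
    "MKL\n"
)
print(shorten_fasta_headers(example))
```

Check that the shortened names are still unique before loading the file into ClustalX.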
Load the sequences into ClustalX
and do the alignment. Examine the alignment.
1. Is the sequence uniformly conserved along its length?
2. Given the uniformity (or not) of the conservation, would you expect your
phylogenetic analysis to be improved by incorporating a model of between-site
rate heterogeneity (i.e. gamma and invariant sites)?
3. Is part of the alignment too divergent for tree calculation?
4. Are there any sequences in the alignment that you suspect are fragments?
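The questions above are meant to be answered by eye in ClustalX, but a rough numerical check can help. The sketch below scores each column by the fraction of sequences sharing the most common residue, and flags sequences much shorter than the median as possible fragments. The function names, the toy alignment and the 50% length cut-off are all our own assumptions, not ClustalX features.

```python
from collections import Counter
from statistics import median


def column_conservation(aligned):
    """Per column: fraction of all sequences carrying the most common residue.

    `aligned` maps name -> aligned sequence (equal lengths, '-' for gaps).
    Gapped sequences count against the score of a column.
    """
    seqs = list(aligned.values())
    scores = []
    for column in zip(*seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(seqs))
    return scores


def likely_fragments(aligned, min_fraction=0.5):
    """Names whose ungapped length is below min_fraction of the median length."""
    lengths = {name: len(seq.replace("-", "")) for name, seq in aligned.items()}
    typical = median(lengths.values())
    return [name for name, n in lengths.items() if n < min_fraction * typical]


# Toy alignment, invented for illustration.
toy = {
    "seqA": "MKRPRTIYSS",
    "seqB": "MKRPRTIYTS",
    "seqC": "MARPRTAYSS",
    "frag": "----RTIY--",
}
print(column_conservation(toy))
print(likely_fragments(toy))
```

Strongly uneven scores along the alignment are the kind of pattern that question 2 asks you to think about.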
1. Delete all the fragments. Remove gaps and realign.
2. Check for any odd-looking sequences in the conserved homeobox. If any
occur, delete them, remove gaps and realign.
3. When the alignment seems ready, save it in PIR format.
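ClustalX does all of this from its menus, but the same clean-up can be sketched in a few lines. The helper names below are hypothetical. The PIR layout written here (a ">P1;name" line, a description line, then the sequence ending in "*") is the usual NBRF/PIR convention; the name is reused as the description, which is our own choice.

```python
def drop_sequences(aligned, unwanted):
    """Return a copy of the alignment without the named sequences."""
    return {name: seq for name, seq in aligned.items() if name not in unwanted}


def remove_gaps(aligned):
    """Strip all gap characters so the sequences can be realigned from scratch."""
    return {name: seq.replace("-", "") for name, seq in aligned.items()}


def to_pir(aligned, width=60):
    """Format sequences as NBRF/PIR records.

    Each record is '>P1;name', a description line, then the sequence
    terminated by '*', wrapped at `width` characters per line.
    """
    records = []
    for name, seq in aligned.items():
        text = seq + "*"
        lines = [text[i:i + width] for i in range(0, len(text), width)]
        records.append(f">P1;{name}\n{name}\n" + "\n".join(lines) + "\n")
    return "".join(records)


# Toy alignment, invented for illustration.
toy = {"seqA": "MK-RPRTIY", "seqB": "MKARPRTIY", "frag": "----PRT--"}
kept = drop_sequences(toy, {"frag"})
print(remove_gaps(kept))  # ungapped sequences, ready to realign
print(to_pir(kept))       # PIR text for an alignment you are happy with
```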
1. Load the GBLOCKS server at:
http://molevol.ibmb.csic.es/Gblocks_server/index.html
2. Load the PIR format alignment.
3. Toggle on the options for smaller blocks and less strict flanking
positions. (Why are we doing this?)
4. Get the blocks and save them as e.g. DLL_gb.pir
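To get a feel for what a block-selection setting does, here is a deliberately simplified stand-in. It is not the GBLOCKS algorithm: it keeps gap-free columns where the most common residue reaches a chosen fraction, then keeps only runs of such columns of at least a minimum length. The function names, thresholds and toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def conserved_columns(seqs, min_fraction):
    """True for gap-free columns whose most common residue reaches min_fraction."""
    flags = []
    for column in zip(*seqs):
        if "-" in column:
            flags.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / len(column) >= min_fraction)
    return flags


def select_blocks(seqs, min_fraction=0.5, min_block_length=10):
    """Return half-open (start, end) column ranges of sufficiently long runs."""
    flags = conserved_columns(seqs, min_fraction)
    blocks, start = [], None
    # The trailing False is a sentinel that closes a run ending at the last column.
    for i, ok in enumerate(flags + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_block_length:
                blocks.append((start, i))
            start = None
    return blocks


# Toy alignment, invented for illustration.
toy = [
    "MKRPRTIYSS-QLQAL",
    "MKRPRTIYSSAQLQAL",
    "MARPRTAYSS-QLNAL",
]
print(select_blocks(toy, min_block_length=10))
print(select_blocks(toy, min_block_length=5))
```

Compare the two results while you think about the question in step 3.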
1. Load the Gblocked alignment into ClustalX. Does this alignment have many
or few informative positions? Is it suitable for detailed or superficial
phylogeny estimation?
2. Toggle on "Correct for multiple substitutions".
3. Make the tree.
4. Display the tree in NJPLOT.
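To see what a correction for multiple substitutions does to a distance, the sketch below computes the observed fraction of differing residues D between two aligned sequences and applies Kimura's approximation for proteins, K = -ln(1 - D - D*D/5). We are assuming this is the formula ClustalX applies for moderately diverged proteins, as its documentation describes; the function names and toy sequences are our own.

```python
import math


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(d):
    """Kimura's approximate correction: K = -ln(1 - D - D^2/5).

    Only defined while the expression inside the logarithm stays positive,
    i.e. for observed distances below roughly 0.85.
    """
    inner = 1.0 - d - d * d / 5.0
    if inner <= 0:
        raise ValueError("observed distance too large for this formula")
    return -math.log(inner)


# Toy sequences, invented for illustration: 2 differences in 10 columns.
d = p_distance("MKRPRTIYSS", "MARPRTAYSS")
k = kimura_protein_distance(d)
print(d, k)
```

The corrected distance is always at least as large as the observed one, and the gap between them widens as the sequences diverge.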
1. Is there a Xenopus orthologue of our sequence (BOX5_NOTVI)?
2. Is there another DLL from this newt?
3. Are there other amphibian DLLs?
4. Are DLX3_NOTVI and DLL3_XENLA orthologues?
5. Is there something odd about Xenopus DLL numbering? What? Is this common
in sequence databases?