Predoc Practical Course on Sequence Analysis
Sequence Alignment and Analysis Practical: Part 1, Clustal X
by Toby Gibson, Chenna Ramu and Christine Gemünd, 28/10/97
In this practical we will use the multiple alignment program Clustal X.
This is a windowing (graphical) version of the widely used Clustal W program. We will
run Clustal X on UNIX today, but it is available for Macs and PCs too. Clustal
X colours the alignment according to conserved features in a column, which
is useful in highlighting structurally or functionally important aspects of
the alignment. It can also mark regions of the alignment that score badly:
this feature allows you to home in on problem regions of an alignment. Problem
regions may be due to alignment error, inappropriate sequences or errors
in the sequences. It is often not realised how frequent and severe the
problems with sequence alignments can be, so the exercises today will illustrate
some of them!
Sequence extraction and alignment tools
We will use:
- SRS5 to extract and save a group of sequences to align.
- Clustal X run from the local X-Windows and UNIX environment.
- NJplot run from the local X-Windows and UNIX environment.
Exercise 1. Extracting and aligning NusA sequences.
NusA is an RNA-binding transcriptional regulatory protein within the RNA
Polymerase complex of E. coli and other prokaryotes, including the Archaea.
It influences whether to terminate or read-through termination control sites
in operons.
Extracting NusA Sequences:
- Open a new Netscape web browser and load this page into it.
- Load SRS5 and Start the session.
- Select the Swiss-Prot database and continue.
- Type nusa in the ID box, then Do Query.
- Select all sequences by clicking on their boxes, then the save button.
- Set Use view to Complete Entries, then click the save button.
- Check that you have all the entries listed now in EMBL format.
- Open Netscape Save As menu.
- Set to Text format, and save as e.g. nusa.dat.
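Once the entries are saved, it can be useful to check from the shell how many entries the file holds and how long each sequence is. The following is only a minimal sketch, assuming the saved file keeps the usual flat-file layout (an ID line opening each entry, an SQ line before the sequence, and // closing the entry). The function name read_entry_lengths and the two short example entries are made up for illustration and are not real database records.

```python
def read_entry_lengths(lines):
    """Return {entry_name: sequence_length} from flat-file lines.

    Assumes each entry starts with an 'ID' line, the sequence follows
    an 'SQ' line, and the entry ends with '//'.
    """
    lengths = {}
    name = None
    in_sequence = False
    residues = 0
    for line in lines:
        if line.startswith("ID "):
            name = line.split()[1]      # second field is the entry name
            residues = 0
            in_sequence = False
        elif line.startswith("SQ "):
            in_sequence = True          # sequence lines follow
        elif line.startswith("//"):
            if name is not None:
                lengths[name] = residues
            name = None
            in_sequence = False
        elif in_sequence:
            # count letters only; sequence lines contain spaces between blocks
            residues += sum(1 for ch in line if ch.isalpha())
    return lengths


# Two invented entries, just to exercise the function.
example = """ID   TEST1_ECOLI    STANDARD;      PRT;    12 AA.
DE   HYPOTHETICAL EXAMPLE ENTRY.
SQ   SEQUENCE   12 AA;
     MNKEILAVVE AV
//
ID   TEST2_BACSU    STANDARD;      PRT;     7 AA.
SQ   SEQUENCE   7 AA;
     MSSELLD
//
""".splitlines()

lengths = read_entry_lengths(example)
for entry, n in sorted(lengths.items()):
    print(entry, n)
```

To run it on your own file, pass an open file instead of the example lines, e.g. with open("nusa.dat") as handle: lengths = read_entry_lengths(handle). A wide spread of lengths is an early hint that the sequences may not align end to end.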
Aligning NusA sequences:
- Open a Unix Shell from the Desktop pull down menu.
- In the window type prepare clustalx. This sets up the clustal environment.
- Now type clustalx and the Clustal X window should open.
- Familiarise yourself with the layout and menus.
- With Load Sequences read in the NusA set.
- Now Do Complete Alignment.
- Using the slider, review the alignment.
Questions
- 1. Is there much variation in sequence length in the sequences?
- 2. Would you say the sequences are correctly aligned?
- (a) Partly?
- (b) Completely?
Controlling the Alignment:
- Invoke Low Scoring Segments.
- Click on Calculate Low Scoring Segments.
- Look over the alignment and note the black marked residues and regions.
- Note that the Gonnet PAM 250 matrix is used to calculate the low scoring segments.
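To get a feel for what "marking residues that score badly" means, here is a deliberately crude sketch. It does not reproduce the Clustal X calculation, which scores residues with the Gonnet matrix. Instead it swaps in a much simpler measure: a residue is flagged when it disagrees with a clear majority in its column. The function name flag_minority_residues, the min_fraction threshold and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def flag_minority_residues(alignment, min_fraction=0.5):
    """For each sequence, list the columns where it disagrees with the majority.

    alignment: list of (name, aligned_sequence) pairs of equal length.
    A column is only considered if its most common residue is present in at
    least min_fraction of the sequences. Gaps ('-') never form the majority,
    but a gap opposite a clear majority is flagged like any other mismatch.
    """
    n_cols = len(alignment[0][1])
    flagged = {name: [] for name, _ in alignment}
    for col in range(n_cols):
        column = [seq[col] for _, seq in alignment]
        counts = Counter(ch for ch in column if ch != "-")
        if not counts:
            continue                      # column is all gaps
        majority, count = counts.most_common(1)[0]
        if count / len(column) < min_fraction:
            continue                      # no clear consensus in this column
        for (name, _), ch in zip(alignment, column):
            if ch != majority:
                flagged[name].append(col)
    return flagged


# Invented toy alignment; seq_d is the odd one out.
toy = [
    ("seq_a", "MKTAYIAK"),
    ("seq_b", "MKTAYIAK"),
    ("seq_c", "MKTAYLAK"),
    ("seq_d", "MRSGW-AK"),
]
flagged = flag_minority_residues(toy)
worst = max(flagged, key=lambda name: len(flagged[name]))
for name, cols in flagged.items():
    print(name, cols)
print("most flagged:", worst)
```

A real scoring scheme would treat conservative substitutions (such as I to L) more gently than radical ones, which is exactly what a substitution matrix provides.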
Questions
- 1. Which sequence is marked as the worst overall?
- 2. Which regions of this sequence score the worst?
- 3. How related are the bacteria which share this region (you may need to
ask us)?
- 4. Are the sequences in this set colinear?
- 5. Is it reasonable to align the full length sequences?
- 6. If not, what can be aligned?
Exercise 2. Extracting and aligning EF-Tu sequences
The elongation factors EF-Tu in bacteria and EF-1alpha in eukaryotes are
conserved orthologous families: they are important families in molecular
evolutionary analysis and have been used to draw trees including the 3 kingdoms
of life.
Extracting EF-Tu Sequences:
- Return to your Netscape SRS5 query page.
- Type eftu in the ID box, then Do Query.
- Select this set of 20 sequences by clicking on their boxes, including:
- EFTU_ANANI
- EFTU_CHLVI
- EFTU_ECOLI
- EFTU_HALHA
- EFTU_HALMA
- EFTU_HUMAN
- EFTU_METVA
- EFTU_MYCGE
- EFTU_MYCHO
- EFTU_MYCLE
- EFTU_MYCPN
- EFTU_MYCTU
- EFTU_ODOSI
- EFTU_RICPR
- EFTU_SALTY
- EFTU_SPIPL
- EFTU_THEAQ
- EFTU_THEMA
- EFTU_THICU
- EFTU_YEAST
- Then click the save button.
- Set Use view to Complete Entries, then click the save button.
- Check that you have all the entries listed now in EMBL format.
- Open Netscape Save As menu.
- Set to Text format, and save as e.g. eftu.dat.
Aligning the EF-Tu sequences:
- Open a Unix Shell as before.
- In the window type prepare clustalx.
- Now type clustalx and the Clustal X window should open.
- With Load Sequences read in the EF-Tu set.
- Now Do Complete Alignment, which could take 5 minutes or so.
- Now Save Sequences as GCG/MSF format. (eftu.msf will be used later for the GCG part).
- Using the slider, review the alignment.
Questions
- 1. How conserved are the sequences compared to NusA?
- Is there much variation in length in the sequences?
- If so, what is the cause of the sequence variation?
- 2. Would you say the sequences are correctly aligned?
- (a) Partly?
- (b) Completely?
Controlling the Alignment:
- Invoke Low Scoring Segments.
- Set to the stringent Gonnet PAM160 and click on Calculate Low Scoring Segments.
- Look over the alignment and note the black marked residues and regions.
Questions
- 1. Are there any sequences that are noticeably more divergent?
- Are they Archaeal?
- Are these poorly matched regions due to natural divergence or misalignment?
- 2. In the alignment region 200-300, is there a marked region of EFTU_MYCPN?
- Why should this region be so divergent, when all other sequences are clearly
conserved?
- 3. In the same alignment region 200-300, can you find other short anomalous
regions in EFTU_ODOSI, EFTU_RICPR and EFTU_SPIPL (perhaps as short as 4
residues)?
- 4. Should we suspect sequencing error and if so, what type of error?
- 5. Perhaps we should check out the EFTU_MYCPN sequence?
- 6. What effect will such error have on a tree generated from these sequences?
Exercise 3. Making and displaying trees from the EF-Tu sequence alignment
Clustal X can calculate a tree using the Neighbour-Joining algorithm and
do simple tests on reliability of the tree branching order. But it cannot
display the trees, so tree display packages must be used. On a Mac, you
could use TreeView. On Sun UNIX only, you can use TreeTool, which is quite
a nice program. Today we will use a simple but useful tree display program
NJplot.
Calculating a tree with Clustal X:
- Invoke Bootstrap NJ tree. Note the name of the tree file!
- Click on the OK button. It will take a couple of minutes to calculate a bootstrapped tree.
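The idea behind bootstrapping a tree is to resample the alignment columns with replacement, rebuild a tree from each resampled alignment, and count how often each grouping reappears. The sketch below shows only the resampling step. The tree building is left out, and nothing here is taken from the Clustal X source. The function name bootstrap_alignment and the toy sequences are invented for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate of an alignment.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    Columns are drawn with replacement, so some columns appear several times
    and others not at all; each drawn column is kept intact across sequences.
    """
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


# Invented toy alignment.
toy = {
    "seq_a": "MKTAYIAK",
    "seq_b": "MKTAYLAK",
    "seq_c": "MRSGW-AK",
}
rng = random.Random(0)   # fixed seed so that repeated runs give the same draws
replicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
for rep in replicates:
    for name, seq in rep.items():
        print(name, seq)
    print()
```

A grouping that survives in most replicates is supported by many columns; one that appears in only a few replicates depends on a handful of positions and should be treated with caution.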
Displaying the tree with NJplot:
- If needed, open a new Unix Shell with the Desktop pull down menu.
- In the window, type prepare njplot.
- Now type njplot.
- Familiarise yourself with the layout of NJplot.
- Click on Open.
- Type the name of the tree file in the box (by default, eftu.phb) and click OK.
Questions
- How are the eubacterial, archaeal, mitochondrial and chloroplast sequences
grouped?
- Is the tree correctly rooted (note that tree drawing methods do not know
the biological root)?
- If not, use the New outgroup option to root the tree more plausibly.
- Can you see the effect of the frameshift in EFTU_MYCPN?
- Which branches are unstable with bootstrap resampling?
- Should we believe these parts of the tree?
- Are the mitochondrial sequences linked to the purple bacterial relatives?
- If not, why do you think the tree calculation might have gone wrong?
Take Home Lessons
We undertook two alignment exercises that turned out to be "naive" in the
sense that it would be incorrect to use those alignments for any further
tasks, such as making trees. We would need to go back and redo the alignments
without including the problem areas. Otherwise, the aligned sequences would not be
fully related: in one case because of a lack of colinearity, in the other because
of frameshift errors made when the sequences were determined. Were we to persist in
drawing trees, we might think we had found some unusual evolutionary relationships,
but unfortunately the result would be rubbish. Sequence alignments must always be
carefully controlled, and revised as necessary, before any trees are made.
Uncertain regions of sequence alignments must be excluded from phylogenetic
analysis.
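To see why a frameshift is so damaging, it helps to translate a short stretch of DNA before and after losing a single base. The sketch below uses the standard genetic code and an invented 24-base sequence (not taken from any real EF-Tu gene). The names translate, original_dna and shifted_dna are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1, ignoring any incomplete final codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


# Invented example sequence.
original_dna = "ATGAAAACCGCTTACATCGCGAAA"
# Simulate a sequencing error that drops one base (position 7, counting from 0).
shifted_dna = original_dna[:7] + original_dna[8:]

original_protein = translate(original_dna)
shifted_protein = translate(shifted_dna)
print(original_protein)
print(shifted_protein)
```

The two translations agree up to the lost base and are unrelated afterwards. An alignment program will still line up the shifted stretch against the correct sequences, and a tree built from it will count those unrelated residues as real substitutions.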