









Multiple Sequence Alignments

1. Pre-Calculated Alignments


If you can find a pre-calculated alignment that already contains a set of sequences that is useful for addressing your problem, you can save lots of time. Using the databases/resources below, we'll look at how you can get access to alignments that you can download yourself and examine locally.



Using these (and perhaps other alignment resources you might find), find a pre-calculated MSA that will be a good starting point for building an alignment to investigate:

  1. the evolution of FGF genes (e.g. FGF4_HUMAN) in the early vertebrate lineage
  2. variation in the 3D structure of tyrosine kinase domains within the animals

2. Adding additional sequences to an alignment

If you begin your analysis with a pre-calculated alignment, it is typically missing some of your sequences of interest. Rather than re-calculating an alignment from scratch (this time including the additional sequences), we usually want to add additional sequences while keeping the initial alignment intact.

We do this using CLUSTALX's "Profile" mode.


  • Run a BLAST search at the NCBI to find additional plant members of the cyclin A family (the family alignment itself is taken from TreeFam), restricting organisms to the Viridiplantae (e.g. giving a file such as this).
  • Load the initial TreeFam alignment into ClustalX
  • Switch "Mode" to "Profile Alignment Mode"
  • Load additional sequences into the second profile
    • File->Load Profile 2
    • Alignment->Align Sequences to Profile 1
    • Make sure you change the name of the result file!
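
After saving the result, it can be worth checking that the profile alignment really did keep the initial alignment intact. The sketch below is one possible check, not part of ClustalX: it assumes both alignments have been exported as FASTA with "-" as the gap character, that sequence names are unchanged, and that the initial alignment contains no all-gap columns. The function names are made up for this illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def project_onto(original_names, merged, gap="-"):
    """Drop columns of `merged` that are gaps in every original sequence.

    Such columns were opened only to accommodate the newly added sequences.
    """
    rows = [merged[n] for n in original_names]
    keep = [i for i in range(len(rows[0])) if any(r[i] != gap for r in rows)]
    return {n: "".join(merged[n][i] for i in keep) for n in original_names}


def profile_preserved(original, merged, gap="-"):
    """True if the original alignment is recovered unchanged from `merged`."""
    names = list(original)
    if any(n not in merged for n in names):
        return False
    return project_onto(names, merged, gap) == original
```

If `profile_preserved` returns False, the original sequences were realigned relative to each other, which usually means the sequences were aligned from scratch rather than added to Profile 1.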


Carry out a similar exercise, beginning with the TreeFam record for P53_HUMAN's family and adding further nematode sequences.

3. Building an alignment "from scratch"

As we've seen, building an alignment involves making decisions, such as which sequences to keep in or discard from the alignment. These decisions are always made with reference to the application the alignment will be used for.


Thus, the process of building an alignment from scratch also needs to be demonstrated in the context of such a question.

We will imagine we are interested in identifying possible cell-cycle related linear motifs (in particular cyclin-binding motifs) in the zebrafish p53 protein. Thus, we begin by querying ELM with this sequence.

We then need to collect additional sequences similar to the zebrafish sequence - which we'll do using BLAST.

We'll align the sequences using MUSCLE at the EBI.

We'll load the resulting alignment into CLUSTALX and examine the region of interest, and then decide whether to include additional sequences, remove some of the sequences in the initial alignment, etc.
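
To examine the region of interest outside a viewer, residue positions in the query have to be translated into alignment columns, because gaps shift the numbering. The following is a minimal sketch of that translation. It assumes the alignment is held as a dict of name -> aligned sequence with "-" as the gap character, and that the motif is given as 1-based start and end positions in the ungapped query sequence. The function and sequence names are hypothetical.

```python
def residue_to_column(aligned_seq, residue_pos, gap="-"):
    """Return the 0-based alignment column holding the given 1-based residue."""
    seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position lies beyond the end of the sequence")


def extract_region(alignment, ref_name, start, end, gap="-"):
    """Slice every sequence to the columns spanning residues start..end
    (1-based, inclusive) of the reference sequence `ref_name`."""
    first = residue_to_column(alignment[ref_name], start, gap)
    last = residue_to_column(alignment[ref_name], end, gap)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Toy data, not real p53 sequences.
toy = {"query": "M-KRLF", "other": "MAKR-F"}
region = extract_region(toy, "query", 2, 4)
```

Sequences that show only gaps, or clearly unrelated residues, in the extracted columns are candidates for removal if the motif region is what the alignment is meant to address.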


Carry out a similar process for the two examples covered in the first exercise, i.e. investigating either:

  1. the evolution of FGF genes (e.g. FGF4_HUMAN) in the early vertebrate lineage
  2. variation in the 3D structure of tyrosine kinase domains within the animals (e.g. the kinase domain of SRC_HUMAN)

4. Comparing different MSA tools


To get a feeling for how different MSA tools perform, we will compare the alignments they produce against the reference alignment provided by BaliBase for 1aboA_ref1.


Carry out a similar exercise to the demonstration above, using the BaliBase family assigned to you:

  1. 1idy_ref1
  2. 1ycc_ref4
  3. kinase1_ref4
  4. 1eft_ref5
  5. 1pfc_ref1
  6. 1wit_ref1
  7. 1csp_ref1
  8. 1ldg_ref1
  9. sh3_ref6
  10. 1mrj_ref1
  11. 1led_ref1
  12. 5ptp_ref1

For the family you analyse, which of the tools builds the alignment most similar to the BaliBase alignment?
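
One way to put a number on "most similar" is a sum-of-pairs style score: the fraction of residue pairs aligned in the reference that are also aligned in the tool's output. The sketch below is a simplified illustration and not the scoring program distributed with BaliBase. It scores every column rather than only designated core regions, and it assumes both alignments are dicts of name -> aligned sequence containing the same sequences, with "-" as the gap character (pass a different `gap` for formats that use ".").

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs placed in the same column.

    Each residue is identified as (sequence name, 0-based index in the
    ungapped sequence), so the pairs do not depend on column numbering.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    counters = {n: 0 for n in names}
    pairs = set()
    for col in range(length):
        present = []
        for n in names:
            if alignment[n][col] != gap:
                present.append((n, counters[n]))
                counters[n] += 1
        # Names are visited in sorted order, so each pair has a fixed orientation.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(reference, test, gap="-"):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)
```

A score of 1.0 means the tool reproduced every aligned pair in the reference. Computing the score for each tool's output against the same reference gives a simple ranking to compare with your visual impression.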

Page last modified on December 03, 2008, at 02:16 AM CET