Multiple Sequence Alignments
1. Pre-Calculated Alignments
If you can find a pre-calculated alignment that already contains a set of sequences that is useful for addressing your problem, you can save lots of time. Using the databases/resources below, we'll look at how you can get access to alignments that you can download yourself and examine locally.
>SRC_HUMAN|P12931|Proto-oncogene tyrosine-protein kinase Src (EC 2.7.10.2) ENSG00000197122
GSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEP
KLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDW
WLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESE
TTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLC
HRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLK
PGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYL
RLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTA
RQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERG
YRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
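Once an alignment has been downloaded, a quick local sanity check is often useful before opening it in a viewer. The following is a minimal sketch, assuming the alignment was saved in aligned-FASTA format with "-" as the gap character. The function names `read_fasta` and `summarize_alignment` and the three-sequence toy alignment are invented for illustration and do not come from any of the resources.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize_alignment(records, gap="-"):
    """Report size and gap content; rows of an alignment must be equally long."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length, so this is not an alignment")
    width = lengths.pop() if lengths else 0
    total = width * len(records)
    gaps = sum(seq.count(gap) for _, seq in records)
    return {
        "n_sequences": len(records),
        "n_columns": width,
        "gap_fraction": gaps / total if total else 0.0,
    }


# Toy alignment, invented purely for illustration.
toy = """>seq_a
MK-LV
>seq_b
MKALV
>seq_c
M--LV
"""
print(summarize_alignment(read_fasta(toy)))
```

For a real file, the text would be read with `open(path).read()` first. The length check is what separates an alignment from a plain collection of unaligned sequences such as the single SRC_HUMAN record above.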
Using these (and perhaps other alignment resources you might find), find a pre-calculated MSA that will be a good starting place for building an alignment to investigate:
2. Adding additional sequences to an alignment
If you begin your analysis with a pre-calculated alignment, it is typically missing some of your sequences of interest. Rather than re-calculating an alignment from scratch (this time including the additional sequences), we usually want to add additional sequences while keeping the initial alignment intact.
We do this using CLUSTALX's "Profile" mode.
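After adding sequences, it can be worth confirming that the initial alignment really was kept intact. The sketch below is one possible check and not part of CLUSTALX: it assumes both alignments are available as dictionaries mapping sequence name to aligned sequence, with "-" as the gap character. The function names and the toy sequences are hypothetical. The idea is that adding sequences may introduce new all-gap columns among the original rows, but once those columns are removed the original rows should be unchanged.

```python
def drop_all_gap_columns(rows, gap="-"):
    """Remove every column in which all rows have a gap."""
    if not rows:
        return []
    width = len(rows[0])
    keep = [i for i in range(width) if any(row[i] != gap for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


def original_alignment_preserved(original, extended, gap="-"):
    """True if the original rows are aligned to each other exactly as before."""
    if not set(original) <= set(extended):
        return False  # an original sequence was lost along the way
    names = sorted(original)
    before = drop_all_gap_columns([original[n] for n in names], gap)
    after = drop_all_gap_columns([extended[n] for n in names], gap)
    return before == after


# Toy data, invented for illustration.
original = {"a": "MK-LV", "b": "MKALV"}
# Sequence "c" was added; a new gap column was opened in both "a" and "b".
extended = {"a": "MK--LV", "b": "MKA-LV", "c": "MKAGLV"}
# Here "a" was shifted relative to "b", so the original alignment changed.
realigned = {"a": "M-K-LV", "b": "MKA-LV", "c": "MKAGLV"}

print(original_alignment_preserved(original, extended))
print(original_alignment_preserved(original, realigned))
```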
Carry out a similar exercise, this time beginning with the TreeFam record for P53_HUMAN's family and adding further nematode sequences.
3. Building an alignment "from scratch"
As we've seen, building an alignment involves decisions such as which sequences to keep in the alignment and which to discard. These decisions are always made with reference to the question the alignment is meant to answer.
Thus, the process of building an alignment from scratch also needs to be demonstrated in the context of such a question.
We will imagine we are interested in identifying possible cell-cycle related linear motifs (in particular cyclin-binding motifs) in the zebrafish p53 protein. Thus, we begin by querying ELM with this sequence.
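Linear motif resources of this kind describe motifs as short regular expressions that are matched against the query sequence. The sketch below shows the general mechanism only. The pattern `[RK].L` is a deliberately simplified stand-in for a cyclin-binding ("RxL"-type) motif and is an assumption of this example, not the expression ELM actually uses. The toy sequence is invented and is not the zebrafish p53 sequence.

```python
import re

# Simplified, hypothetical RxL-like pattern: R or K, any residue, then L.
# Wrapped in a lookahead so that overlapping matches are all reported.
SIMPLE_RXL = re.compile(r"(?=([RK].L))")


def find_motifs(sequence, pattern=SIMPLE_RXL):
    """Return (start, end, match) tuples using 1-based, inclusive positions."""
    hits = []
    for m in pattern.finditer(sequence):
        start = m.start(1) + 1  # convert 0-based index to 1-based position
        end = m.end(1)          # exclusive 0-based end equals inclusive 1-based end
        hits.append((start, end, m.group(1)))
    return hits


# Toy sequence, invented for illustration.
toy_sequence = "MAKRLLDEPKALS"
for start, end, match in find_motifs(toy_sequence):
    print(start, end, match)
```

Short patterns like this match by chance very often, which is one reason the conservation of a candidate motif across an alignment is examined before it is taken seriously.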
We then need to collect additional sequences similar to the zebrafish sequence - which we'll do using BLAST.
We'll align the sequences using MUSCLE at the EBI.
We'll load these sequences into CLUSTALX and examine the region of interest - and then decide whether to include additional sequences, remove some of the sequences in the initial alignment etc.
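Motif positions are reported as residue numbers in the unaligned query, whereas an alignment viewer shows column numbers that also count gaps. The sketch below illustrates the conversion and extracts the same region from every sequence. It assumes the alignment is held as a dictionary of name to aligned sequence with "-" as the gap character. The function names, the sequence names, and the toy alignment are hypothetical.

```python
def residue_to_column(aligned_seq, residue_pos, gap="-"):
    """Return the 0-based column holding the given 1-based residue position."""
    seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position lies beyond the end of the sequence")


def slice_region(alignment, ref_name, start, end, gap="-"):
    """Cut out the columns spanning residues start..end of the reference."""
    ref = alignment[ref_name]
    first = residue_to_column(ref, start, gap)
    last = residue_to_column(ref, end, gap)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Toy alignment, invented for illustration; "query" plays the role of the
# sequence in which the motif was found (here at residues 3 to 5).
alignment = {
    "query": "MA-KRLLD",
    "hom1":  "MASKKLLD",
    "hom2":  "M--KRL-D",
}
for name, fragment in slice_region(alignment, "query", 3, 5).items():
    print(name, fragment)
```

Looking at such a slice makes it easier to judge whether the motif is conserved, and therefore which sequences are informative enough to keep.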
Carry out a similar process, using the two examples covered in the first exercise i.e. looking to investigate either:
4. Comparing different MSA tools
To get a feeling for how different MSA tools perform, we will compare the alignments they produce against the reference alignment provided by BaliBase for 1aboA_ref1.
Try an exercise similar to the above demonstration, using the BaliBase family assigned to you.
For the family you analyse, which of the tools builds the alignment most similar to the BaliBase alignment?
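"Most similar" can be judged by eye, but a number helps when several tools are compared. One common idea is a sum-of-pairs style score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below is a minimal version of that idea, assuming both alignments contain the same sequences under the same names and use "-" for gaps. It is not BaliBase's own scoring program, which may differ (for example in which columns it scores). The function names and toy alignments are invented.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Collect every pair of residues that share a column.

    Each residue is identified as (sequence name, 0-based residue index),
    so the result does not depend on where gaps are placed.
    """
    names = sorted(alignment)
    if not names:
        return set()
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(len(alignment[names[0]])):
        present = []
        for name in names:
            if alignment[name][col] != gap:
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(reference, test, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)


# Toy alignments, invented for illustration.
reference = {"a": "MK-LV", "b": "MKALV"}
tool_output = {"a": "MKL-V", "b": "MKALV"}  # the gap is placed one column later

print(sum_of_pairs_score(reference, reference))
print(sum_of_pairs_score(reference, tool_output))
```

A score of 1.0 means every aligned pair in the reference was recovered; lower values indicate that the tool placed some residues in different columns.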