Bring Your Own Sequences


Introductory Notes

The idea of this last session is that you use it to examine your own sequences and biological problems from the perspective of obtaining a useful MSA.

However, as some of you may not currently have a specific problem you want to investigate, we have come up with several problems below for you to try out instead.

Predict the secondary structure of the huntingtin protein

One possible way of proceeding could be:
Can you think of (and carry out) ways of checking your results?

Examine Co-Evolution in Non-Receptor Tyrosine Kinase Proteins

Use the human SRC protein sequence as a starting point for collecting a dataset of non-receptor tyrosine-kinases.

These proteins contain three different domains - tyrosine kinase, SH2, and SH3 domains.

Once you have an alignment of such proteins that you feel is good, use CLUSTALX to chop up this alignment into three separate files, one for each of the three domains (use the SMART server to determine where the boundaries between the domains are).
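If you would like to check the CLUSTALX cut points by hand, the slicing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy alignment, the sequence names and the column ranges are all invented, and the helper names (read_fasta, residue_to_column, slice_alignment) are hypothetical. Note that a domain server reports boundaries as residue positions in a single sequence, so they have to be converted to alignment columns before cutting.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> (aligned) sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def residue_to_column(aligned_seq, residue_pos):
    """Map a 1-based residue position to its 1-based alignment column."""
    seen = 0
    for col, ch in enumerate(aligned_seq, start=1):
        if ch != "-":
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position is beyond the end of the sequence")


def slice_alignment(records, start, end):
    """Keep alignment columns start..end (1-based, inclusive) of every sequence."""
    return {name: seq[start - 1:end] for name, seq in records.items()}


def to_fasta(records):
    """Write a dict of name -> sequence back out as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Invented toy alignment and invented column ranges, for illustration only.
toy = ">seqA\nMKT-LLVAGG\n>seqB\nMRTALL-AGG\n"
domains = {"SH3": (1, 3), "SH2": (4, 7), "kinase": (8, 10)}

aln = read_fasta(toy)
pieces = {d: slice_alignment(aln, s, e) for d, (s, e) in domains.items()}
for d, sub in pieces.items():
    print(d)
    print(to_fasta(sub), end="")
```

In real use you would read the alignment from a file saved from CLUSTALX in FASTA format and write each piece to its own file.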

For each domain, use CLUSTALX to calculate an NJ tree and examine the trees from each of the alignments at the same time. If there has been no "domain shuffling" going on in the evolution of these proteins, we would expect each of these alignments to yield the same or similar trees.

What are the differences and similarities between the trees?

Don't just think in terms of topology; also consider the overall lengths of the different branches and their bootstrap values.
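Comparing trees by eye works well for a handful of sequences, but a rough numerical comparison can help. The sketch below is not something CLUSTALX does for you. It counts the groupings (splits) that two unrooted trees do not share, a Robinson-Foulds style distance, and sums the branch lengths of each tree. It assumes plain Newick text with taxon names free of spaces and no bracketed bootstrap annotations, and the two example trees are invented.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested (children, name, length) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    punct = {"(", ")", ",", ";"}
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        name, length = "", 0.0
        if pos < len(tokens) and tokens[pos] not in punct:
            name, _, raw = tokens[pos].partition(":")
            length = float(raw) if raw else 0.0
            pos += 1
        return children, name, length

    return node()


def leaves(node):
    """Set of taxon names below a node."""
    children, name, _ = node
    if not children:
        return {name}
    return set().union(*(leaves(c) for c in children))


def splits(tree):
    """Non-trivial splits, each stored as the side lacking a reference taxon."""
    all_taxa = frozenset(leaves(tree))
    ref = min(all_taxa)
    found = set()

    def walk(node):
        for child in node[0]:
            walk(child)
        side = frozenset(leaves(node))
        if ref in side:
            side = all_taxa - side
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)

    walk(tree)
    return found


def total_length(node):
    """Sum of all branch lengths in the tree."""
    return node[2] + sum(total_length(c) for c in node[0])


# Invented example trees standing in for two of the domain trees.
t1 = parse_newick("((A:0.1,B:0.2):0.05,(C:0.1,D:0.1):0.05,E:0.3);")
t2 = parse_newick("((A:0.1,B:0.2):0.05,(C:0.1,E:0.1):0.05,D:0.3);")

rf = len(splits(t1) ^ splits(t2))
print("splits not shared:", rf)
print("total lengths:", total_length(t1), total_length(t2))
```

A value of zero for the unshared splits means the two topologies are identical. A larger total length for one domain suggests it has evolved faster than the others.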

Do these results support the hypothesis that the proteins did not undergo domain shuffling in their evolution?

Calcium-Binding Proteins

Create alignments for the following set of calcium-binding proteins:

Examining the alignments, which of these proteins do you think would be more useful for a phylogenetic analysis of vertebrates? Of eukaryotes?
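One way to put a number on "how conserved is this family" is the mean pairwise identity of the alignment. The following is a minimal sketch, assuming the alignment is already in memory as equal-length gapped strings. The three toy sequences and the function names are invented for illustration, and identity is counted only over columns where neither sequence has a gap, which is one of several possible conventions.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_identity(alignment):
    """Average pairwise identity over all pairs of sequences."""
    scores = [pairwise_identity(a, b)
              for a, b in combinations(alignment.values(), 2)]
    return sum(scores) / len(scores)


# Invented toy alignment, for illustration only.
toy = {"s1": "ACDE-G", "s2": "ACDEFG", "s3": "ACNEFG"}
print(round(mean_identity(toy), 3))
```

Comparing this value between families gives a first impression of which alignment holds enough variation to separate closely related species, and which is conserved enough to be aligned reliably across distant ones.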

Estimate a phylogeny for each of the families.

Do your results agree with your expectations? What characterises the trees you consider better or worse at addressing a particular problem?

Obtain secondary structure predictions for each of the proteins.

Do these predictions agree with each other, and with the already-solved structures?
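Agreement between two predictions, or between a prediction and a solved structure, can be expressed as the fraction of residues given the same state. The sketch below assumes each prediction has been reduced to a three-state string (H for helix, E for strand, C for coil) and that both strings cover the same residues in the same order. The two example strings are invented.

```python
def agreement(pred_a, pred_b):
    """Fraction of positions where two equal-length state strings agree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    if not pred_a:
        return 0.0
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def disagreements(pred_a, pred_b):
    """1-based positions where the two state strings differ."""
    return [i for i, (a, b) in enumerate(zip(pred_a, pred_b), start=1)
            if a != b]


# Invented three-state strings, for illustration only.
method_1 = "CCHHHHCCEEEC"
method_2 = "CCHHHCCCEEEC"

print(round(agreement(method_1, method_2), 3))
print(disagreements(method_1, method_2))
```

To compare predictions for different proteins, first map each prediction onto the alignment columns so that equivalent positions are compared.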

Align all the sequences together and estimate the phylogeny of the calcium-binding domain.

Do the regions of this tree that are specific for the different initial datasets agree with those estimated when calculating the tree from each family on its own?


Note that throughout these exercises the following formatting is used to specify different types of text:

Bold non-italic text like this gives you instructions about tasks you should carry out e.g. "View the following webpage"

Italic text specifies questions for you to answer
