Bring Your Own Sequences
The idea of this last session is that you use it to examine your own
sequences and biological problems from the perspective of obtaining a
However, as some of you may not currently have a specific problem you
want to investigate, we have come up with several problems below for
you to try out instead.
One possible way of proceeding could be:
- Collect huntingtin sequences by BLAST at the EBI.
- Create an alignment of the sequences.
- There are many fragments - remove as many as you think you can
get away with and still predict secondary structure, and realign the
remaining sequences.
- Adjust the alignment until you are happy with it, then predict
secondary structure for it.
Can you think of (and carry out) ways of checking your results?
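The fragment-removal step can also be scripted rather than done by hand. The following is a minimal sketch, assuming the BLAST hits have been saved as FASTA text; the function names, the 0.5 length threshold, and the toy sequences are all hypothetical and chosen only for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_fragments(records, min_fraction=0.5):
    """Keep sequences at least min_fraction as long as the longest one.

    min_fraction is an arbitrary threshold (an assumption, not a rule);
    adjust it after looking at the alignment.
    """
    if not records:
        return []
    longest = max(len(seq) for _, seq in records)
    return [(h, s) for h, s in records if len(s) >= min_fraction * longest]


# Toy input, not real huntingtin sequences.
example = """>full_length
ACDEFGHIKLMNPQRSTVWY
>fragment
ACDEF
>near_full
ACDEFGHIKLMNPQRS
"""

kept = drop_fragments(parse_fasta(example), min_fraction=0.5)
for header, seq in kept:
    print(header, len(seq))
```

A length cut-off is only a first pass: a short sequence may still cover the region you care about, so it is worth checking the discarded entries against the alignment before realigning.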
Examine Co-Evolution in Non-Receptor Tyrosine Kinase Proteins
Use the human SRC protein sequence as a
starting point for collecting a dataset of non-receptor
tyrosine kinase proteins.
These proteins contain three different domains - tyrosine kinase, SH2,
and SH3 domains.
Once you have an alignment of such proteins that you feel is good, use
CLUSTALX to chop up this alignment into three separate files, one for
each of the three domains (use the SMART server to determine where the
boundaries between the domains are).
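If you prefer to script the splitting step, the sketch below shows one way to do it. It is not the CLUSTALX procedure, only an illustration: the sequence names, the toy alignment, and the domain coordinates are hypothetical. Note that domain boundaries reported for a single sequence are residue positions, so they must first be converted to alignment columns.

```python
def residue_to_column(aligned_seq, residue_pos):
    """Map a 1-based residue position in the ungapped sequence
    to its 1-based column in the alignment."""
    seen = 0
    for col, ch in enumerate(aligned_seq, start=1):
        if ch != "-":
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position lies beyond the end of the sequence")


def slice_alignment(alignment, start, end):
    """Return columns start..end (1-based, inclusive) of every sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


# Toy alignment with made-up sequences (not real kinase data).
alignment = {
    "seqA": "MK-LVAGTWYRESLH",
    "seqB": "MKALVA-TWYKESLH",
    "seqC": "MR-LIAGTWFRDSLH",
}

# Hypothetical domain boundaries, already expressed as alignment columns.
domains = {"SH3": (1, 5), "SH2": (6, 10), "kinase": (11, 15)}

sub_alignments = {
    name: slice_alignment(alignment, start, end)
    for name, (start, end) in domains.items()
}

for name, sub in sub_alignments.items():
    print(name, sub)
```

For example, `residue_to_column(alignment["seqA"], 3)` gives the column holding the third residue of seqA, which is shifted by the gap in front of it.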
For each domain, use CLUSTALX to calculate an NJ tree and examine the
trees from each of the alignments at the same time. If there has been
no "domain shuffling" going on in the evolution of these proteins, we
would expect each of these alignments to yield the same/similar trees.
What are the differences and similarities between the trees?
Don't just think in terms of topology, also consider overall lengths of
the different branches, and the bootstrap values of the different
Do these results support the hypothesis that the proteins did not
undergo domain shuffling in their evolution?
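Comparing topologies by eye becomes hard once the trees have more than a handful of leaves. One common way to put a number on topological disagreement is to count the bipartitions (splits) found in one tree but not the other. The sketch below is a minimal illustration under several assumptions: trees are written by hand as nested Python tuples of leaf names, both trees contain the same leaves, and the example topologies are invented. It ignores branch lengths and bootstrap values, which still need to be inspected separately.

```python
def leaf_sets(tree, out):
    """Return the set of leaves under `tree`, appending the leaf set
    of every internal node to `out` along the way."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of a tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on where the tree
    happens to be rooted.
    """
    collected = []
    all_leaves = leaf_sets(tree, collected)
    reference = min(all_leaves)
    result = set()
    for side in collected:
        if reference in side:
            side = all_leaves - side
        # Skip trivial splits that separate nothing or a single leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def split_distance(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    collected_a, collected_b = [], []
    if leaf_sets(tree_a, collected_a) != leaf_sets(tree_b, collected_b):
        raise ValueError("both trees must contain the same leaves")
    return len(splits(tree_a) ^ splits(tree_b))


# Invented topologies for five sequences A-E.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

print(split_distance(tree_1, tree_2))
print(split_distance(tree_1, tree_1))
```

A distance of zero means the two topologies are identical; larger values mean more groupings are unique to one tree. Whether a given difference is meaningful still depends on how well supported the conflicting branches are.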
Create alignments for the following set of calcium-binding proteins:
Examining the alignments, which of these proteins do you think would
be more useful for a phylogenetic analysis of vertebrates? Of
Estimate phylogeny from each of the families.
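Distance-based methods such as NJ start from a matrix of pairwise distances between the aligned sequences. As a rough illustration of what goes into such a matrix, the sketch below computes the uncorrected proportion of differing sites (the p-distance) for each pair, skipping columns where either sequence has a gap. The gap handling, the function names, and the toy sequences are assumptions made for this example; real programs may treat gaps differently and may apply a correction for multiple substitutions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment with made-up sequences.
toy_alignment = {
    "seq1": "ACDEFG",
    "seq2": "ACDQFG",
    "seq3": "AC-EFA",
}

matrix = distance_matrix(toy_alignment)
for pair, distance in matrix.items():
    print(pair, round(distance, 3))
```

Families whose pairwise distances are all close to zero, or all very large, tend to carry little usable signal for the group of species in question, which is one thing to look for when judging which proteins suit which phylogenetic question.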
Do your results agree with your expectations? What characterises the
trees you consider better or worse at addressing a particular
Obtain secondary structure predictions for each of the proteins -
Do these predictions agree with each other, and with the
Align all the sequences together and estimate the phylogeny of the
Do the regions of this tree that are specific for the different
initial datasets agree with those estimated when calculating the tree
from each family on its own?
Note that throughout these exercises the
following formatting is used to
specify different types of text:
Bold non-italic text like this gives
you instructions about tasks you should carry out e.g. "View the
Italic text specifies questions for
you to answer.
to Gibson Team course pages at EMBL.