Biocomputing Unit
Sequence Analysis Service
Gibson Group
EMBL

Predoc Practical Course on Sequence Analysis

WWW Homology Searching Practical, Part 2: Profile searching

by Toby Gibson, Chenna Ramu and Christine Gemünd, 28/10/97

In this practical we will use profile search tools available through the web. Profile searching is one of the most sensitive search methods currently available. The raw materials for a profile search are a multiple sequence alignment and a residue exchange matrix (e.g. the Gonnet Pam250 matrix). A profile scores the amino acids at each position in the alignment: conserved positions score more strongly than unconserved ones (whereas in a single-sequence query all positions are weighted equally). We will look at the effect of building the profile with different residue substitution matrices, and compare the sensitivity of the profile search with that of a search using a single sequence as query.
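To make the idea of a profile concrete, here is a minimal sketch in Python. It assumes an average-score scheme: the score of residue a at a column is the mean matrix score of a against every residue observed in that column. The four-letter alphabet, the `toy_score` values and the tiny alignment are invented for illustration and are not the matrices or the TFIIB alignment used in this practical. Gaps and sequence weighting are ignored.

```python
from collections import Counter

# Toy alphabet and toy exchange scores (hypothetical values, NOT Gonnet or
# Blosum): identities score 4, the similar pair L/I scores 2, the rest -1.
ALPHABET = "LIKD"

def toy_score(a, b):
    if a == b:
        return 4
    if {a, b} == {"L", "I"}:
        return 2
    return -1

def build_profile(alignment):
    """Return one dict per alignment column, mapping residue -> average score."""
    n_seqs = len(alignment)
    profile = []
    for column in zip(*alignment):       # walk the alignment column by column
        counts = Counter(column)         # residue counts observed in this column
        profile.append({
            a: sum(n * toy_score(a, b) for b, n in counts.items()) / n_seqs
            for a in ALPHABET
        })
    return profile

# Column 1 is fully conserved (all L); column 3 is completely variable.
alignment = ["LKD", "LKI", "LDL", "LIK"]
profile = build_profile(alignment)
for position, scores in enumerate(profile, start=1):
    print(position, scores)
```

In this sketch the conserved first column gives L the full score of 4.0, while no residue in the variable third column scores above 1.0, which is the sense in which conserved positions score more strongly than unconserved ones.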


WWW DB Tools

We will use:


Step 1 Preparing a profile from a TFIIB alignment

TFIIB is a core transcription factor in both eukaryotes and archaea which has been quite strongly conserved through evolution. TFIIB has a ~90 residue duplicated domain, the TFIIB repeats, with N- and C-terminal extensions. A second protein family in eukaryotes (but not found in archaea) shares the same structural topology, and presumably shares common ancestry, although the function is not conserved. Well-optimised searches with TFIIB queries should be able to find this second family, which has many divergent entries, and the number of entries that are picked up is a measure of the search sensitivity.

Since the multiple alignment practical is tomorrow, we have prepared an alignment for you. (If there were enough time, you could use SRS to extract the TFIIB sequences and to launch a Clustal W alignment.)


Step 2 BIC_Profilesearch with a TFIIB profile prepared with the Blosum62 matrix

The Bioccelerator is fast, dedicated hardware designed exclusively for dynamic programming (i.e. slow but sensitive) sequence comparison. It is built by the company Compugen. It can perform several kinds of search, including basic Smith-Waterman, profile searches and protein v. DNA frame-shifting comparisons. Today we will do the Profile Search, which finds the best matching segments between a query profile (derived from a multiple alignment) and a database sequence, allowing for gaps to be inserted at any position.
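The dynamic programming behind a profile search can be sketched in a few lines of Python. This is an illustrative Smith-Waterman-style local alignment of a profile against one sequence, not the Bioccelerator's implementation: the three-position `toy_profile`, the linear gap penalty `gap` and the default score for unlisted residues are all assumptions made for the example.

```python
def profile_local_score(profile, sequence, gap=2.0, default=-1.0):
    """Best local alignment score between a profile and a sequence.

    profile  -- list of dicts, one per position, mapping residue -> score
    gap      -- linear penalty per gapped position (assumed value)
    default  -- score for residues not listed at a profile position
    """
    rows, cols = len(profile), len(sequence)
    # H[i][j] = best score of a local alignment ending at profile position i
    # and sequence residue j; the floor of 0 makes the alignment local.
    H = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = H[i - 1][j - 1] + profile[i - 1].get(sequence[j - 1], default)
            skip_profile_pos = H[i - 1][j] - gap   # gap in the sequence
            skip_residue = H[i][j - 1] - gap       # gap in the profile
            H[i][j] = max(0.0, match, skip_profile_pos, skip_residue)
            best = max(best, H[i][j])
    return best

# Hypothetical three-position profile: L (or I) then K then D.
toy_profile = [{"L": 4.0, "I": 2.0}, {"K": 4.0}, {"D": 4.0}]

exact = profile_local_score(toy_profile, "AALKDAA")    # ungapped match
gapped = profile_local_score(toy_profile, "AALKADAA")  # one inserted residue
print(exact, gapped)
```

With these toy values the ungapped match scores 12.0 and the sequence with one extra residue scores 10.0, because a gap can be opened at any position at the cost of the penalty. A real search repeats this comparison for every database sequence and ranks the scores, which is why dedicated hardware helps.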

The search will take a couple of minutes (unless the Bic is busy). When it is finished you can look at the high-score list and alignments in the output. Use the links to SRS and Entrez to learn about the top hits.

Questions


Step 3 BIC_Profilesearch with the TFIIB profile prepared with the Gonnet Pam250 matrix

Now repeat the search but use a profile made with a softer matrix, i.e. a matrix that weights similar residue exchanges more highly.
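What "softer" means can be illustrated with a small Python sketch. Both scoring schemes below are hypothetical toys, not Blosum62 or Gonnet Pam250: the hard one gives no credit to any non-identical pair, while the soft one rewards a few chemically similar pairs listed in `SOFT_PAIRS`. The two segments are invented and assumed to be already aligned without gaps.

```python
IDENTITY = 4  # score for an identical pair in both toy schemes

# Hypothetical scores for similar residue pairs; not taken from any real matrix.
SOFT_PAIRS = {
    frozenset("LI"): 2,
    frozenset("KR"): 2,
    frozenset("DE"): 2,
}

def ungapped_score(query, target, similar_pairs, mismatch=-2):
    """Sum pair scores over two equal-length, already aligned segments."""
    total = 0
    for a, b in zip(query, target):
        if a == b:
            total += IDENTITY
        else:
            # Similar pairs get their listed score, all others the mismatch score.
            total += similar_pairs.get(frozenset((a, b)), mismatch)
    return total

query, target = "LKDIV", "IRELV"
hard = ungapped_score(query, target, {})          # no credit for similar pairs
soft = ungapped_score(query, target, SOFT_PAIRS)  # similar pairs rewarded
print(hard, soft)
```

The divergent segment shares only one identical residue with the query, so the hard scheme gives it a negative score (-4) while the soft scheme gives it a clearly positive one (12). This is how a softer matrix can lift distant relatives above the noise, at the risk of also lifting unrelated sequences.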

First make a new profile:

Now run another search:

Now you can compare the results obtained with the Gonnet Pam250 and Blosum62 matrices.

Questions


Step 4 Bic_SW search with the human TFIIB sequence

Now set up a search with TFIIB_Human, in order to determine whether a profile search is really more sensitive than a search with a single sequence as query.

First extract a sequence with SRS:

Now run the search:

Now you can compare the results of the single-sequence query with those of the profile query.
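If you copy the entry names from two high-score lists, the comparison can be done with simple set operations. The Python sketch below assumes the hit lists have been typed in by hand; the entry names are placeholders rather than real search results, and `compare_hits` is a helper invented for this example.

```python
def compare_hits(hits_a, hits_b):
    """Split two hit lists into shared entries and entries unique to each."""
    a, b = set(hits_a), set(hits_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }

# Placeholder entry names, NOT real output from either search.
profile_hits = ["ENTRY_1", "ENTRY_2", "ENTRY_3", "ENTRY_4"]
single_seq_hits = ["ENTRY_1", "ENTRY_2", "ENTRY_5"]

result = compare_hits(profile_hits, single_seq_hits)
for label, entries in result.items():
    print(label, len(entries), entries)
```

Entries found by only one of the two searches are the interesting ones: check whether they are genuine family members or false positives before counting them towards the sensitivity of that search.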

Questions


Take Home Lessons

Optimisation of the search setup is vital: again in practice this means running test searches. Choosing a good residue substitution matrix is important. Optimisation of gap penalties is also critical (we did not look at this today). The TF2B profile is actually only slightly more sensitive than the most optimised query with a TFIIB sequence. (We were a bit wicked and chose a poor starting query: TF2B_Human does much better, as yeast has diverged more from the common ancestral sequence). However, by adding in the BRF sequences (entries TF3B*) and then the best alignable cyclins, we would bring in more and more divergent cyclins. The RB sequences are also genuine hits, but have only a single domain. Reciprocal searches with profiles based on the cyclin box, and the conserved motif in the RB family would need to be undertaken: in each case they would support the idea that these families are related. How this was done in practice, and some tips on setting up and evaluating profile searches, are given in the references below.

References

