Predoc Practical Course on Sequence Analysis
WWW Homology Searching Practical, Part 1
by Toby Gibson, Chenna Ramu and Christine Gemünd, 28/10/97
In this practical we will run some database search tools provided by EMBL and available through the web. Examination of the outputs may reveal some differences between the results, depending on the type of algorithm used in the sequence comparison. We will also modify the query and search set ups to illustrate the importance of a little thought in advance of (or during...) database searching. Rule No. 1 is "Know your sequence!"
WWW DB search Tools
We will use BLAST2 and the Bioccelerator Smith-Waterman search (Bic_SW).
Step 1 Choosing an snRNP SM protein as query
SM proteins are found in snRNP complexes. There are quite a number of them in Swiss-Prot, and they are fairly divergent, so it is difficult (or impossible) to detect them all in one search. All SM proteins share a small globular domain, but many also have a C-terminal non-globular domain. This will be used to illustrate the problems of searching with multi-domain proteins.
You now have the sequence of human SM-B protein available in a form that can be cut and pasted into the DB query forms.
Step 2 BLAST2 searching with human SM-B protein
BLAST2 is an upgraded version of BLAST, one of the most widely used database search packages. The BLAST programs find the best matching ungapped sections in a sequence comparison. The most important change for the user to note in BLAST2 is that neighbouring ungapped segments can now be concatenated by allowing gaps between them. This improves both the sensitivity and the interpretability of the results.
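To make "best matching ungapped sections" concrete, here is a minimal Python sketch that finds the single highest-scoring ungapped segment between two sequences by scanning every diagonal of the comparison matrix with a running-sum (Kadane-style) scan. It assumes a toy match/mismatch score instead of a real substitution matrix such as BLOSUM62, and it is not how BLAST itself works: BLAST starts from short word hits and extends them, which is far faster than scanning every diagonal. The function name `best_ungapped_segment` is hypothetical.

```python
# Toy scoring scheme (assumption): a real search would use a
# substitution matrix such as BLOSUM62 instead.
MATCH, MISMATCH = 5, -4


def pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def best_ungapped_segment(query: str, subject: str) -> tuple[int, int, int, int]:
    """Return (score, query_start, subject_start, length) of the
    highest-scoring ungapped segment over all diagonals (0-based)."""
    best = (0, 0, 0, 0)
    n, m = len(query), len(subject)
    # Each diagonal is identified by its offset d = j - i.
    for d in range(-(n - 1), m):
        i = max(0, -d)  # first query position on this diagonal
        j = i + d       # matching subject position
        run, run_start = 0, 0
        k = 0
        while i + k < n and j + k < m:
            s = pair_score(query[i + k], subject[j + k])
            # Restart the segment whenever the running sum is not positive.
            if run <= 0:
                run, run_start = s, k
            else:
                run += s
            if run > best[0]:
                best = (run, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best
```

For example, `best_ungapped_segment("AAAWGHE", "CWGHEC")` returns `(20, 3, 1, 4)`: the segment WGHE, starting at position 3 of the query and position 1 of the subject. BLAST2's improvement is to join neighbouring segments like this one across gaps.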
Step 2B BLAST2 search with SM-B and a filter
Now repeat the search but filter out segments of "reduced sequence complexity".
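Filtering works by spotting stretches whose residue composition is highly biased. One simple way to picture this is to slide a window along the sequence, compute the Shannon entropy of each window, and mask windows that fall below a threshold. The sketch below does exactly that; the window length of 12 and the threshold of 2.2 bits are hypothetical values chosen for illustration, and BLAST's actual protein filter (SEG) is more elaborate than this single pass. Masked residues are replaced by X here.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    total = len(segment)
    counts = Counter(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue lying in a low-entropy window with 'X'.

    window and threshold are illustrative assumptions, not SEG defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Mask the whole window, not just its centre.
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)
```

Run on `"ACDEFGHIKLMN" + "P" * 12`, it masks the poly-proline tail and, because whole windows are masked, the five residues just before it as well; a real filter has to trim such boundaries more carefully.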
Step 3 Bic_SW search with human SM-B protein
The Bioccelerator is fast, dedicated hardware designed exclusively for dynamic programming (i.e. slow but sensitive) sequence comparison. It is built by the company Compugen. It can perform a number of search variants, including basic Smith-Waterman, profile searches and protein v. DNA frame-shifting comparisons. Today we will run the Smith-Waterman search, which finds the best matching segments between any two sequences, allowing gaps to be inserted at any position.
The search will take a couple of minutes (unless the Bic is busy). When it is finished you can look at the high-score list and alignments in the output and compare the results with BLAST2.
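The Smith-Waterman recurrence that the Bic runs in hardware can be sketched in a few lines of Python. Each cell holds the best score of any local alignment ending there, floored at zero so that an alignment can start anywhere; the best cell is then traced back until a zero is reached. This minimal version assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of a substitution matrix and affine gap penalties, so its scores are not comparable with Bic_SW output. The function name `smith_waterman` is hypothetical.

```python
# Toy scores (assumption): not the matrix or gap penalties used by Bic_SW.
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman(a: str, b: str) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            # Floor at 0: a local alignment may start at any cell.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal step
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + GAP:      # gap in b
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:                                   # gap in a
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

With these scores, `smith_waterman("AAAACCCC", "AAAAGCCCC")` returns `(14, "AAAA-CCCC", "AAAAGCCCC")`: opening one gap to recover the extra matches beats the best ungapped alignment of this pair, which scores 13.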
Step 3B Bic_SW search with the SM domain only
Now repeat the search but use the globular N-terminal domain only.
Take Home Lesson
Hopefully the exercises here have illustrated that the way a search is set up is very important. The searches here illustrate how the composition of the query (a full-length, multi-domain protein versus its globular domain alone, with or without low-complexity filtering) affects the results. Other parameters also often influence search sensitivity. For example, when the globular domain is long, the Gonnet PAM250 matrix would be expected to outperform the default BLOSUM62 in detecting divergent homologues, because it is less stringent and so gives longer optimally matching segments. (Over short matches it is noisier and may perform worse.) Gap penalties are also critical parameters in dynamic programming and should always be tested by trial and error. In other words, it pays to try several variations of a search rather than simply accepting the results of the first one.
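To see why gap penalties matter, the sketch below scores a given alignment with an affine gap penalty, in which opening a gap costs `gap_open` and each further gap position costs `gap_extend`. Affine penalties are a common choice in dynamic programming tools, but the scheme and all numbers here are assumptions for illustration (toy match/mismatch scores again), not the settings used by BLAST2 or the Bic. The function name `score_alignment` is hypothetical.

```python
def score_alignment(top: str, bottom: str, match: int = 2, mismatch: int = -1,
                    gap_open: int = -3, gap_extend: int = -1) -> int:
    """Score two equal-length aligned strings ('-' marks a gap).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All default values are illustrative assumptions.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    in_gap = None  # row holding the current gap: "top", "bottom" or None
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # Continuing a gap in the same row costs less than opening one.
            score += gap_extend if in_gap == row else gap_open
            in_gap = row
        else:
            score += match if x == y else mismatch
            in_gap = None
    return score
```

For two candidate alignments of AAAACCCC against AAAAGCCCC, the gapped `AAAA-CCCC`/`AAAAGCCCC` scores 15 with `gap_open=-1` and beats the ungapped `AAAACCCC`/`AAAAGCCC` (13), but with `gap_open=-4` it drops to 12 and loses. Which alignment a search reports can hinge on exactly this kind of parameter choice.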