SRS uses the flat file (text file) format for databases.
A database contains one or more entries. An entry contains several different
fields. For example, a SWISS-PROT entry looks like the following
(some fields are omitted in this example).
Fig.1 Sample SWISS-PROT entry
(In the online version, click on the highlighted id name to see the full entry.)

    ID   ACH1_CAEEL     STANDARD;      PRT;   498 AA.
    AC   P48180;
    DT   01-FEB-1996 (REL. 33, CREATED)
    DE   ACETYLCHOLINE RECEPTOR LIKE PROTEIN, ALPHA-TYPE PRECURSOR.
    OS   CAENORHABDITIS ELEGANS.
    FT   TRANSMEM    231    252       POTENTIAL.
    SQ   SEQUENCE   498 AA;  57169 MW;  570ECAC1 CRC32;
         MSVCTLLISC AILAAPTLGS LQERRLYEDL MRNYNNLERP VANHSEPVTV
         DVDEKNQVVY VNAWLDYTWN DYNLVWDKAE YGNITDVRFP AGKIWKPDVL

In this entry:
- ID is the name of the field (in this case identification), and
ACH1_CAEEL is the id name.
- AC is the Accession number.
- DE is the Description field.
- FT is the Feature Table field.
- SQ, together with the lines that follow it, is the Sequence field.
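To make the idea of "an entry made of fields" concrete, here is a minimal Python sketch that splits the sample entry of Fig.1 into its fields. It is not how SRS itself indexes a databank. It only assumes that every line starts with a two-letter line code and that the indented lines after SQ hold the residues. The names parse_entry and SAMPLE_ENTRY are made up for this illustration.

```python
from collections import defaultdict

# The (shortened) entry from Fig.1; real entries have more lines.
SAMPLE_ENTRY = """\
ID   ACH1_CAEEL     STANDARD;      PRT;   498 AA.
AC   P48180;
DT   01-FEB-1996 (REL. 33, CREATED)
DE   ACETYLCHOLINE RECEPTOR LIKE PROTEIN, ALPHA-TYPE PRECURSOR.
OS   CAENORHABDITIS ELEGANS.
FT   TRANSMEM    231    252       POTENTIAL.
SQ   SEQUENCE   498 AA;  57169 MW;  570ECAC1 CRC32;
     MSVCTLLISC AILAAPTLGS LQERRLYEDL MRNYNNLERP VANHSEPVTV
     DVDEKNQVVY VNAWLDYTWN DYNLVWDKAE YGNITDVRFP AGKIWKPDVL
"""


def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    sequence_parts = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        code, rest = line[:2], line[2:].strip()
        if code.strip():
            # A line code such as ID, AC, DE, FT or SQ.
            fields[code].append(rest)
        else:
            # Indented lines (after SQ) carry the residues in blocks of ten.
            sequence_parts.append(rest.replace(" ", ""))
    result = dict(fields)
    result["sequence"] = "".join(sequence_parts)
    return result


entry = parse_entry(SAMPLE_ENTRY)
id_name = entry["ID"][0].split()[0]   # first word of the ID line
print("id name:", id_name)
print("accession:", entry["AC"][0])
print("description:", entry["DE"][0])
print("residues shown:", len(entry["sequence"]))
```

Only the 100 residues shown in Fig.1 are collected here, not the full 498 AA sequence, because the figure omits the rest.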
Getting Started
- Log in to any of your favorite machines.
- Type netscape4. This brings up the Netscape browser.
- Open the following URL to get this documentation online:
http://www.embl-heidelberg.de/~seqanal/courses/srscourse/srstut.html
- Click here (click with the middle mouse button!)
to start SRS in another window.
- You get the SRS home page. Now click on the Start button.
You are now in the Top Page, where you can select the databases
of interest and continue.
Finding more about databases
Some people are scared when they first see this page! How about you?
This is the main database selection page. Throughout the SRS session you
will see that the database names are hyperlinked (highlighted). Clicking
on one takes you to another page where you will see more information
about the database. It also shows you the different fields that this
particular database has, the indexing date of the database, etc. If you
are lucky you will also get a WWW link to the 'home' site of the database.
Browsing the indices
- Select SWISSPROT by clicking on the check box. Press the
Continue button.
- Now you see the Query Form page. All the available data fields are
highlighted and listed, each with an input field to be filled.
Click on the ID data field. You get the Data-field information page.
It gives you information about the indexing date, status,
number of ids, etc. You can browse through this.
- Type some id names (if you know any) or just click on the List Values
button. Check what you get.
- Similarly, try out other fields.
Exercise: Try to find the other way of browsing indices
by clicking on the database name.
Exercise: i) How many different types of molecules are present
in the EMBL database?
ii) How many molecule types are unannotated?
Exercise: What are the different divisions in the EMBL database?
Querying the database
- Go to Top Page. (Click on the TOP PAGE button.)
- Select the SWISSPROT database by clicking the check box. Then press the Continue button.
- Type kinase in the All Text field and press the Do Query button.
- Check how many entries you get.
- Question: Is this going to be useful? If so, when?
- Click on an entry name and check whether you get the full entry! Browse through this
entry and see the cross-links to other databanks.
- Go back to the Query Form page using Netscape's Back button. Type kinase in the
Description field. Check the result. You can narrow down your search by querying the
appropriate field. With the All Text field you get a lot of entries, and
with the Description field you get far fewer entries.
- Go back to the Query Form page. Click the Description field check box beneath
Include in List. Press the Do Query button.
- You see that the Description field now appears in the query result! You can check
the other fields like this. Try the Keywords field.
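Why does All Text return more entries than Description? A rough way to picture it is the toy search below. This is only a sketch: the three entries, their id names and the query function are invented for illustration, and plain substring matching stands in for SRS's own indices.

```python
# Hypothetical mini-database: id name -> {field name: field text}.
ENTRIES = {
    "KIN1_EXAMPLE": {
        "description": "TYROSINE-PROTEIN KINASE.",
        "keywords": "TRANSFERASE; ATP-BINDING.",
    },
    "SUB1_EXAMPLE": {
        "description": "RIBOSOMAL PROTEIN.",
        "keywords": "PHOSPHORYLATION.",
        "comment": "PHOSPHORYLATED BY A KINASE.",
    },
    "TRN1_EXAMPLE": {
        "description": "SUGAR TRANSPORTER.",
        "keywords": "TRANSPORT.",
    },
}


def query(entries, word, field=None):
    """Return the id names whose text contains word.

    field=None mimics an All Text search (every field is searched);
    otherwise only the named field is searched.
    """
    word = word.lower()
    hits = []
    for id_name, fields in entries.items():
        if field is None:
            texts = list(fields.values())
        else:
            texts = [fields.get(field, "")]
        if any(word in text.lower() for text in texts):
            hits.append(id_name)
    return sorted(hits)


all_text_hits = query(ENTRIES, "kinase")
description_hits = query(ENTRIES, "kinase", field="description")
print(len(all_text_hits), "entries via All Text:", all_text_hits)
print(len(description_hits), "entries via Description:", description_hits)
```

The second entry only mentions a kinase in a comment, so it turns up in the All Text search but not in the Description search. That is the kind of hit you lose (usually on purpose) when you query the appropriate field.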
Using the boolean operators (BUTNOT, AND, OR):
There are rather a lot of kinases with very diverse functions,
but you can narrow down your query using
these operators. In a query, AND is written &, OR is written | and BUTNOT is written !.
- Go back to the Query Form page. Type tyrosine & kinase in the Description field. Check the total number of entries you get.
- Go back to the Query Form page. Type tyrosine & kinase ! receptor and check the result!
- Question: Did you manage to exclude all transmembrane receptor kinases?
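The three operators behave like set operations on the lists of matching entries. The sketch below uses Python sets with made-up entry ids (P1 to P5) and reads the expression from left to right. How SRS itself groups a longer expression is not covered here, so treat the grouping as an assumption.

```python
# Hypothetical result sets: ids of entries whose Description
# contains the given word.
tyrosine = {"P1", "P2", "P3"}
kinase = {"P2", "P3", "P4", "P5"}
receptor = {"P3", "P5"}

# tyrosine & kinase : entries containing both words (AND)
both = tyrosine & kinase

# tyrosine | kinase : entries containing at least one word (OR)
either = tyrosine | kinase

# tyrosine & kinase ! receptor : both words, BUTNOT "receptor"
not_receptor = (tyrosine & kinase) - receptor

print("AND    :", sorted(both))
print("OR     :", sorted(either))
print("BUTNOT :", sorted(not_receptor))
```

Note that BUTNOT only removes entries whose Description actually contains the word receptor. A receptor kinase described in other words would stay in the result, which is worth keeping in mind for the question above.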
Using the view
- Go to the Query Manager page. You see that all your queries are registered here. You can
reinspect the queries.
- Select the last query. Select the Sequence Simple view under the "using the view" option.
Click on the VIEW button. Now you get the result in a table. You can add and remove the
fields you want through the View Manager page.
Saving the query result
We will see how to save the query results to your disk.
It is important to know how to do this, for example if you want to
make a multiple alignment of the sequences.
Linking the queries with other databases
Note: The examples given here can be typed as they are into the
expression input field of the Query Manager page.
SRS has two link operators, namely '<' (left link) and '>' (right link).
The query expression
[swissprot-id:acha_human] > prosite
means: link the swissprot acha_human entry to prosite! You get the
prosite entry.
[swissprot-id:acha_human] < prosite
means the same as above, but this time you get the swissprot entry. This makes
sure that a cross-reference to prosite is available for the entry.
Note: Each PROSITE database entry covers
a whole protein family, and all matching SWISS-PROT entries are cross-referenced.
Check the
Library Network to see how different libraries are linked directly
or indirectly! The numbers shown there are the numbers of steps needed to link them.
You can click on the numbers to get a clear picture of how the
intermediate libraries are used for indirect links between any two databases.
- Go to the Query Manager page. Select the last query.
- Click on the Link button.
- Now you are in the Database Link page. Select the EMBL database and
press the Continue button.
- You get all the EMBL entries which are cross-linked from SWISSPROT.
Another simple way to link queries: In the Query Manager page,
type q1 > embl and press the Expression button. You can
type any valid SRS query syntax here!
More Examples:
[swissprot-id:acha_human] > prosite > swissprot
This retrieves all the SWISS-PROT entries for the protein family to which acha_human belongs!
enzyme < pdb
This very simple query gives all the enzyme database entries for which the 3D
structure is known!
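One way to picture the two link operators is as lookups in a table of cross-references. The Python sketch below is only a model of the behaviour described above, not SRS's implementation. Apart from ACHA_HUMAN, all ids are invented, and link_right, link_left and reverse are hypothetical helper names.

```python
# Hypothetical cross-reference table: (swissprot id, prosite id) pairs.
XREFS = {
    ("ACHA_HUMAN", "PS_EXAMPLE_1"),
    ("ACHB_EXAMPLE", "PS_EXAMPLE_1"),
    ("OTHER_EXAMPLE", "PS_EXAMPLE_2"),
}


def link_right(ids, xrefs):
    """ids > target: return the target-side entries linked from ids."""
    return {target for source, target in xrefs if source in ids}


def link_left(ids, xrefs):
    """ids < target: return those of ids that have a link to the target."""
    return {source for source, target in xrefs if source in ids}


def reverse(xrefs):
    """Read the same cross-references in the other direction."""
    return {(target, source) for source, target in xrefs}


query = {"ACHA_HUMAN"}

# [swissprot-id:acha_human] > prosite   -> the prosite entry
prosite_hits = link_right(query, XREFS)

# [swissprot-id:acha_human] < prosite   -> the swissprot entry itself
swissprot_hits = link_left(query, XREFS)

# [swissprot-id:acha_human] > prosite > swissprot   -> the whole family
family = link_right(prosite_hits, reverse(XREFS))

# An entry without any cross-reference drops out of a left link.
no_xref = link_left({"NOLINK_EXAMPLE"}, XREFS)

print(sorted(prosite_hits))
print(sorted(swissprot_hits))
print(sorted(family))
print(sorted(no_xref))
```

The last case shows why the left link is handy as a filter: an expression like enzyme < pdb keeps only the entries on the left that really have a partner on the right.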
The concept of Sub-Entries and their usage
We now know what a SWISS-PROT entry looks like.
Sub-entries are made from annotated subsequences!
For example, one of the Feature Table fields of the SWISS-PROT entry
7UP1_DROME contains
FT ZN_FING 236 260 C4-TYPE.
ZN_FING means zinc finger. The corresponding subsequence (from 236 to 260) is
CRGSRNCPIDQHHRNQCQYCRLKKC
SRS treats this as a sub-entry. This is useful since you can
collect sub-entries and use them like any other database with other sequence
analysis packages. (We will make a database of
sub-entries with the command line interface getz.)
- Exercise: Go to the Query Form page. Get all the sub-entries for
zinc finger. Remember the FtKey field. Then click on one of
the sub-entries to see how it looks. Can you get the original (parent) entry?
- Exercise: Retrieve all the intron subsequences from
the EMBL database. How many entries are there?
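The step from a Feature Table line to a subsequence is simple arithmetic: positions are counted from 1 and both ends are included, so 236 to 260 gives 25 residues. The sketch below shows this in Python. The parent sequence is a placeholder (runs of X around the zinc finger) because the full 7UP1_DROME sequence is not reproduced in this tutorial, and the function names are made up.

```python
ZINC_FINGER = "CRGSRNCPIDQHHRNQCQYCRLKKC"

# Placeholder parent sequence: 235 unknown residues, then the zinc
# finger at positions 236..260, then 20 more unknown residues.
# This is NOT the real 7UP1_DROME sequence.
parent_sequence = "X" * 235 + ZINC_FINGER + "X" * 20


def parse_feature(ft_line):
    """Split an FT line into (key, start, end, description)."""
    parts = ft_line.split()
    # parts looks like ['FT', 'ZN_FING', '236', '260', 'C4-TYPE.']
    return parts[1], int(parts[2]), int(parts[3]), " ".join(parts[4:])


def subsequence(sequence, start, end):
    """Cut out residues start..end (1-based, both ends included)."""
    return sequence[start - 1:end]


key, start, end, description = parse_feature("FT ZN_FING 236 260 C4-TYPE.")
sub_entry = subsequence(parent_sequence, start, end)
print(key, start, end, description)
print(sub_entry, len(sub_entry))
```

Keeping the key and the coordinates together with the subsequence is what lets you find your way back to the parent entry.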
Searching other databases
We will try the OMIM database.
OMIM was originally a book (Mendelian Inheritance in Man) of short
review articles on genes and genetically based traits of medical
importance. Now it is on-line. Most of its information is text, so relevant articles can
be retrieved by searching with keywords.
Searching OMIM can be fun!
- Go to Top Page
- Select the OMIM databank and press Continue.
- To learn more about the disease DMD (Duchenne Muscular Dystrophy),
type dmd in the Keyword field.
- Have you ever wondered why you sneeze? Well, this time
search for sneeze in OMIM and find out.
(You can even type 'ACHOO'.)
Getting information from secondary sources
SRS can provide links to external web sites such as Medline, for
example when a database is not supported at EMBL. Thus Swiss-Prot
entries for yeast proteins have links to SGD at Stanford. Sometimes
you can learn a lot by using these external links. We'll follow up
links about the human inherited disease phenylketonuria (PKU).
- Go to Top Page
- Select the Swiss-Prot database and press Continue.
Type in phenylketonuria.
- Which of the listed entries has classical phenylketonuria?
- Use the OMIM and PAHdb links to find out about mutations causing PKU.
- Questions
- Are PKU mutations dominant or recessive?
- How many different classes of mutations (splice,
missense, etc.) can cause PKU?
- Is it true that missense mutations are the rarest class?
- Are the mutations evenly distributed through the sequence?
- Do SWISSPROT Features list all classes of mutation?
- For an even more striking example of human genetic disease,
go to the Marfan Syndrome entry and repeat the exercise.
- Documentation for the SRS query syntax is available!
Creating databases
In this exercise let us try to create an intron database for viral
sequences in the EMBL database.
The main reason why we use the command line interface for SRS here
is that the network traffic on the WWW needed to make the new databank might be very large!
Online version of this material is available at:
http://www.embl-heidelberg.de/~seqanal/courses/Hamburg5.99/srstut.html
chenna@embl-heidelberg.de