Biocomputing | Gibson Group | EMBL
Sequence Analysis Service, October 12th-15th, 1999
or
To search with only part of a keyword, use the wildcard '*', for example [swissprot-des:transmem*]
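As a rough illustration of what a trailing '*' does, the sketch below uses shell pattern matching. It assumes that, for this simple case, the SRS wildcard behaves like a shell pattern; the `matches` helper and the word list are made up for the example and are not part of SRS.

```shell
#!/bin/sh
# Hypothetical helper: succeed if word $2 matches wildcard pattern $1.
# Assumption: a trailing '*' in SRS means "any continuation", as in the shell.
matches() {
    case "$2" in
        $1) return 0 ;;   # unquoted on purpose so '*' acts as a wildcard
        *)  return 1 ;;
    esac
}

# Made-up keywords, tested against the pattern from the text.
for word in transmembrane transmembranous transport; do
    if matches 'transmem*' "$word"; then
        echo "$word: match"
    else
        echo "$word: no match"
    fi
done
```

Any keyword that starts with "transmem" is accepted, while "transport" is not.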
The WWW interface has many different pages, such as:
| Select Library | to select the databanks to search with |
| Query Form | to fill out keywords and to select the fields to be displayed, etc. |
| Query Result | to examine the result |
| Query Manager | to manipulate the queries |
| View Manager | to manipulate the way of displaying the result |
| LINK | to link to other databases |
| Download Query | to save the results to your disk |
| Databank | to list all available databanks under SRS |
and even a few more pages which will be explained during the practicals.
[Diagram: an example entry with arrows marking three of its fields: the Description field, the Feature Table field and the Sequence field. The sequence lines shown are:]
MSVCTLLISC AILAAPTLGS LQERRLYEDL MRNYNNLERP VANHSEPVTV
DVDEKNQVVY VNAWLDYTWN DYNLVWDKAE YGNITDVRFP AGKIWKPDVL
Some people are scared when they first see this page! How about you? This is the main database selection page. Throughout the SRS session you will see that the database names are highlighted as hyperlinks. Clicking on one takes you to another page with more information about the database, including the fields that this particular database has, the date it was indexed, and so on. If you are lucky you will also get a WWW link to the 'home' site of the database.
Exercise: Try to find out the other way of browsing indices by clicking on the database name.
Exercise: i) How many different types of molecules are present in the EMBL database?
ii) How many molecule types are unannotated?
Exercise: What are the different divisions in the EMBL database?
There are rather a lot of kinases with very diverse functions, but you can narrow down your query using these operators.
We will see how to save the query results to your disk. It is important to know how to do this, e.g. if you want to make a multiple alignment of the sequences.
Exercise: How do you save the whole list of your query results in a simple way?
[swissprot-id:acha_human] > prosite
means: link the swissprot acha_human entry to prosite. The result is the
prosite entry.
[swissprot-id:acha_human] < prosite
means the same link as above, but this time you get the swissprot entry. This is
a way to check that the entry has a cross-reference to prosite.
Note: The PROSITE database entries each cover
a whole protein family and all matching SWISS-PROT entries are cross-referenced.
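The difference between '>' and '<' can be mimicked with a toy cross-reference table. This is only a sketch of the idea, not how SRS works internally: the function names are invented, and of the table rows only the acha_human / NEUROTR_ION_CHANNEL pair is taken from this tutorial; the other rows are made up.

```shell
#!/bin/sh
# Toy cross-reference table: one "swissprot_id prosite_id" pair per line.
# Only the first pair is real; the rest are made-up illustration data.
xrefs='acha_human NEUROTR_ION_CHANNEL
toy1_human NEUROTR_ION_CHANNEL
toy2_human TOY_PATTERN'

# Like "A > prosite": print the prosite ids that the given swissprot ids link to.
link_to_prosite() {
    for id in "$@"; do
        printf '%s\n' "$xrefs" | awk -v id="$id" '$1 == id { print $2 }'
    done | sort -u
}

# Like "A < prosite": print those of the given swissprot ids that have a link.
linked_from_prosite() {
    for id in "$@"; do
        printf '%s\n' "$xrefs" | awk -v id="$id" '$1 == id { print $1 }'
    done | sort -u
}

# Like "A > prosite > swissprot": all swissprot ids sharing a prosite entry.
family_of() {
    for pid in $(link_to_prosite "$@"); do
        printf '%s\n' "$xrefs" | awk -v pid="$pid" '$2 == pid { print $1 }'
    done | sort -u
}

link_to_prosite acha_human                    # the prosite side of the link
linked_from_prosite acha_human nolink_human   # only ids that do have a link
family_of acha_human                          # every member of the same family
```

The same starting set gives entries from the target library with '>' and a filtered subset of the starting library with '<'.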
Check the Library Network to see how different libraries are linked, directly or indirectly! The numbers shown there are the number of steps needed to link them. You can click on the numbers to get a clear picture of how the intermediate libraries are used for indirect links between any two databases.
[swissprot-org:human] > [swissprot-FtKey:transmem]
[swissprot-org:human] < [swissprot-FtKey:transmem]
This retrieves all the SWISS-PROT entries for the protein family to which acha_human belongs.
This very simple query gives all the enzyme database entries for which the 3D
structure is known!
FT   ZN_FING     236    260       C4-TYPE.
ZN_FING means zinc finger. The corresponding subsequence (from 236 to 260) is
CRGSRNCPIDQHHRNQCQYCRLKKC
SRS treats this as a sub-entry. This is useful since you can
collect sub-entries and use them like any other database with other
sequence analysis packages. (We will make a database of
sub-entries with the command line interface getz.)
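To make the coordinates concrete, here is a minimal shell sketch of how a feature's start and end positions map onto a subsequence. The helper names `subseq` and `feature_length` are invented for this example; SRS does this extraction for you.

```shell
#!/bin/sh
# Hypothetical helper: print residues start..end (1-based, inclusive).
# $1 = sequence without spaces, $2 = start, $3 = end
subseq() {
    printf '%s\n' "$1" | cut -c"$2-$3"
}

# Hypothetical helper: number of residues covered by a feature.
# $1 = start, $2 = end
feature_length() {
    echo $(( $2 - $1 + 1 ))
}

# The zinc finger subsequence quoted above (positions 236 to 260).
zn_finger=CRGSRNCPIDQHHRNQCQYCRLKKC

echo "length from coordinates: $(feature_length 236 260)"
echo "length of subsequence:   ${#zn_finger}"

# With the full sequence in $seq you would call: subseq "$seq" 236 260
# Here we only slice the short fragment itself as a demonstration.
subseq "$zn_finger" 1 6
```

Both lengths agree, which is a quick sanity check that the feature line and the extracted subsequence belong together.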
We will try the OMIM database.
OMIM was originally a book (Mendelian Inheritance in Man) of short
review articles on genes and genetically based traits of medical
importance. Now it is on-line. Most of its information is text, so relevant articles can
be retrieved by searching with keywords.
Searching OMIM can be fun!
SRS can provide links to external web sites, such as Medline, for example when a database is not supported at EMBL. Thus Swiss-Prot entries for yeast proteins have links to SGD at Stanford. Sometimes you can learn a lot by using these external links. We'll follow up links about the human inherited disease phenylketonuria (PKU).
| getz -libs | Lists all the databanks available under SRS. |
| getz -info swissprot | Prints more information about the swissprot library: indexing data, data fields, etc. |
| getz '[swissprot-id:acha_human]' | Prints only the identification name. |
| getz '[swissprot-id:acha_human]' -e | Prints the entire entry. The default format is swissprot. |
| getz '[swissprot-id:acha_human]' -e -sf fasta | Prints the entire entry in fasta format. |
| getz '[swissprot-id:acha_human]' -e -sf gcg | Prints the entry in GCG format. |
| getz '[swissprot-id:acha_human]' -f 'id seq' -sf fasta | Prints only the id line and sequence in fasta format. |
| getz '[swissprot-id:acha_human]' -f 'id des seq' | Prints the identification name, description and sequence. |
| getz '[swissprot-id:acha_human] > prosite' -e | Prints the entire entry of prosite (NEUROTR_ION_CHANNEL). |
| getz '[swissprot-id:acha_human] > prosite > swissprot' | Prints the names of the swissprot entries linked to that prosite entry. |
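The commands above can be wrapped in a small script. The following is a sketch only: the helper names `srs_query` and `fetch_fasta` and the output file name are invented here, and the script uses just the getz options already shown (-e and -sf). It calls getz only if it is found on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: build a query string of the form [library-field:value].
srs_query() {
    printf '[%s-%s:%s]\n' "$1" "$2" "$3"
}

# Hypothetical helper: save one swissprot entry in fasta format.
# $1 = swissprot id, $2 = output file
fetch_fasta() {
    getz "$(srs_query swissprot id "$1")" -e -sf fasta > "$2"
}

query=$(srs_query swissprot id acha_human)
echo "query: $query"

# Only call getz where it is actually installed.
if command -v getz >/dev/null 2>&1; then
    fetch_fasta acha_human acha_human.fasta
else
    echo "getz not found; would run: getz '$query' -e -sf fasta"
fi
```

Building the query in one place keeps the brackets and quoting consistent when you fetch many entries in a loop.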
getz '[embl-div:vrl] > [embl-ftkey:intron]' | less
Can you count how many hits you got?
getz '[embl-div:vrl] > [embl-ftkey:intron]' -f 'id acc ftkey seq' | less
getz '[embl-div:vrl] > [embl-ftkey:intron]' -e > intron.dat
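Once the entries are saved to a file, you can check how many you got. A minimal sketch, relying on the fact that every entry in an EMBL-format flat file ends with a line containing only "//"; the `count_entries` helper and the two-entry toy file are made up so the script runs without getz.

```shell
#!/bin/sh
# Hypothetical helper: count entries in an EMBL-format flat file.
# Each entry is terminated by a line that contains only "//".
count_entries() {
    grep -c '^//$' "$1"
}

# Made-up stand-in for intron.dat with two heavily shortened entries.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ID   TOY1
SQ   Sequence 4 BP;
     acgt
//
ID   TOY2
SQ   Sequence 4 BP;
     ttga
//
EOF

count_entries "$tmp"   # two entries in the toy file
rm -f "$tmp"
```

On the real output you would run `count_entries intron.dat` instead.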
Note: There might still be some bugs in getz; please inform me if you find any.