Query STRING using YFHM_ECOLI - either enter this as an identifier, or if this causes problems, submit the sequence directly (let STRING automatically detect the organism the sequence comes from). Make sure you carry out the query in "COG" mode.
There are a large number of groups of proteins predicted to interact with proteins related to this protein. Note that the list of interactions is ordered by "Score", a measure of how confident STRING is that a group of proteins interacts with relatives of the query sequence.
We are of course interested in finding out what kind of proteins have been predicted to be interacting with YFHM_ECOLI (or relatives of this protein). Often the COGs predicted as interactors are annotated on the results page with comments concerning their possible function. However, in this case this does not give us much useful information.
To get more information, open the links associated with several of the predicted interacting COGs in a new tab or window.
This will give you a list of the proteins involved in each COG - by following the links in this page you can find out the kind of proteins involved in the COG.
Another way of finding out more about the predicted interactors is to examine the specific evidence used to make the prediction of the interaction, by clicking on the icons below the result table. Begin by examining the evidence from the textmining analysis. This provides you with a list of text documents (mostly PubMed abstracts) that mention members of two different COGs - this provides evidence that the two proteins interact. It therefore deals with single genes that are members of the COGs, and is a good way of getting a summary of which members of the COGs have had the most research carried out on them.
Another source of data included in STRING comes from databases that contain manually-curated information about groups or pathways of proteins that are known to interact with each other e.g. KEGG, BIOCARTA etc.
Browse the set of proteins identified as belonging to the same pathways as members of the YFHM_ECOLI COG - do this by choosing the Databases icon and following links from there.
You may have noticed that the information provided from these sets of evidence was all associated with eukaryotic members of the YFHM_ECOLI COG - and as we began with a bacterial protein, we may well be more interested in identifying proteins in bacteria that may interact with YFHM_ECOLI or related proteins.
Three additional sets of data used to identify potential interactions, but which apply predominantly to prokaryotes, are the "Neighbourhood", "Fusion" and "Occurrence" sources of evidence. As the only experimental, database and text-mining evidence for interactions with the YFHM_ECOLI COG appears to apply to eukaryotic genes, we will turn off those three data types and search using only the Neighbourhood, Fusion and Occurrence evidence, by altering the "Active Prediction Methods" at the bottom of the page, and then "Updating Parameters".
By browsing the evidence from these three data types, you can see a set of proteins likely to interact with YFHM_ECOLI or bacterial relatives of this protein.
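To see why switching evidence types on and off changes the ranking of predicted interactors, it can help to recompute a combined score by hand. The sketch below is a simplified illustration, not STRING's actual implementation: it assumes the per-channel scores are independent probabilities and combines them as 1 - product(1 - score), and it ignores any further corrections STRING may apply. The channel scores are invented for illustration, and the function name combined_score is hypothetical.

```python
from math import prod

# Hypothetical per-channel confidence scores (0 to 1) for one predicted
# interaction. These numbers are invented for illustration only and are
# not taken from STRING.
channel_scores = {
    "neighbourhood": 0.60,
    "fusion": 0.00,
    "occurrence": 0.50,
    "experiments": 0.30,
    "databases": 0.00,
    "textmining": 0.40,
}


def combined_score(scores, active):
    """Combine the scores of the active channels as 1 - prod(1 - s).

    Assumption: channels are treated as independent pieces of evidence.
    """
    return 1.0 - prod(1.0 - scores[name] for name in active)


# Only the three evidence types that apply predominantly to prokaryotes.
prokaryote_channels = ["neighbourhood", "fusion", "occurrence"]
all_channels = list(channel_scores)

restricted = combined_score(channel_scores, prokaryote_channels)
full = combined_score(channel_scores, all_channels)

print(f"three channels only: {restricted:.3f}")
print(f"all channels:        {full:.3f}")
```

Under this assumption, turning a channel off can only lower or leave unchanged the combined score of an interaction, so interactions supported mainly by the disabled channels drop down the list.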
To contrast what can be learned about a protein from STRING with the simple flat-file database searching and sequence similarity searching that we covered in a previous exercise, try using SRS to investigate the function of YFHM_ECOLI (e.g. view the Swiss-Prot entry for the protein).
STRING has two different modes of operation, "Protein" and "COG" - clicking on the "interactors wanted:" link on the front STRING page describes the difference between them. The two modes offer a choice between greater selectivity (in the Protein mode) and greater sensitivity (in the COG mode). The COG mode has the problem that in some cases the groups are very inclusive, e.g. most serine/threonine kinases are in the same group, and it is then impossible to tease apart interactions specific to a subset of the members of this group. In contrast, the Protein mode predicts interaction partners for one protein in a specific species, helping overcome problems of this kind.
As we will now look at an example focusing on eukaryotic signalling pathways, which involve many different kinases, the mode of choice, for the reasons outlined above, is the Protein mode. If you are interested, you could try working through this exercise again, this time using the COG mode - this should help highlight some of the differences between the modes.
We begin by using STRING to gain a quick overview of some of the interactions associated with the intensively-studied proto-oncogene RAF, in humans.
The specificity of the Protein mode can be demonstrated by repeating this search, using a RAF1 sequence from a different animal - RAF1_XENLA.
Submit this sequence to STRING.
As this protein is not in the set of precomputed proteins used by STRING, the information is mapped onto the most similar sequences that STRING can find to your query sequence.
Note also that within the vertebrates, most of the evidence used by STRING (experiments, pathway databases) is associated with mammals (and of course human in particular), so only a minimal amount of information is obtainable from STRING about the interaction network associated with RAF1 in the other organism. The organism that is chosen might be different from that found when these exercises were set up - currently it seems to be Monodelphis domestica, a marsupial.
Repeat the search using the mouse gene RAF1_MOUSE.
This should give you an amount of information intermediate between that for the other organism looked at (perhaps Monodelphis domestica) and that for human.
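Repeating the same query for several organisms can also be prepared from a script. The sketch below only builds request URLs and does not contact any server. It assumes the REST interface described on the current STRING website (the base address, the "network" method, and the "identifiers", "species" and "required_score" parameters), none of which is part of this exercise. It also assumes that entry names such as RAF1_HUMAN are accepted as identifiers, which may not hold without first mapping them to STRING identifiers. The helper name network_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: base address of the STRING REST interface.
STRING_API = "https://string-db.org/api"


def network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build (but do not send) a request URL for a STRING network query.

    identifiers    -- list of protein names
    species        -- NCBI taxonomy identifier (9606 human, 10090 mouse)
    required_score -- minimum combined score on a 0-1000 scale
    """
    params = {
        # Multiple identifiers are separated by a carriage return,
        # which urlencode turns into %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


human_url = network_url(["RAF1_HUMAN"], species=9606)
mouse_url = network_url(["RAF1_MOUSE"], species=10090)

print(human_url)
print(mouse_url)
```

Keeping the species as an explicit argument mirrors the point made above: in the Protein mode, the answer depends on which organism the query protein belongs to.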
When working with a protein as intensively studied as RAF, the information provided by STRING can be somewhat overwhelming, making it difficult "to see the wood for the trees". One way of limiting the information presented to a more manageable amount is to restrict the evidence used to construct the interaction networks to the more reliable forms of evidence (probably the most appropriate is the manually curated information from pathway databases such as KEGG). This allows one to use STRING to ask the question "which pathways is my protein of interest already well known to be involved in?".
Based on the results obtained using human RAF, restrict the evidence to come only from "Databases".
Switch on the "Experiments" and "Textmining" evidence.
Rather than just using STRING to browse and explore the interactions associated with a particular protein, it can also be used to ask the question "How are two or more genes related to each other in the protein interaction network?" e.g. are they directly interacting with each other? If not, are they still close to one another in the network?
To ask such a question, use the "multiple names" option on the STRING front page. We will use this to look at the relationship between Interleukin-4 (IL4_HUMAN) and Raf (RAF1_HUMAN).
Enter the Swiss-Prot IDs of these two proteins in the list, and examine the network view.
Try clicking "+" under the network to expand the network around the two proteins.
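The question "are they directly interacting, and if not, how close are they in the network?" can be made concrete with a small breadth-first search. The sketch below is an illustration only: the edges and the intermediate node names (NODE_A, NODE_B, NODE_C) are hypothetical placeholders and do not represent interactions reported by STRING, and the function names are likewise invented for this example.

```python
from collections import deque

# Toy undirected network. The edges and the intermediate node names are
# hypothetical placeholders, not STRING output.
edges = [
    ("IL4_HUMAN", "NODE_A"),
    ("NODE_A", "NODE_B"),
    ("NODE_B", "RAF1_HUMAN"),
    ("NODE_A", "NODE_C"),
]


def build_adjacency(edge_list):
    """Turn a list of undirected edges into a node -> neighbours mapping."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(adjacency[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


network = build_adjacency(edges)
path = shortest_path(network, "IL4_HUMAN", "RAF1_HUMAN")

if path is None:
    print("no connection in this network")
elif len(path) == 2:
    print("direct interaction")
else:
    print("connected via", len(path) - 2, "intermediate node(s):", " - ".join(path))
```

Expanding the network with "+" corresponds to adding more edges to the list: a pair of proteins with no path in a small network may become connected once more neighbours are included.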
STRING has a sister database, STITCH, that also contains relations involving small molecules. The user interface of STITCH is very similar to the "Protein" mode in STRING. STITCH has no equivalent of the "COG" mode, since distantly related orthologous proteins in many cases do not bind the same small molecules.
Try searching STITCH with the human thymidylate synthase (TYMS) protein as input. The resulting network includes several small molecules.
Thymidine nucleotides are required for DNA replication and repair to take place, and inhibition of thymidylate synthase, which produces dTMP, is thus harmful to proliferating cells. Indeed, most of the small molecules in the network are drugs used for chemotherapy.
As you have seen, STRING provides an intricate and extensive resource for exploring information and predictions concerning the interactions of proteins.
STRING is updated regularly, but there is often a reasonable period of time between different releases - this is because a lot of effort is required to prepare a new release. Thus, it may be that an interaction you would be interested in has already been annotated in an interaction database, but that this information is not yet included in STRING. To access such information, you would go directly to the websites associated with the different interaction databases.
Here we therefore ask you simply to explore several such databases, either using the RAF1_HUMAN protein, or your own proteins of interest.
Explore interactions annotated in the following databases.
In some cases you can identify records associated with your protein of interest using keyword searches and/or sequence similarity (BLAST) searches.
The aim of this session is:
-- Main.AidanBudd - 25 May 2007