

Exercise 1

Query STRING using YFHM_ECOLI - either enter this as an identifier or, if this causes problems, submit the sequence directly (leaving STRING to detect automatically which organism the sequence is from). Make sure you carry out the query in "COG" mode.

There are a large number of groups of proteins predicted to interact with proteins related to this protein. Note that the list of interactions is ordered by "Score", a measure of how confident STRING is that a group of proteins interacts with relatives of the query sequence.

We are of course interested in finding out what kind of proteins have been predicted to be interacting with YFHM_ECOLI (or relatives of this protein). Often the COGs predicted as interactors are annotated on the results page with comments concerning their possible function. However, in this case this does not give us much useful information.

To get more information, open in a new tab or window the link associated with several of the predicted interacting COGs.

This will give you a list of the proteins involved in each COG - by following the links in this page you can find out the kind of proteins involved in the COG.

  • Do the functions of proteins within the same COG tend to be similar?
  • Looking at the list of proteins in the COG, do they tend to be eukaryotic or prokaryotic?

Another way of finding out more about the predicted interactors is to examine the specific evidence used to predict the interaction, by clicking on the icons below the result table. Begin by examining the evidence from the textmining analysis. This provides you with a list of text documents (mostly PubMed abstracts) that mention members of two different COGs - such co-mentions are taken as evidence that proteins from the two COGs may interact. The textmining evidence therefore deals with single genes that are members of the COGs, and is a good way of getting a summary of which members of the COGs have had the most research carried out on them.

  • There are several members of the YFHM_ECOLI COG that dominate the proteins involved in this list - which are they?

Another source of data included in STRING comes from databases that contain manually-curated information about groups or pathways of proteins that are known to interact with each other e.g. KEGG, BIOCARTA etc.

Browse the set of proteins identified as belonging to the same pathways as members of the YFHM_ECOLI COG - do this by choosing the Databases icon and following links from there.

  • Is there some kind of trend in the biological function of these pathways? e.g. signalling, immune response, protein synthesis, cell adhesion...?

You may have noticed that the information provided from these sets of evidence was all associated with eukaryotic members of the YFHM_ECOLI COG - and as we began with a bacterial protein, we may well be more interested in identifying proteins in bacteria that may interact with YFHM_ECOLI or related proteins.

Three additional sets of data used to identify potential interactions, but which apply predominantly to prokaryotes, are the "Neighbourhood", "Fusion" and "Co-occurrence" sources of evidence. As the only experimental, database and text-mining evidence for interactions with the YFHM_ECOLI COG appears to apply to eukaryotic genes, we will turn off those data types and search using only these three evidence types, by altering the "Active Prediction Methods" at the bottom of the page, and then "Updating Parameters".
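To see why switching evidence types on or off changes the scores and the ranking, it can help to picture how per-channel scores might be merged into a single confidence value. The sketch below assumes a simplified combination, 1 minus the product of (1 - score) over the active channels, which treats the channels as independent. STRING's own scoring also corrects for a prior, so this is an illustration rather than its exact formula. The function name, channel names and scores are made-up illustrative values, not STRING output.

```python
from math import prod

def combined_score(channel_scores, active):
    """Combine per-channel scores (each between 0 and 1) for the active channels.

    Simplifying assumption: channels are independent, so the combined
    confidence is 1 - product(1 - s). No prior correction is applied.
    """
    return 1.0 - prod(
        1.0 - score
        for channel, score in channel_scores.items()
        if channel in active
    )

# Hypothetical scores for one predicted interaction (not real STRING data).
scores = {
    "neighbourhood": 0.50,
    "fusion": 0.00,
    "cooccurrence": 0.40,
    "experiments": 0.00,
    "databases": 0.80,
    "textmining": 0.50,
}

all_channels = set(scores)
prokaryote_channels = {"neighbourhood", "fusion", "cooccurrence"}

# The same interaction scored with all channels, then with three channels only.
print(round(combined_score(scores, all_channels), 3))
print(round(combined_score(scores, prokaryote_channels), 3))
```

With the other channels switched off, only the neighbourhood, fusion and co-occurrence scores contribute, so interactions supported mainly by database or textmining evidence drop down the list.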

By browsing the evidence from these three data types, you can see a set of proteins likely to interact with YFHM_ECOLI or bacterial relatives of this protein.

  • Look at the annotation for members of the likely-interacting COG - can you suggest a function for any of these likely interactors?

To contrast what can be learned about a protein from STRING with what can be learned from the simple flat-file database searching and sequence similarity searching covered in a previous exercise, try using SRS to investigate the function of YFHM_ECOLI (e.g. view the Swiss-Prot entry for the protein).

  • Do you find out more about the protein using STRING or using Swiss-Prot?

Exercise 2

STRING has two different modes of operation, "Protein" and "COG" - clicking on the "interactors wanted:" link on the STRING front page describes the difference between them. The two modes offer a choice between greater selectivity (in the Protein mode) and greater sensitivity (in the COG mode). The COG mode has the problem that in some cases the groups are very inclusive, e.g. most serine/threonine kinases are in the same group, and it is then impossible to tease apart interactions specific to a subset of the members of this group. In contrast, the Protein mode predicts interaction partners for one protein in a specific species, helping overcome problems of this kind.
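The trade-off between the two modes can be illustrated with a toy example. The sketch below is not how STRING is implemented; the protein names, group names and interactions are hypothetical placeholders chosen only to show how pooling proteins into one inclusive group hides which member an interaction belongs to.

```python
# Hypothetical protein -> orthologous group assignments (placeholder names).
group_of = {
    "kinaseA": "GROUP_KINASE",
    "kinaseB": "GROUP_KINASE",
    "substrateX": "GROUP_X",
    "substrateY": "GROUP_Y",
}

# Hypothetical protein-level interactions: each kinase has its own partner.
protein_edges = {("kinaseA", "substrateX"), ("kinaseB", "substrateY")}

def partners_protein_mode(protein, edges):
    """Return the partners of one specific protein."""
    return {b if a == protein else a for a, b in edges if protein in (a, b)}

def partners_group_mode(protein, edges, groups):
    """Return the partner groups of the whole group the protein belongs to."""
    own_group = groups[protein]
    partner_groups = set()
    for a, b in edges:
        group_a, group_b = groups[a], groups[b]
        if group_a == own_group and group_b != own_group:
            partner_groups.add(group_b)
        elif group_b == own_group and group_a != own_group:
            partner_groups.add(group_a)
    return partner_groups

# Protein-level view: only the partner of kinaseA itself.
print(partners_protein_mode("kinaseA", protein_edges))
# Group-level view: partners of every member of the kinase group, pooled.
print(sorted(partners_group_mode("kinaseA", protein_edges, group_of)))
```

In the group-level view the query picks up the partner of kinaseB as well, which is the gain in sensitivity and the loss in selectivity described above.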

We will now look at an example focusing on nucleotide metabolism in eukaryotes, which involves many famous targets for anti-neoplastic drugs. For the reasons outlined above, the mode of choice here is the Protein mode. If you are interested, you could try working through this exercise again, this time using the COG mode - this should help highlight some of the differences between the modes.

We begin by using STRING to gain a quick overview of some of the interactions associated with the intensively studied human enzyme thymidylate synthase (TYMS). Enter the gene symbol (TYMS) into STRING in Protein mode, or submit the protein sequence.

The species specificity of the Protein mode can be demonstrated by repeating this search, using the TYMS enzyme from a different animal, for example, Xenopus. Note also that within the vertebrates, most of the evidence used by STRING (experiments, pathway databases) is associated with mammals (and of course human in particular), and there is less information obtainable from STRING about the interaction network associated with TYMS in the other organisms.

When working with a protein as intensively studied as TYMS, the information provided by STRING can be somewhat overwhelming, making it difficult "to see the wood for the trees". One way of limiting the information presented to a more manageable amount is to restrict the evidence used to construct the interaction networks to the more reliable forms of evidence (probably the most appropriate is the manually curated information from pathway databases such as KEGG). This allows one to use STRING to ask the question "Which pathways is my protein of interest already well known to be involved in?"

Based on the results obtained using human TYMS, restrict the evidence to come only from "Databases".

  • Make a list of the processes the protein is involved in.

Switch on the "Experiments" and "Textmining" evidence.

  • Does the information you get from these two different types of data sources agree with that from the "Databases"?

Rather than just using STRING to browse and explore the interactions associated with a particular protein, it can also be used to ask the question "How are two or more genes related to each other in the protein interaction network?" For example, are they directly interacting with each other? If not, are they still close to one another in the network?
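The idea of two proteins being "close" in a network can be made concrete as the number of interaction steps on the shortest path between them. The sketch below is a plain breadth-first search over a small hypothetical network; the function name and the node names P1 to P5 are placeholders, not STRING results.

```python
from collections import deque

def network_distance(edges, start, goal):
    """Return the number of edges on the shortest path, or None if unconnected."""
    if start == goal:
        return 0
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search visits nodes in order of increasing distance.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None

# Hypothetical interaction network (placeholder protein names).
edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]

print(network_distance(edges, "P1", "P2"))  # direct interaction
print(network_distance(edges, "P1", "P3"))  # linked through P2
print(network_distance(edges, "P1", "P5"))  # no path in this network
```

A distance of 1 corresponds to a direct edge in the network view, while larger distances mean that the proteins are linked only through intermediates.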

To ask such a question, use the "multiple names" option on the STRING front page. We will use this to look at the relationship between ribonucleotide reductase (RRM1 and RRM2) and thymidylate synthase (TYMS).

Enter the gene symbols of these three proteins in the list, and examine the network view.
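The same multi-protein query can also be expressed as a URL rather than typed into the web form. The sketch below only builds the URL and does not contact the server. It assumes the STRING REST API layout (a "network" method with "identifiers" and "species" parameters, and identifiers separated by a carriage return); these details should be checked against the current STRING API documentation before use. The helper name network_url is hypothetical; 9606 is the NCBI taxonomy identifier for human.

```python
from urllib.parse import urlencode

# Assumed base address of the STRING REST API; verify before relying on it.
STRING_API = "https://string-db.org/api"

def network_url(gene_symbols, species=9606, output_format="tsv"):
    """Build (but do not send) a network query for several gene symbols."""
    params = {
        # Multiple identifiers are joined with a carriage return,
        # which urlencode writes as %0D.
        "identifiers": "\r".join(gene_symbols),
        "species": species,
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"

url = network_url(["RRM1", "RRM2", "TYMS"])
print(url)
```

Opening such a URL in a browser, or fetching it from a script, should return the interactions among the listed proteins in tabular form, assuming the API behaves as described above.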

  • Is there likely to be a direct physical interaction between RRM1 and TYMS? Between RRM1 and RRM2?

Try clicking "+" under the network to expand the network around these proteins.

  • Can you identify any proteins that interact with both of these two proteins?
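Finding proteins that interact with every one of a set of query proteins amounts to intersecting their neighbour sets. The sketch below shows this on a small hypothetical network; the function name and the protein names Q1, Q2, N1 and N2 are placeholders, not STRING results.

```python
def shared_partners(edges, proteins):
    """Return the proteins that interact with every protein in `proteins`."""
    if not proteins:
        return set()
    # Collect the neighbours of each query protein.
    neighbours = {protein: set() for protein in proteins}
    for a, b in edges:
        if a in neighbours:
            neighbours[a].add(b)
        if b in neighbours:
            neighbours[b].add(a)
    # Keep only neighbours common to all queries, excluding the queries themselves.
    common = set.intersection(*neighbours.values())
    return common - set(proteins)

# Hypothetical expanded network (placeholder protein names).
edges = [("Q1", "N1"), ("Q2", "N1"), ("Q1", "N2"), ("Q1", "Q2")]

print(shared_partners(edges, ["Q1", "Q2"]))
```

Here N1 is the only protein linked to both queries; N2 is linked to Q1 alone and is therefore left out.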

Exercise 3

STRING has a sister database, STITCH, that also contains relations involving small molecules. The user interface of STITCH is very similar to the "Protein" mode in STRING. STITCH has no equivalent of the "COG" mode, since distantly related orthologous proteins in many cases do not bind the same small molecules.

Try searching STITCH with the human thymidylate synthase (TYMS) protein as input. The resulting network includes several small molecules.

  • Can you identify the product of thymidylate synthase among them?
  • Is the substrate of thymidylate synthase also present in the network?

Thymidine is required for DNA replication and repair to take place, and inhibition of thymidylate synthase is thus harmful to proliferating cells. Indeed, most of the small molecules in the network are drugs used for chemotherapy.

  • Are these drugs structurally similar to each other?
  • Are they similar to the substrate of thymidylate synthase?
  • Can you suggest a mechanism of action?

Exercise 4

As you have seen, STRING provides an intricate and extensive resource for exploring information and predictions concerning the interactions of proteins.

STRING is updated regularly, but there is often a reasonable period of time between different releases - this is because a lot of effort is required to prepare a new release. Thus, it may be that an interaction you would be interested in has already been annotated in an interaction database, but that this information is not yet included in STRING. To access such information, you would go directly to the websites associated with the different interaction databases.

Thus, here we ask you to simply explore several such databases, either using the RAF1_HUMAN protein, or your own proteins of interest.

Explore interactions annotated in the following databases.

In some cases you can identify records associated with your protein of interest using keyword searches and/or sequence similarity (BLAST) searches.

  • Are there particular features about these different databases that make them more/less useful?

Learning Objectives

The aims of this session are:

  • Providing an overview of resources that are available to investigate experimental evidence for protein-protein and protein-small molecule interactions associated with your protein of interest
  • Providing hands-on experience of querying such resources
  • Raising awareness of situations in which it makes sense to use these resources (i.e. typical use cases)
    • Addressing the question "Which other proteins is my protein of interest known to interact with?"
    • Addressing the question "Which other proteins are relatives of my protein of interest known to interact with?"

-- Main.AidanBudd - 25 May 2007

Page last modified on November 26, 2008, at 12:12 PM CET