New tools for sequence/structure analysis
EMBL Biocomputing Bork group Gibson group New Tools

Latest developments in the servers from the Bork Group.
Day 1. Biocomputing January '03 course.
Ivica Letunic Christian von Mering Carolina Perez-Iratxeta Miguel A. Andrade

This lesson consists of four parts, each covering one server maintained by the Bork group. To learn about each tool, open the starting page of the server in one window (see below) and the suggested exercises in another window. Then follow the steps indicated in the exercise.

Part 1: SMART

SMART (Simple Modular Architecture Research Tool) is a web-based tool focusing on the annotation of protein domains and the exploration of domain architectures.

Using a simple web interface, you can quickly scan your protein sequence for the presence of any domain currently in the SMART database. Once SMART identifies the domains in your protein, you can get detailed annotation for each domain and explore other proteins with a similar domain organisation or composition.

Using SMART, you can also explore various domain architectures and query the underlying database, which stores precomputed results for all proteins in the SWISSPROT/SPTREMBL databases.
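To make the idea of querying by domain composition concrete, here is a minimal sketch in Python. It does not use SMART itself or reflect how its database is implemented. The protein names and the `proteins_with_domains` helper are hypothetical, and the toy dictionary merely stands in for the precomputed results.

```python
def proteins_with_domains(architectures, required):
    """Return the names of proteins that contain every domain in `required`.

    `architectures` maps a protein name to its ordered list of domains.
    This is an illustrative helper, not part of SMART.
    """
    required = set(required)
    return sorted(
        name
        for name, domains in architectures.items()
        if required <= set(domains)  # all required domains are present
    )


# Hypothetical toy data: each protein with its domains in sequence order.
architectures = {
    "protein_1": ["SH3", "SH2", "TyrKc"],
    "protein_2": ["SH2", "SH3"],
    "protein_3": ["TyrKc"],
}

hits = proteins_with_domains(architectures, ["SH2", "SH3"])
print(hits)
```

The query here ignores domain order and copy number. A query on the exact architecture would compare the ordered lists instead of sets.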

Exercises 1.1 / 1.2 / 1.3 / 1.4

Part 2: STRING

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a web-server that predicts functional associations between proteins - even for proteins that are not characterized experimentally or do not show homology to other, known proteins. The predicted associations can include direct physical interactions, but also more indirect connections such as regulatory interactions or the participation of the proteins in the same metabolic pathway.

The predictions in STRING are based on comparative genome analysis - STRING searches through complete genomes, and looks for cases where the genomic arrangement of genes suggests that they are under similar selective pressures. This can mean that they contribute to the same functional process. Such cases are detected as higher-than-expected frequencies of conserved gene neighborhood, gene fusions, and similar presence/absence patterns across genomes.
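As an illustration of the third signal, similarity of presence/absence patterns, the sketch below compares toy gene profiles with a Jaccard index. This is an assumption made for illustration only: the gene names and data are hypothetical, and the scoring that STRING actually uses is not described here.

```python
def profile_similarity(a, b):
    """Jaccard similarity of two presence/absence profiles (lists of 0/1).

    Each position corresponds to one genome: 1 = gene present, 0 = absent.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)     # present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # present in at least one
    return both / either if either else 0.0


# Hypothetical toy data: three genes scored across six genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

print(profile_similarity(profiles["geneA"], profiles["geneB"]))  # 0.75
print(profile_similarity(profiles["geneA"], profiles["geneC"]))  # 0.0
```

Genes that tend to occur together in the same genomes, like geneA and geneB here, score high. Such co-occurrence is the kind of pattern that can hint at a shared functional process.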

STRING can be queried with any protein of interest. The predictions are entirely precomputed and can be browsed graphically and quickly.

Exercise 2.1

Part 3: XplorMed

XplorMed is designed to help you with a bibliographic search in the MEDLINE database of scientific bibliography.

If you search MEDLINE with some keywords and get hundreds of abstracts as a result, you have to read the titles one by one to find those that interest you. This can take some time if you have, say, 300 abstracts to look at.

XplorMed takes these abstracts and analyses the words present in them so that you can quickly get a rough idea of what is in those 300 abstracts without having to read them all. You can also sort the abstracts by subject, select subsets, expand those with similar abstracts in the MEDLINE database, and much more. Let's see that with some examples. Later, try finding interesting papers in your own field.
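The general idea of summarising a set of abstracts by their words can be sketched as follows. This is not XplorMed's actual analysis. It is a minimal sketch that counts in how many abstracts each word occurs, using a small hypothetical stop-word list and made-up example abstracts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "of", "in", "a", "and", "is", "to", "for", "by", "with"}


def document_frequencies(abstracts):
    """Count, for each word, the number of abstracts in which it appears."""
    counts = Counter()
    for text in abstracts:
        # Use a set so a word is counted at most once per abstract.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts


def select_subset(abstracts, word):
    """Return the abstracts that mention `word` (case-insensitive)."""
    word = word.lower()
    return [t for t in abstracts if word in re.findall(r"[a-z0-9]+", t.lower())]


# Made-up abstracts standing in for a MEDLINE result set.
abstracts = [
    "Prion protein misfolding in neurons",
    "Misfolding of the prion protein is linked to disease",
    "Kinase signalling in neurons",
]

freq = document_frequencies(abstracts)
print(freq.most_common(5))
print(select_subset(abstracts, "prion"))
```

Words shared by many abstracts give a rough picture of the main subjects in the result set, and selecting on one of them narrows the set to a subset worth reading.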

Exercise 3.1

Part 4: Genes2Diseases

Genes2Diseases is a database of candidate genes for mapped inherited human diseases.

Normally, in the last stages of the analysis of an inherited disease, the disease has been mapped to a chromosomal region but the gene responsible for the disease is not yet known.

We have developed a method to select candidates for a given disease based on the possible relation between the phenotype of the disease and the function of the gene.

Exercise 4.1

