A platform for integrating threading results with other information

Florencio Pazos1, Burkhard Rost2 and Alfonso Valencia1

1 Protein Design Group. CNB-CSIC. Cantoblanco, Madrid 28049 Spain.

2 CUBIC Columbia University; Dep. Biochemistry & Molecular Biophysics; 630 West 168th Street; New York, N.Y. 10032; U.S.A.; rost@columbia.edu.


Running title: Threadalise: a threading interface.

Keywords: threading, TOPITS, graphical interface, protein structure prediction, correlated mutations.




Abstract



Automatic applications of threading (or fold recognition) methods result in fairly low levels of accuracy. However, at the most recent meeting for the critical assessment of structure prediction (CASP3), threading was shown to yield sustained levels of success (http://predictioncenter.llnl.gov/casp3/Casp3.html). The gap between automatic threading and the results actually presented at CASP3 arose because experts combined automatic threading results with the output of other sequence-analysis methods. This success in combining outputs from different sources illustrates the need for interactive platforms that simplify such a task (e.g. Leplae et al., 1998). Here, we present our approach towards a general workbench for evaluating threading results by assessing structurally and biologically relevant information.

Threadalise uses input from two conceptually different threading programs, TOPITS (Rost, 1995; Rost et al., 1997) and Threader (Jones et al., 1992), and provides a general mechanism for integrating results from other programs. The program can simultaneously map the following features onto the threading output: predicted secondary structure (Rost et al., 1994), predicted solvent accessibility (Rost et al., 1994), inter-residue contacts (in particular correlated mutations; Olmea & Valencia, 1997), physico-chemical residue scales (e.g. hydrophobicity, polarity, charge) and any other property supplied by the user in a given format. Furthermore, the implicit threading model and related properties can be inspected visually through an interface to Rasmol (v. 2.6; Sayle & Milner-White, 1995; Figure 1).
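Mapping query features onto the threading output comes down to walking along the sequence-structure alignment and pairing each query residue with the template residue it is aligned to. The Python sketch below illustrates this under stated assumptions: the alignment is given as two equal-length gapped strings using "-" for gaps, residues are indexed from 0, and the function names are hypothetical rather than taken from the Threadalise source.

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def alignment_index_map(query_aln: str, template_aln: str, gap: str = "-") -> Dict[int, int]:
    """Map 0-based query residue indices to 0-based template residue indices.

    Only columns where both sequences carry a residue are included;
    columns with a gap in either sequence are skipped.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    qi = ti = 0  # running residue counters; gaps do not advance them
    for q, t in zip(query_aln, template_aln):
        if q != gap and t != gap:
            mapping[qi] = ti
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return mapping


def transfer_to_template(query_aln: str, template_aln: str,
                         query_values: Sequence[T], gap: str = "-") -> Dict[int, T]:
    """Project per-residue query values (e.g. predicted secondary structure)
    onto the template residues they are aligned to (hypothetical helper)."""
    n_query = sum(1 for c in query_aln if c != gap)
    if len(query_values) != n_query:
        raise ValueError("need exactly one value per query residue")
    mapping = alignment_index_map(query_aln, template_aln, gap)
    return {t: query_values[q] for q, t in mapping.items()}
```

For example, transfer_to_template("MK-LV", "M-ALV", "HHEE") projects a per-residue secondary structure string onto template positions 0, 2 and 3; the query K, which is aligned to a gap, has no template counterpart and is dropped.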

The Rasmol view of the protein model and the representation of the sequence-structure alignment are interconnected. This enables operations such as selecting putative binding-site residues in the sequence and visualising them in the corresponding protein model. By default, the regions covered by the threading alignment are highlighted in the model and in the sequence; this default view represents the initial protein model suggested by threading. In the alignment window (lower panel of Figure 1), the observed secondary structure (for the protein of known structure, i.e. the template) is compared with the predicted secondary structure (for the query protein). Optionally, the reliability of the secondary structure prediction and other residue-property scales can be displayed, together with regions matching Prosite patterns (Bairoch, 1992), putative Prodom domains (Sonnhammer & Kahn, 1994) and coiled-coil regions (Lupas et al., 1991). Additional structural information can be included by directly accessing the SCOP (Murzin et al., 1995) or FSSP (Holm & Sander, 1994) databases of protein structure similarities. The analysis of this structural information is facilitated by an additional graphical representation (not shown).
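Linking a selection in the alignment to the Rasmol view amounts to translating query positions into template residue numbers and issuing Rasmol commands. The sketch below assumes a single-chain template whose residues are numbered consecutively from a hypothetical first_resnum, with no insertion codes or negative numbers. It emits only the standard Rasmol commands select, color and spacefill, writing residue ranges as "12-15". The helper names are illustrative and are not part of Threadalise.

```python
from typing import Iterable, List


def query_to_template_numbers(query_aln: str, template_aln: str,
                              selected: Iterable[int], first_resnum: int = 1,
                              gap: str = "-") -> List[int]:
    """Template residue numbers aligned to the selected 0-based query positions.

    Assumes one chain numbered consecutively from first_resnum (hypothetical).
    Selected query residues that are aligned to a gap are skipped.
    """
    wanted = set(selected)
    numbers: List[int] = []
    qi = ti = 0  # residue counters for query and template
    for q, t in zip(query_aln, template_aln):
        if q != gap and t != gap and qi in wanted:
            numbers.append(first_resnum + ti)
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return numbers


def compress_ranges(numbers: Iterable[int]) -> str:
    """Turn [12, 13, 14, 15, 20] into the Rasmol-style list "12-15,20"."""
    nums = sorted(set(numbers))
    if not nums:
        raise ValueError("empty selection")
    parts: List[str] = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:  # still inside a contiguous run
            prev = n
            continue
        parts.append(f"{start}-{prev}" if prev > start else str(start))
        start = prev = n
    parts.append(f"{start}-{prev}" if prev > start else str(start))
    return ",".join(parts)


def rasmol_script(numbers: Iterable[int], colour: str = "red") -> str:
    """Rasmol commands that show the given residues as coloured spheres."""
    return "\n".join([f"select {compress_ranges(numbers)}",
                      f"color {colour}",
                      "spacefill"])
```

For example, rasmol_script(query_to_template_numbers("MK-LV", "M-ALV", {0, 2, 3}, first_resnum=10)) yields "select 10,12-13", "color red" and "spacefill" on separate lines, a script that could be typed at the Rasmol command line.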

The program uses the PDB database of protein structures (Bernstein et al., 1977), the HSSP database of protein families (Sander & Schneider, 1993) and the FSSP database of protein structure alignments (Holm & Sander, 1994), retrieved either from local copies of these databases (in plain or compressed format) or, more slowly, directly via the Internet. Structural information about residues, pairs of residues or sequence regions can be imported from external files or generated inside the package. Two examples of such additional information are correlated mutations calculated with PLOTCORR (Pazos et al., 1997) and tree-determinant residues determined with SequenceSpace (Casari et al., 1995). The results of Threader are obtained from a local installation of the package (http://globin.bio.warwick.ac.uk/~jones/threader.html). The results of TOPITS are obtained either from a local installation of the program or from the public Internet server PredictProtein (http://dodo.cpmc.columbia.edu/predictprotein). An interactive version that accesses TOPITS results through WWW forms is in preparation.
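The text says only that residue, pair and region data are read from external files "in a given format"; it does not specify that format. As an illustration, the sketch below assumes a hypothetical whitespace-separated layout. Residue files carry one residue number and value per line, pair files carry two residue numbers and a value, and region files carry a start, an end and a free-text label, with "#" starting a comment. Both the layout and the names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ResidueValue:  # e.g. a sequence-conservation value
    resnum: int
    value: float


@dataclass
class PairValue:  # e.g. a predicted contact or a disulphide bridge
    res1: int
    res2: int
    value: float


@dataclass
class Region:  # e.g. a pattern match or a domain boundary
    start: int
    end: int
    label: str


Record = Union[ResidueValue, PairValue, Region]
MIN_FIELDS = {"residues": 2, "pairs": 3, "regions": 3}


def parse_track(text: str, kind: str) -> List[Record]:
    """Parse the hypothetical whitespace-separated layout described above."""
    if kind not in MIN_FIELDS:
        raise ValueError(f"unknown kind: {kind!r}")
    records: List[Record] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # skip blank and comment-only lines
        f = line.split()
        # residues/pairs need an exact field count; region labels may contain spaces
        if len(f) < MIN_FIELDS[kind] or (kind != "regions" and len(f) != MIN_FIELDS[kind]):
            raise ValueError(f"line {lineno}: wrong number of fields for {kind}")
        if kind == "residues":
            records.append(ResidueValue(int(f[0]), float(f[1])))
        elif kind == "pairs":
            records.append(PairValue(int(f[0]), int(f[1]), float(f[2])))
        else:
            start, end = int(f[0]), int(f[1])
            if end < start:
                raise ValueError(f"line {lineno}: region end precedes start")
            records.append(Region(start, end, " ".join(f[2:])))
    return records
```

Keeping the three record types separate mirrors the distinction between residue, pair and region data, so each can be drawn differently (for instance, pairs as links between two positions).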

Acknowledgements

We gratefully acknowledge Roger Sayle for his help with Rasmol. We also thank the members of the "Protein Design Group" and the participants in the "Frontiers on Protein Structure Prediction" course for their suggestions. We used the program extensively during our participation in CASP (http://gredos.cnb.uam.es/pazos/working.html).

References



Figure 1: Threadalise interface



The interface allows direct inspection of the list of threading models and of the basic information provided in the PDB, SCOP or FSSP databases (not shown in the figure). From this sorted list, users can analyse particular cases by opening the windows shown in the figure with the "Interactive Alignment" button. The first window displays the sequence-to-structure alignment; the second displays the corresponding Rasmol view of the three-dimensional model implied by that alignment. The "AddData" button enables the display of other data, such as the reliability of the secondary structure prediction, various residue chemical properties or the predicted solvent accessibility.

The Rasmol command line is accessible in two ways: (1) by typing commands directly, and (2) through the "Rasmol comm." button, which allows various settings to be configured. The "Residues", "Pairs" and "Regions" buttons import data about individual residue properties (e.g. sequence-conservation values), residue-residue properties (e.g. disulphide bridges or predicted contacts) or particular protein regions (e.g. Prosite patterns or Prodom domains). The "Save/Export" button saves and prints the display in various formats.
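Residue property scales of the kind offered through "AddData" can be shown as an extra row beneath the alignment, with one entry per alignment column. The sketch below uses the Kyte-Doolittle hydropathy scale purely as an example, since the scales actually bundled with Threadalise are not listed here, and leaves gap columns empty. The function names are hypothetical.

```python
from typing import Dict, List, Optional

# Kyte & Doolittle hydropathy values, used here only as an example scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_track(aligned_seq: str, scale: Dict[str, float] = KYTE_DOOLITTLE,
                 gap: str = "-") -> List[Optional[float]]:
    """One value per alignment column: None for gaps and for letters absent from the scale."""
    return [None if c == gap else scale.get(c.upper()) for c in aligned_seq]


def track_symbols(values: List[Optional[float]]) -> str:
    """Text row for display under the alignment:
    '+' if value > 0, '-' if value <= 0, ' ' if there is no value."""
    return "".join(" " if v is None else ("+" if v > 0 else "-") for v in values)
```

Printing "MK-LV" above track_symbols(column_track("MK-LV")) gives the row "+- ++", so each symbol sits directly under its residue and the gap column stays blank. Passing a different dictionary as scale would display polarity or charge in the same way.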