A candidate KEN box in the important cell cycle kinase Hipk2. The sequence segment is predicted to be natively disordered and contains many conserved phosphorylation motifs as well as the KEN motif (Michael et al., 2008).
It is now clear that the regulatory decisions during eukaryotic cell signalling are made within large dynamic protein complexes (see Gibson, 2009). Cell regulation is networked, redundant and – above all – cooperative. The deeply misleading ‘kinase cascade’ metaphor needs to be retired and the sooner, the better. The proteins that take part in regulatory systems make remarkable numbers of interactions, with the corollary that they also have highly modular architectures. Therefore our main recent focus has been to develop and deploy computational tools for protein architecture analysis, working mostly at the sequence level.
We therefore coordinate development of ELM, the Eukaryotic Linear Motif resource for functional sites in modular protein sequences. Linear motifs (LMs) are short functional sites used for the dynamic assembly and regulation of large cellular protein complexes, and their characterisation is essential if we are to understand cell signalling. So-called ‘hub’ proteins, which make many contacts in interaction networks, are being found to have abundant LMs in large segments of intrinsically unstructured protein (IUP). The freely available ELM resource data are now used by many bioinformatics groups to improve prediction of LM interactions, examples being the NetworKIN kinase-substrate predictor and the DILIMOT and SLIM Finder novel motif predictors.
Recent ELM developments include the addition of structure and conservation filtering. We are now actively hunting for new LM candidates; we recently proposed new candidate instances of the KEN box, a sequence motif that targets cell cycle proteins for destruction in anaphase (see figure), as well as KEPE, a motif of unknown function that is superposed on many sumoylation sites. We have worked closely with groups undertaking validation experiments: Michael Sattler (Munich) in the characterisation of ULM-UHM interactions used in alternative splicing, and Annalisa Pastore (Mill Hill) on a fascinating three-way molecular switch in Ataxin-1.
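To illustrate how a linear motif such as the KEN box can be located at the sequence level, the following is a minimal sketch rather than the ELM implementation. It assumes the motif is modelled as the bare tripeptide K-E-N; the function name `find_motif` and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Assumption: the KEN box is modelled here as the literal tripeptide K-E-N.
# A curated resource may use a richer pattern plus context filters.
KEN_PATTERN = re.compile(r"KEN")


def find_motif(sequence, pattern=KEN_PATTERN):
    """Return the 1-based start position of every pattern match in sequence."""
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Made-up toy sequence, not a real protein.
toy_sequence = "MSTPKENQLSAAKENRT"
print(find_motif(toy_sequence))
```

A bare pattern match of this kind only yields candidates. Short motifs occur frequently by chance, which is why filters such as the structure and conservation filtering mentioned above, followed by experimental validation, are needed before a match can be considered functional.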
We also undertake more general computational analyses of biological macromolecules. Where possible, we contribute to multidisciplinary projects involving structural and experimental groups at EMBL and elsewhere. We collaborate with Des Higgins (Dublin) and Julie Thompson (Strasbourg) to maintain and develop the ClustalW and Clustal X programs that are widely used for multiple sequence alignment. We maintain several public web servers at EMBL, including ELM, the protein linear motif resource; Phospho.ELM, a collection of some 20,000 reported phosphorylation sites; and GlobPlot, a tool for exploring protein disorder.
We will continue to hunt for regulatory motifs. We may survey individual gene families in depth, and we will undertake proteome surveys when we have specific questions to answer. Protein interaction networks are anticipated to become increasingly important to our work. Molecular evolution is also one of the group’s interests, especially when it has practical applications. With our collaborators, we will look to build up the protein architecture tools, especially the unique ELM resource, taking them to a new level of power and applicability. We will apply the tools in the investigation of modular protein function and may deploy them in proteome and protein network analysis pipelines. Our links to experimental and structural groups should ensure that bioinformatics results feed into experimental analyses of signalling interactions and descriptions of the structures of modular proteins and their complexes, with one focus being regulatory chromatin proteins.