Schneider (Reinhard) Group Publications
Arena3D: visualizing time-driven phenotypic differences in biological systems.
Secrier, M., Pavlopoulos, G.A., Aerts, J. & Schneider, R.
BMC Bioinformatics. 2012 Mar 22;13(1):45.
ABSTRACT: BACKGROUND: Elucidating the genotype-phenotype connection is one of the big challenges of modern molecular biology. To fully understand this connection, it is necessary to consider the underlying networks and the time factor. In this context of data deluge and heterogeneous information, visualization plays an essential role in interpreting complex and dynamic topologies. Thus, software that is able to bring the network, phenotypic and temporal information together is needed. Arena3D has been previously introduced as a tool that facilitates link discovery between processes. It uses a layered display to separate different levels of information while emphasizing the connections between them. We present novel developments of the tool for the visualization and
PathVar: analysis of gene and protein expression variance in cellular pathways using microarray data.
Glaab, E. & Schneider, R.
Bioinformatics. 2012 Feb 1;28(3):446-7. Epub 2011 Nov 28.
SUMMARY: Finding significant differences between the expression levels of genes or proteins across diverse biological conditions is one of the primary goals in the analysis of functional genomics data. However, existing methods for identifying differentially expressed genes or sets of genes by comparing measures of the average expression across predefined sample groups do not detect differential variance in the expression levels across genes in cellular pathways. Since corresponding pathway deregulations occur frequently in microarray gene or protein expression data, we present a new dedicated web application, PathVar, to analyze these data sources. The software ranks pathway-representing gene/protein sets in terms of the differen
Using graph theory to analyze biological networks.
Pavlopoulos, G.A., Secrier, M., Moschopoulos, C.N., Soldatos, T.G., Kossida, S., Aerts, J., Schneider, R. & Bagos, P.G.
BioData Min. 2011 Apr 28;4:10.
ABSTRACT: Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks and they are mainly represented as graphs where thousands of nodes are connected with thousands of edges. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling combined with knowledge extraction will help us to better understand the biological significance of the system.
GPCRs, G-proteins, effectors and their interactions: human-gpDB, a database employing visualization tools and data integration techniques.
Satagopam, V.P., Theodoropoulou, M.C., Stampolakis, C.K., Pavlopoulos, G.A., Papandreou, N.C., Bagos, P.G., Schneider, R. & Hamodrakas, S.J.
Database (Oxford). 2010 Aug 5;2010:baq019. Print 2010.
G-protein coupled receptors (GPCRs) are a major family of membrane receptors in eukaryotic cells. They play a crucial role in the communication of a cell with the environment. Ligands bind to GPCRs on the outside of the cell, activating them by causing a conformational change, and allowing them to bind to G-proteins. Through their interaction with G-proteins, several effector molecules are activated leading to many kinds of cellular and physiological responses. The great importance of GPCRs and their corresponding signal transduction pathways is indicated by the fact that they take part in many diverse disease processes and that a large part of efforts towards drug development today is focused on them. We present Human-gpDB, a database which currently holds information about 713 human GPCRs, 36 human G-proteins and 99 human effectors. The collection of information about the interactions between these molecules was done manually and the current version of Human-gpDB holds information for about 1663 connections between GPCRs and G-proteins and 1618 connections between G-proteins and effectors. Major advantages of Human-gpDB are the integration of several external data sources and the support of advanced visualization techniques. Human-gpDB is a simple yet powerful tool for researchers in the life sciences field as it integrates an up-to-date, carefully curated collection of human GPCRs, G-proteins, effectors and their interactions. The database may be a reference guide for medical and pharmaceutical research, especially in the areas of understanding human diseases and chemical and drug discovery. Database URLs: http://schneider.embl.de/human_gpdb; http://bioinformatics.biol.uoa.gr/human_gpdb/
Reflect: A practical approach to web semantics
O'Donoghue, S.I., Horn, H., Pafilis, E., Haag, S., Kuhn, M., Satagopam, V.P., Schneider, R. & Jensen, L.J.
Journal of Web Semantics. Special Issue; 8(2-3):182-189.
Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes.
Neumann, B., Walter, T., Heriche, J.K., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., Cetin, C., Sieckmann, F., Pau, G., Kabbe, R., Wünsche, A., Satagopam, V., Schmitz, M.H., Chapuis, C., Gerlich, D.W., Schneider, R., Eils, R., Huber, W., Peters, J.M., Hyman, A.A., Durbin, R., Pepperkok, R. & Ellenberg, J.
Nature. 2010 Apr 1;464(7289):721-7.
Despite our rapidly growing knowledge about the human genome, we do not know all of the genes required for some of the most basic functions of life. To start to fill this gap we developed a high-throughput phenotypic screening platform combining potent gene silencing by RNA interference, time-lapse microscopy and computational image processing. We carried out a genome-wide phenotypic profiling of each of the approximately 21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes. Phenotypes were scored quantitatively by computational image processing, which allowed us to identify hundreds of human genes involved in diverse biological functions including cell division, migration and survival. As part of the Mitocheck consortium, this study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.
Visualization of omics data for systems biology.
Gehlenborg, N., O'Donoghue, S.I., Baliga, N.S., Goesmann, A., Hibbs, M.A., Kitano, H., Kohlbacher, O., Neuweger, H., Schneider, R., Tenenbaum, D. & Gavin, A.C.
Nat Methods. 2010 Mar;7(3 Suppl):S56-68.
High-throughput studies of biological systems are rapidly accumulating a wealth of 'omics'-scale data. Visualization is a key aspect of both the analysis and understanding of these data, and users now have many visualization methods and tools to choose from. The challenge is to create clear, meaningful and integrated visualizations that give biological insight, without being overwhelmed by the intrinsic complexity of the data. In this review, we discuss how visualization tools are being used to help interpret protein interaction, gene expression and metabolic profile data, and we highlight emerging new directions.
Martini: using literature keywords to compare gene sets.
Soldatos, T.G., O'Donoghue, S.I., Satagopam, V.P., Jensen, L.J., Brown, N.P., Barbosa-Silva, A. & Schneider, R.
Nucleic Acids Res. 2010 Jan;38(1):26-38. Epub 2009 Oct 25.
Life scientists are often interested in comparing two gene sets to gain insight into differences between two distinct, but related, phenotypes or conditions. Several tools have been developed for comparing gene sets, most of which find Gene Ontology (GO) terms that are significantly over-represented in one gene set. However, such tools often return GO terms that are too generic or too few to be informative. Here, we present Martini, an easy-to-use tool for comparing gene sets. Martini is based, not on GO, but on keywords extracted from Medline abstracts; Martini also supports a much wider range of species than comparable tools. To evaluate Martini we created a benchmark based on the human cell cycle, and we tested several comparable tools (CoPub, FatiGO, Marmite and ProfCom). Martini had the best benchmark performance, delivering a more detailed and accurate description of function. Martini also gave best or equal performance with three other datasets (related to Arabidopsis, melanoma and ovarian cancer), suggesting that Martini represents an advance in the automated comparison of gene sets. In agreement with previous studies, our results further suggest that literature-derived keywords are a richer source of gene-function information than GO annotations. Martini is freely available at http://martini.embl.de.
Defective lamin A-Rb signaling in Hutchinson-Gilford Progeria Syndrome and reversal by farnesyltransferase inhibition.
Marji, J., O'Donoghue, S.I., McClintock, D., Satagopam, V.P., Schneider, R., Ratner, D., Worman, H.J., Gordon, L.B. & Djabali, K.
PLoS One. 2010 Jun 15;5(6):e11132.
Hutchinson-Gilford Progeria Syndrome (HGPS) is a rare premature aging disorder caused by a de novo heterozygous point mutation G608G (GGC>GGT) within exon 11 of LMNA gene encoding A-type nuclear lamins. This mutation elicits an internal deletion of 50 amino acids in the carboxyl-terminus of prelamin A. The truncated protein, progerin, retains a farnesylated cysteine at its carboxyl terminus, a modification involved in HGPS pathogenesis. Inhibition of protein farnesylation has been shown to improve abnormal nuclear morphology and phenotype in cellular and animal models of HGPS. We analyzed global gene expression changes in fibroblasts from human subjects with HGPS and found that a lamin A-Rb signaling network is a major defective regulatory axis. Treatment of fibroblasts with a protein farnesyltransferase inhibitor reversed the gene expression defects. Our study identifies Rb as a key factor in HGPS pathogenesis and suggests that its modulation could ameliorate premature aging and possibly complications of physiological aging.
Live coverage of scientific conferences using web technologies.
Lister, A.L., Datta, R.S., Hofmann, O., Krause, R., Kuhn, M., Roth, B. & Schneider, R.
PLoS Comput Biol. 2010 Jan 29;6(1):e1000563.
LAITOR--Literature Assistant for Identification of Terms co-Occurrences and Relationships.
Barbosa-Silva, A., Soldatos, T.G., Magalhaes, I.L., Pavlopoulos, G.A., Fontaine, J.F., Andrade-Navarro, M.A., Schneider, R. & Ortega, J.M.
BMC Bioinformatics. 2010 Feb 1;11:70.
BACKGROUND: Biological knowledge is represented in scientific literature that often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of interest that are specific to a given research field. Therefore, studying the current literature on a selected topic deposited in public databases facilitates the generation of novel hypotheses associating a set of bioentities with a common context. RESULTS: We created a text mining system (LAITOR: Literature Assistant for Identification of Terms co-Occurrences and Relationships) that analyses co-occurrences of bioentities, biointeractions, and other biological terms in MEDLINE abstracts. The method accounts for the position of the co-occurring terms within sentences or abstracts. The system detected abstracts mentioning protein-protein interactions in a standard test (BioCreative II IAS test data) with a precision of 0.82-0.89 and a recall of 0.48-0.70. We illustrate the application of LAITOR to the detection of plant response genes in a dataset of 1000 abstracts relevant to the topic. CONCLUSIONS: Text mining tools combining the extraction of interacting bioentities and biological concepts with network displays can be helpful in developing reasonable hypotheses in different scientific backgrounds.
From experimental setup to bioinformatics: An RNAi screening platform to identify host factors involved in HIV-1 replication.
Borner, K., Hermle, J., Sommer, C., Brown, N.P., Knapp, B., Glass, B., Kunkel, J., Torralba, G., Reymann, J., Beil, N., Beneke, J., Pepperkok, R., Schneider, R., Ludwig, T., Hausmann, M., Hamprecht, F., Erfle, H., Kaderali, L., Krausslich, H.G. & Lehmann, M.J.
Biotechnol J. 2009 Dec 10;5(1):39-49.
RNA interference (RNAi) has emerged as a powerful technique for studying loss-of-function phenotypes by specific down-regulation of gene expression, allowing the investigation of virus-host interactions by large-scale high-throughput RNAi screens. Here we present a robust and sensitive small interfering RNA screening platform consisting of an experimental setup, single-cell image and statistical analysis as well as bioinformatics. The workflow has been established to elucidate host gene functions exploited by viruses, monitoring both suppression and enhancement of viral replication simultaneously by fluorescence microscopy. The platform comprises a two-stage procedure in which potential host factors are first identified in a primary screen and afterwards re-tested in a validation screen to confirm true positive hits. Subsequent bioinformatics allows the identification of cellular genes participating in metabolic pathways and cellular networks utilised by viruses for efficient infection. Our workflow has been used to investigate host factor usage by the human immunodeficiency virus-1 (HIV-1), but can also be adapted to other viruses. Importantly, we expect that the description of the platform will guide further screening approaches for virus-host interactions. The ViroQuant-CellNetworks RNAi Screening core facility is an integral part of the recently founded BioQuant centre for systems biology at the University of Heidelberg and will provide service to external users in the near future.
Reflect: augmented browsing for the life scientist.
Pafilis, E., O'Donoghue, S.I., Jensen, L.J., Horn, H., Kuhn, M., Brown, N.P. & Schneider, R.
Nat Biotechnol. 2009 Jun;27(6):508-10.
GIBA: a clustering tool for detecting protein complexes.
Moschopoulos, C.N., Pavlopoulos, G.A., Schneider, R., Likothanassis, S.D. & Kossida, S.
BMC Bioinformatics. 2009 Jun 16;10 Suppl 6:S11.
BACKGROUND: In recent years, high-throughput experimental methods have been developed that generate large datasets of protein-protein interactions (PPIs). However, owing to the experimental methodologies, these datasets contain errors, mainly false positives, which reduce the quality of any derived information. Typically these datasets can be modeled as graphs, where vertices represent proteins and edges the pairwise PPIs, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings. METHODS: In this paper, a clustering tool called GIBA (named after the first characters of its developers' nicknames) is presented. GIBA applies a two-step procedure to a given dataset of protein-protein interaction data. First, a clustering algorithm is applied to the interaction data; this is followed by a filtering step that generates the final candidate list of predicted complexes. RESULTS: The efficiency of GIBA is demonstrated through the analysis of six different yeast protein interaction datasets in comparison with four other available algorithms. We compared the results of the different methods using five different performance metrics. Moreover, we examined how the parameters of the methods that constitute the filter affect the final results. CONCLUSION: GIBA is an effective and easy-to-use tool for detecting protein complexes in experimentally measured protein-protein interaction networks. The results show that GIBA has higher prediction accuracy than previously published methods.
jClust: A clustering and visualization toolbox.
Pavlopoulos, G.A., Moschopoulos, C.N., Hooper, S.D., Schneider, R. & Kossida, S.
Bioinformatics. 2009 May 19.
jClust is a user-friendly application which provides access to a set of widely used clustering and clique finding algorithms. The toolbox allows a range of filtering procedures to be applied and is combined with an advanced implementation of the Medusa interactive visualization module. The implemented algorithms are k-Means, Affinity Propagation, Bron-Kerbosch, MULIC, Restricted Neighborhood Search Cluster Algorithm, Markov Clustering and Spectral Clustering, while the supported filtering procedures are haircut, outside-inside, best neighbors and density control operations. The combination of a simple input file format and a set of clustering and filtering algorithms linked together with the visualization tool provides a powerful tool for data analysis and information extraction. Availability and supplementary material: http://jclust.embl.de/
New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.
Bromberg, Y., Yachdav, G., Ofran, Y., Schneider, R. & Rost, B.
Curr Opin Drug Discov Devel. 2009 May;12(3):408-19.
The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
OnTheFly: A Tool for automated document-based text annotation, data linking and network generation.
Pavlopoulos, G.A., Pafilis, E., Kuhn, M., Hooper, S.D. & Schneider, R.
Bioinformatics. 2009 Feb 17.
Arena3D: visualization of biological networks in 3D.
Pavlopoulos, G.A., O'Donoghue, S.I., Satagopam, V.P., Soldatos, T.G., Pafilis, E. & Schneider, R.
BMC Syst Biol. 2008 Nov 28;2(1):104.
ABSTRACT: BACKGROUND: Complexity is a key problem when visualizing biological networks; as the number of entities increases, most graphical views become incomprehensible. Our goal is to enable many thousands of entities to be visualized meaningfully and with high performance. RESULTS: We present a new visualization tool, Arena3D, which introduces a new concept of staggered layers in 3D space. Related data - such as proteins, chemicals, or pathways - can be grouped onto separate layers and arranged via layout algorithms, such as Fruchterman-Reingold, distance geometry, and a novel hierarchical layout. Data on a layer can be clustered via k-means, affinity propagation, Markov clustering, neighbor joining, hierarchical clustering, or UPGMA ('unweighted pair-group method with arithmetic mean'). A simple input format defines the name and URL for each node, and defines connections or similarity scores between pairs of nodes. The use of Arena3D is illustrated with datasets related to Huntington's disease. CONCLUSION: Arena3D is a user friendly visualization tool that is able to visualize biological or any other network in 3D space. It is free for academic use and runs on any platform. It can be downloaded or launched directly from http://arena3d.org. Java3D library and Java 1.5 need to be pre-installed for the software to run.
A survey of visualization tools for biological network analysis.
Pavlopoulos, G.A., Wegener, A.L. & Schneider, R.
BioData Min. 2008 Nov 28;1(1):12.
SuperTarget and Matador: resources for exploring drug-target relationships.
Gunther, S., Kuhn, M., Dunkel, M., Campillos, M., Senger, C., Petsalaki, E., Ahmed, J., Urdiales, E.G., Gewiess, A., Jensen, L.J., Schneider, R., Skoblo, R., Russell, R.B., Bourne, P.E., Bork, P. & Preissner, R.
Nucleic Acids Res. 2008 Jan;36(Database issue):D919-22. Epub 2007 Oct 16.
The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de.
Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence.
Barbosa-Silva, A., Satagopam, V.P., Schneider, R. & Ortega, J.M.
BMC Bioinformatics. 2008;9:141.
Development of SRS.php, a Simple Object Access Protocol-based library for data acquisition from integrated biological databases.
Barbosa-Silva, A., Pafilis, E., Ortega, J.M. & Schneider, R.
Genet Mol Res. 2007 Dec 11;6(4):1142-50.
Data integration has become an important task for biological database providers. The current model for data exchange among different sources simplifies the manner that distinct information is accessed by users. The evolution of data representation from HTML to XML enabled programs, instead of humans, to interact with biological databases. We present here SRS.php, a PHP library that can interact with the data integration Sequence Retrieval System (SRS). The library has been written using SOAP definitions, and permits the programmatic communication through webservices with the SRS. The interactions are possible by invoking the methods described in WSDL by exchanging XML messages. The current functions available in the library have been built to access specific data stored in any of the 90 different databases (such as UNIPROT, KEGG and GO) using the same query syntax format. The inclusion of the described functions in the source of scripts written in PHP enables them as webservice clients to the SRS server. The functions permit one to query the whole content of any SRS database, to list specific records in these databases, to get specific fields from the records, and to link any record among any pair of linked databases. The case study presented exemplifies the library usage to retrieve information regarding registries of a Plant Defense Mechanisms database. The Plant Defense Mechanisms database is currently being developed, and the proposal of SRS.php library usage is to enable the data acquisition for the further warehousing tasks related to its setup and maintenance.
Status of text-mining techniques applied to biomedical text.
Erhardt, R.A., Schneider, R. & Blaschke, C.
Drug Discov Today. 2006 Apr;11(7-8):315-25.
Scientific progress is increasingly based on knowledge and information. Knowledge is now recognized as the driver of productivity and economic growth, leading to a new focus on the role of information in the decision-making process. Most scientific knowledge is registered in publications and other unstructured representations that make it difficult to use and to integrate the information with other sources (e.g. biological databases). Making a computer understand human language has proven to be a complex achievement, but there are techniques capable of detecting, distinguishing and extracting a limited number of different classes of facts. In the biomedical field, extracting information has specific problems: complex and ever-changing nomenclature (especially genes and proteins) and the limited representation of domain knowledge.
Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery.
Ofran, Y., Punta, M., Schneider, R. & Rost, B.
Drug Discov Today. 2005 Nov 1;10(21):1475-82.
Every entirely sequenced genome reveals 100s to 1000s of protein sequences for which the only annotation available is 'hypothetical protein'. Thus, in the human genome and in the genomes of pathogenic agents there could be 1000s of potential, unexplored drug targets. Computational prediction of protein function can play a role in studying these targets. We shall review the challenges, research approaches and recently developed tools in the field of computational function-prediction and we will discuss the ways these issues can change the process of drug discovery.
A bioinformatics perspective on proteomics: data storage, analysis, and integration.
Kremer, A., Schneider, R. & Terstappen, G.C.
Biosci Rep. 2005 Feb-Apr;25(1-2):95-106.
The field of proteomics is advancing rapidly as a result of powerful new technologies and proteomics experiments yield a vast and increasing amount of information. Data regarding protein occurrence, abundance, identity, sequence, structure, properties, and interactions need to be stored. Currently, a common standard has not yet been established and open access to results is needed for further development of robust analysis algorithms. Databases for proteomics will evolve from pure storage into knowledge resources, providing a repository for information (meta-data) which is mainly not stored in simple flat files. This review will shed light on recent steps towards the generation of a common standard in proteomics data storage and integration, but is not meant to be a comprehensive overview of all available databases and tools in the proteomics community.
Ensembl and UniProt (Swiss-Prot).
Jackson, D. & Schneider, R.
In "Genetics, Genomics, Proteomics and Bioinformatics", Jorde, L.B., Little, P., Dunn, M. & Subramaniam (Eds.), John Wiley & Sons, UK.
Improving research productivity at a pharmaceutical company.
Ramakrishnan, S., Caruso, A. & Schneider, R.
LION bioscience, White paper, 2002.