Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing.
Zichner, T., Garfield, D.A., Rausch, T., Stutz, A.M., Cannavo, E., Braun, M., Furlong, E.E. & Korbel, J.O.
Genome Res. 2013 Mar;23(3):568-79. doi: 10.1101/gr.142646.112. Epub 2012 Dec 6.
Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the "Drosophila melanogaster Genetic Reference Panel," DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Logical modelling of Drosophila signalling pathways.
Mbodj, A., Junion, G., Brun, C., Furlong, E.E. & Thieffry, D.
Mol Biosyst. 2013 Jul 30;9(9):2248-58. doi: 10.1039/c3mb70187e.
A limited number of signalling pathways are involved in the specification of cell fate during the development of all animals. Several of these pathways were originally identified in Drosophila. To clarify their roles, and possible cross-talk, we have built a logical model for the nine key signalling pathways recurrently used in metazoan development. In each case, we considered the associated ligands, receptors, signal transducers, modulators, and transcription factors reported in the literature. Implemented using the logical modelling software GINsim, the resulting models qualitatively recapitulate the main characteristics of each pathway, in wild type as well as in various mutant situations (e.g. loss-of-function or gain-of-function). These models constitute pluggable modules that can be used to assemble comprehensive models of complex developmental processes. Moreover, these models of Drosophila pathways could serve as scaffolds for more complicated models of orthologous mammalian pathways. Comprehensive model annotations and GINsim files are provided for each of the nine considered pathways.
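As a rough illustration of what a logical (Boolean) pathway model looks like, the sketch below updates a toy four-step cascade synchronously until it reaches a stable state, in wild type and with a component clamped off or on to mimic loss-of-function and gain-of-function mutants. The node names, the rules, and the helper functions are hypothetical and are not taken from the published GINsim models; GINsim itself supports multi-valued variables and asynchronous updating, which this sketch does not attempt to reproduce.

```python
def step(state, rules, fixed):
    """One synchronous update; nodes listed in `fixed` keep their clamped value."""
    new_state = {node: bool(rule(state)) for node, rule in rules.items()}
    new_state.update(fixed)
    return new_state


def stable_state(inputs, rules, fixed=None, max_steps=50):
    """Iterate from an all-off state (plus inputs) until nothing changes."""
    fixed = dict(fixed or {})
    state = {node: False for node in rules}
    state.update(inputs)
    state.update(fixed)
    for _ in range(max_steps):
        next_state = step(state, rules, fixed)
        if next_state == state:
            return state
        state = next_state
    raise RuntimeError("no stable state reached (possible cycle)")


# Hypothetical toy pathway: ligand -> receptor -> transducer -> target_tf,
# with an inhibitor blocking the transducer. Inputs simply hold their value.
RULES = {
    "ligand": lambda s: s["ligand"],
    "inhibitor": lambda s: s["inhibitor"],
    "receptor": lambda s: s["ligand"],
    "transducer": lambda s: s["receptor"] and not s["inhibitor"],
    "target_tf": lambda s: s["transducer"],
}

wild_type = stable_state({"ligand": True}, RULES)
loss_of_function = stable_state({"ligand": True}, RULES, fixed={"receptor": False})
gain_of_function = stable_state({"ligand": False}, RULES, fixed={"transducer": True})

print(wild_type["target_tf"], loss_of_function["target_tf"], gain_of_function["target_tf"])
```

Clamping a node through `fixed` is one simple way to express a mutant: the rule for that node is still evaluated but its result is overwritten at every step.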
Predicting spatial and temporal gene expression using an integrative model of transcription factor occupancy and chromatin state.
Wilczynski, B., Liu, Y.H., Yeo, Z.X. & Furlong, E.E.
PLoS Comput Biol. 2012 Dec;8(12):e1002798. doi: 10.1371/journal.pcbi.1002798. Epub 2012 Dec 6.
Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converges to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naive model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.
Fragmentation of DNA in a sub-microliter microfluidic sonication device.
Tseng, Q., Lomonosov, A.M., Furlong, E.E. & Merten, C.A.
Lab Chip. 2012 Nov 21;12(22):4677-82. doi: 10.1039/c2lc40595d.
Fragmentation of DNA is an essential step for many biological applications including the preparation of next-generation sequencing (NGS) libraries. As sequencing technologies push the limits towards single cell and single molecule resolution, it is of great interest to reduce the scale of this upstream fragmentation step. Here we describe a miniaturized DNA shearing device capable of processing sub-microliter samples based on acoustic shearing within a microfluidic chip. A strong acoustic field was generated by a Langevin-type piezo transducer and coupled into the microfluidic channel via the flexural lamb wave mode. Purified genomic DNA, as well as covalently cross-linked chromatin were sheared into various fragment sizes ranging from approximately 180 bp to 4 kb. With the use of standard PDMS soft lithography, our approach should facilitate the integration of additional microfluidic modules and ultimately allow miniaturized NGS workflows.
Analysis of variation at transcription factor binding sites in Drosophila and humans.
Spivakov, M., Akhtar, J., Kheradpour, P., Beal, K., Girardot, C., Koscielny, G., Herrero, J., Kellis, M., Furlong, E.E. & Birney, E.
Genome Biol. 2012 Sep 28;13(9):R49. doi: 10.1186/gb-2012-13-9-r49.
BACKGROUND: Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines. RESULTS: We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding. CONCLUSIONS: Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Transcription factors: from enhancer binding to developmental control.
Spitz, F. & Furlong, E.E.
Nat Rev Genet. 2012 Sep;13(9):613-26. doi: 10.1038/nrg3207. Epub 2012 Aug 7.
Developmental progression is driven by specific spatiotemporal domains of gene expression, which give rise to stereotypically patterned embryos even in the presence of environmental and genetic variation. Views of how transcription factors regulate gene expression are changing owing to recent genome-wide studies of transcription factor binding and RNA expression. Such studies reveal patterns that, at first glance, seem to contrast with the robustness of the developmental processes they encode. Here, we review our current knowledge of transcription factor function from genomic and genetic studies and discuss how different strategies, including extensive cooperative regulation (both direct and indirect), progressive priming of regulatory elements, and the integration of activities from multiple enhancers, confer specificity and robustness to transcriptional regulation during development.
Cell type-specific chromatin immunoprecipitation from multicellular complex samples using BiTS-ChIP.
Bonn, S., Zinzen, R.P., Perez-Gonzalez, A., Riddell, A., Gavin, A.C. & Furlong, E.E.
Nat Protoc. 2012 Apr 26;7(5):978-94. doi: 10.1038/nprot.2012.049.
This protocol describes the batch isolation of tissue-specific chromatin for immunoprecipitation (BiTS-ChIP) for analysis of histone modifications, transcription factor binding, or polymerase occupancy within the context of a multicellular organism or tissue. Embryos expressing a cell type-specific nuclear marker are formaldehyde cross-linked and then subjected to dissociation. Fixed nuclei are isolated and sorted using FACS on the basis of the cell type-specific nuclear marker. Tissue-specific chromatin is extracted, sheared by sonication and used for ChIP-seq or other analyses. The key advantages of this method are the covalent cross-linking before embryo dissociation, which preserves the transcriptional context, and the use of FACS of nuclei, yielding very high purity. The protocol has been optimized for Drosophila, but with minor modifications should be applicable to any model system. The full protocol, including sorting, immunoprecipitation and generation of sequencing libraries, can be completed within 5 d.
Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development.
Bonn, S., Zinzen, R.P., Girardot, C., Gustafson, E.H., Perez-Gonzalez, A., Delhomme, N., Ghavi-Helm, Y., Wilczynski, B., Riddell, A. & Furlong, E.E.
Nat Genet. 2012 Jan 8;44(2):148-56. doi: 10.1038/ng.1064.
Chromatin modifications are associated with many aspects of gene expression, yet their role in cellular transitions during development remains elusive. Here, we use a new approach to obtain cell type-specific information on chromatin state and RNA polymerase II (Pol II) occupancy within the multicellular Drosophila melanogaster embryo. We directly assessed the relationship between chromatin modifications and the spatio-temporal activity of enhancers. Rather than having a unique chromatin state, active developmental enhancers show heterogeneous histone modifications and Pol II occupancy. Despite this complexity, combined chromatin signatures and Pol II presence are sufficient to predict enhancer activity de novo. Pol II recruitment is highly predictive of the timing of enhancer activity and seems dependent on the timing and location of transcription factor binding. Chromatin modifications typically demarcate large regulatory regions encompassing multiple enhancers, whereas local changes in nucleosome positioning and Pol II occupancy delineate single active enhancers. This cell type-specific view identifies dynamic enhancer usage, an essential step in deciphering developmental networks.
A transcription factor collective defines cardiac cell fate and reflects lineage history.
Junion, G.*, Spivakov, M.*, Girardot, C., Braun, M., Gustafson, E.H., Birney, E. & Furlong, E.E.
Cell. 2012 Feb 3;148(3):473-86.
Cell fate decisions are driven through the integration of inductive signals and tissue-specific transcription factors (TFs), although the details on how this information converges in cis remain unclear. Here, we demonstrate that the five genetic components essential for cardiac specification in Drosophila, including the effectors of Wg and Dpp signaling, act as a collective unit to cooperatively regulate heart enhancer activity, both in vivo and in vitro. Their combinatorial binding does not require any specific motif orientation or spacing, suggesting an alternative mode of enhancer function whereby cooperative activity occurs with extensive motif flexibility. A fraction of enhancers co-occupied by cardiogenic TFs had unexpected activity in the neighboring visceral mesoderm but could be rendered active in heart through single-site mutations. Given that cardiac and visceral cells are both derived from the dorsal mesoderm, this "dormant" TF binding signature may represent a molecular footprint of these cells' developmental lineage.
Analyzing transcription factor occupancy during embryo development using ChIP-seq.
Ghavi-Helm, Y. & Furlong, E.E.
Methods Mol Biol. 2012;786:229-45.
Accurately assessing the binding of transcription factors to cis-regulatory elements in vivo is an essential step toward understanding the mechanisms that govern embryonic development. Genome-wide transcription factor location analysis has been facilitated by the development of high-density tiling arrays (ChIP-on-chip), and more recently by next-generation sequencing technologies, which are used to sequence the DNA fragments obtained from chromatin immunoprecipitation experiments (ChIP-seq). This chapter provides a detailed protocol of the different steps required to generate a successful ChIP-seq library, starting from embryo collection and fixation to chromatin preparation, immunoprecipitation, and finally library preparation. The protocol is optimized for Drosophila embryos, but can be adapted to any organism. The obtained library is suitable for sequencing on an Illumina GAIIx platform.
Molecular biology: A fly in the face of genomics.
Nature. 2011 Mar 24;471(7339):458-9.
The importance of being specified: cell fate decisions and their role in cell biology.
Mol Biol Cell. 2010 Nov 15;21(22):3797-8.
Dynamic CRM occupancy reflects a temporal map of developmental progression.
Wilczynski, B. & Furlong, E.E.
Mol Syst Biol. 2010 Jun 22;6:383.
Development is driven by tightly coordinated spatio-temporal patterns of gene expression, which are initiated through the action of transcription factors (TFs) binding to cis-regulatory modules (CRMs). Although many studies have investigated how spatial patterns arise, precise temporal control of gene expression is less well understood. Here, we show that dynamic changes in the timing of CRM occupancy is a prevalent feature common to all TFs examined in a developmental ChIP time course to date. CRMs exhibit complex binding patterns that cannot be explained by the sequence motifs or expression of the TFs themselves. The temporal changes in TF binding are highly correlated with dynamic patterns of target gene expression, which in turn reflect transitions in cellular function during different stages of development. Thus, it is not only the timing of a TF's expression, but also its temporal occupancy in refined time windows, which determines temporal gene expression. Systematic measurement of dynamic CRM occupancy may therefore serve as a powerful method to decode dynamic changes in gene expression driving developmental progression.
Model-based method for transcription factor target identification with limited data.
Honkela, A., Girardot, C., Gustafson, E.H., Liu, Y.H., Furlong, E.E., Lawrence, N.D. & Rattray, M.
Proc Natl Acad Sci U S A. 2010 Apr 27;107(17):7793-8. Epub 2010 Apr 12.
We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist, and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.
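The ranking idea in this abstract can be sketched in a much simplified form. The code below assumes a linear model dx/dt = B + S f(t) - D x for each candidate target, where f(t) is the TF profile, and scores each gene by the best Gaussian log-likelihood found on a small parameter grid. This is an assumption-laden toy: the Gaussian process prior over the TF profile and its marginalization are omitted, the TF profile is treated as known, and the function names, grid values, and synthetic profiles are all hypothetical.

```python
import math
from itertools import product


def simulate(tf, times, basal, sens, decay, x0, substeps=20):
    """Euler-integrate dx/dt = basal + sens*f(t) - decay*x, with f(t)
    interpolated linearly between the sampled TF values."""
    x = x0
    xs = [x0]
    for i in range(len(times) - 1):
        dt = (times[i + 1] - times[i]) / substeps
        for k in range(substeps):
            f = tf[i] + (tf[i + 1] - tf[i]) * k / substeps
            x += dt * (basal + sens * f - decay * x)
        xs.append(x)
    return xs


def log_likelihood(observed, predicted, noise_sd):
    """Independent Gaussian noise around the model prediction."""
    const = -0.5 * math.log(2 * math.pi * noise_sd ** 2)
    return sum(const - (o - p) ** 2 / (2 * noise_sd ** 2)
               for o, p in zip(observed, predicted))


def best_score(observed, tf, times, noise_sd=0.1):
    """Crude stand-in for model fitting: maximum likelihood over a coarse grid."""
    grid = product([0.0, 0.5], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0])
    return max(
        log_likelihood(observed, simulate(tf, times, b, s, d, observed[0]), noise_sd)
        for b, s, d in grid
    )


times = [0, 1, 2, 3, 4, 5]
tf_profile = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]

# Synthetic data: gene_a is generated from the model, gene_b is not.
genes = {
    "gene_a": simulate(tf_profile, times, 0.0, 1.0, 0.5, 0.0),
    "gene_b": [1.0, 0.2, 1.1, 0.1, 0.9, 0.3],
}

ranking = sorted(genes, key=lambda g: best_score(genes[g], tf_profile, times), reverse=True)
print(ranking)
```

Because the score depends on how well the ODE can reproduce the whole time course rather than on correlation with the TF profile, a target with a delayed or smoothed response can still rank highly.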
Combinatorial binding leads to diverse regulatory responses: Lmd is a tissue-specific modulator of Mef2 activity.
Cunha, P.M.*, Sandmann, T.*, Gustafson, E.H., Ciglar, L., Eichenlaub, M.P. & Furlong, E.E.
PLoS Genet. 2010 Jul 1;6:e1001014.
Understanding how complex patterns of temporal and spatial expression are regulated is central to deciphering genetic programs that drive development. Gene expression is initiated through the action of transcription factors and their cofactors converging on enhancer elements leading to a defined activity. Specific constellations of combinatorial occupancy are therefore often conceptualized as rigid binding codes that give rise to a common output of spatio-temporal expression. Here, we assessed this assumption using the regulatory input of two essential transcription factors within the Drosophila myogenic network. Mutations in either Myocyte enhancing factor 2 (Mef2) or the zinc-finger transcription factor lame duck (lmd) lead to very similar defects in myoblast fusion, yet the underlying molecular mechanism for this shared phenotype is not understood. Using a combination of ChIP-on-chip analysis and expression profiling of loss-of-function mutants, we obtained a global view of the regulatory input of both factors during development. The majority of Lmd-bound enhancers are co-bound by Mef2, representing a subset of Mef2's transcriptional input during these stages of development. Systematic analyses of the regulatory contribution of both factors demonstrate diverse regulatory roles, despite their co-occupancy of shared enhancer elements. These results indicate that Lmd is a tissue-specific modulator of Mef2 activity, acting as both a transcriptional activator and repressor, which has important implications for myogenesis. More generally, this study demonstrates considerable flexibility in the regulatory output of two factors, leading to additive, cooperative, and repressive modes of co-regulation.
Conservation and divergence in developmental networks: a view from Drosophila myogenesis.
Ciglar, L. & Furlong, E.E.
Curr Opin Cell Biol. 2009 Dec;21(6):754-60. Epub 2009 Nov 4.
Understanding developmental networks has recently been enhanced through the identification of a large number of conserved essential regulators. Interspecies comparisons of the transcriptional networks regulated by these factors are still at a rather early stage, with limited global data available. Here we use the accumulating phenotypic information from multiple species to provide initial insights into the wiring and rewiring of developmental networks, with particular emphasis on myogenesis, a highly conserved developmental process. This review highlights the most recent findings on the transcriptional program driving Drosophila myogenesis and compares this with vertebrates, revealing emerging themes that may be applicable to other developmental contexts.
Combinatorial binding predicts spatio-temporal cis-regulatory activity.
Zinzen, R.P., Girardot, C., Gagneur, J., Braun, M. & Furlong, E.E.
Nature. 2009 Nov 5;462(7269):65-70.
Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.
Challenges for modeling global gene regulatory networks during development: Insights from Drosophila.
Wilczynski, B. & Furlong, E.E.
Dev Biol. 2009 Oct 27.
Development is regulated by dynamic patterns of gene expression, which are orchestrated through the action of complex gene regulatory networks (GRNs). Substantial progress has been made in modeling transcriptional regulation in recent years, ranging from qualitative "coarse-grain" models operating at the gene level to very "fine-grain" quantitative models operating at the biophysical "transcription factor-DNA level". Recent advances in genome-wide studies have revealed an enormous increase in the size and complexity of GRNs. Even relatively simple developmental processes can involve hundreds of regulatory molecules, with extensive interconnectivity and cooperative regulation. This leads to an explosion in the number of regulatory functions, effectively impeding Boolean-based qualitative modeling approaches. At the same time, the lack of information on the biophysical properties for the majority of transcription factors within a global network restricts quantitative approaches. In this review, we explore the current challenges in moving from modeling medium scale well-characterized networks to more poorly characterized global networks. We suggest integrating coarse- and fine-grain approaches to model gene regulatory networks in cis. We focus on two very well-studied examples from Drosophila, which likely represent typical developmental regulatory modules across metazoans.
A systematic analysis of Tinman function reveals Eya and JAK-STAT signaling as essential regulators of muscle development.
Liu, Y.H., Jakobsen, J.S., Valentin, G., Amarantos, I., Gilmour, D.T. & Furlong, E.E.
Dev Cell. 2009 Feb;16(2):280-91.
Nk-2 proteins are essential developmental regulators from flies to humans. In Drosophila, the family member tinman is the major regulator of cell fate within the dorsal mesoderm, including heart, visceral, and dorsal somatic muscle. To decipher Tinman's direct regulatory role, we performed a time course of ChIP-on-chip experiments, revealing a more prominent role in somatic muscle specification than previously anticipated. Through the combination of transgenic enhancer-reporter assays, colocalization studies, and phenotypic analyses, we uncovered two additional factors within this myogenic network: by activating eyes absent, Tinman's regulatory network extends beyond developmental stages and tissues where it is expressed; by regulating stat92E expression, Tinman modulates the transcriptional readout of JAK/STAT signaling. We show that this pathway is essential for somatic muscle development in Drosophila and for myotome morphogenesis in zebrafish. Taken together, these data uncover a conserved requirement for JAK/STAT signaling and an important component of the transcriptional network driving myogenesis.
Dynamic regulation by polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila.
Oktaba, K., Gutierrez, L., Gagneur, J., Girardot, C., Sengupta, A.K., Furlong, E.E. & Muller, J.
Dev Cell. 2008 Dec;15(6):877-89. Epub 2008 Nov 6.
Polycomb group (PcG) proteins form conserved regulatory complexes that modify chromatin to repress transcription. Here, we report genome-wide binding profiles of PhoRC, the Drosophila PcG protein complex containing the DNA-binding factor Pho/dYY1 and dSfmbt. PhoRC constitutively occupies short Polycomb response elements (PREs) of a large set of developmental regulator genes in both embryos and larvae. The majority of these PREs are co-occupied by the PcG complexes PRC1 and PRC2. Analysis of PcG mutants shows that the PcG system represses genes required for anteroposterior, dorsoventral, and proximodistal patterning of imaginal discs and that it also represses cell cycle regulator genes. Many of these genes are regulated in a dynamic manner, and our results suggest that the PcG system restricts signaling-mediated activation of target genes to appropriate cells. Analysis of cell cycle regulators indicates that the PcG system also dynamically modulates the expression levels of certain genes, providing a possible explanation for the tumor phenotype of PcG mutants.
cis-Regulatory networks during development: a view of Drosophila.
Bonn, S. & Furlong, E.E.
Curr Opin Genet Dev. 2008 Dec;18(6):513-20. Epub 2008 Oct 16.
Understanding how regulatory networks initiate, maintain and synchronise transcriptional states remains a fundamental goal of developmental biology. Complex patterns of spatio-temporal gene expression are generated through the combined inputs of signalling and transcriptional networks converging on cis-regulatory modules (CRMs). Detailed studies in Drosophila, using transgenic reporter assays and mutagenesis analysis, have dissected the regulatory logic of a number of CRMs. These data have recently been complemented by genome-wide maps of transcription factor binding, revealing an unprecedented view of CRM occupancy and network complexity. The synthesis of data for three well-characterised Drosophila developmental networks reveals emerging themes at both a CRM and a cis-regulatory network level.
Divergence in cis-regulatory networks: taking the 'species' out of cross-species analysis.
Zinzen, R.P. & Furlong, E.E.
Genome Biol. 2008 Nov 4;9(11):240.
ABSTRACT: Many essential transcription factors have conserved roles in regulating biological programs, yet their genomic occupancy can diverge significantly. A new study demonstrates that such variations are primarily due to cis-regulatory sequences, rather than differences between the regulators or nuclear environments.
A topographical map of spatiotemporal patterns of gene expression.
Dev Cell. 2008 May;14(5):639-40.
A recent study by Fowlkes et al. in Cell generated a 3D atlas of gene expression for the Drosophila blastoderm embryo using a new approach for image registration. This virtual embryo allows in silico multiplexing of in situ hybridizations and lays the groundwork for new insights into gene regulatory networks.
4DXpress: a database for cross-species expression pattern comparisons.
Haudry, Y., Berube, H., Letunic, I., Weeber, P.D., Gagneur, J., Girardot, C., Kapushesky, M., Arendt, D., Bork, P., Brazma, A., Furlong, E.E., Wittbrodt, J. & Henrich, T.
Nucleic Acids Res. 2008 Jan;36(Database issue):D847-53. Epub 2007 Oct 4.
In the major animal model species like mouse, fish or fly, detailed spatial information on gene expression over time can be acquired through whole mount in situ hybridization experiments. In these species, expression patterns of many genes have been studied and data has been integrated into dedicated model organism databases like ZFIN for zebrafish, MEPD for medaka, BDGP for Drosophila or GXD for mouse. However, a central repository that allows users to query and compare gene expression patterns across different species has not yet been established. Therefore, we have integrated expression patterns for zebrafish, Drosophila, medaka and mouse into a central public repository called 4DXpress (expression database in four dimensions). Users can query anatomy ontology-based expression annotations across species and quickly jump from one gene to the orthologues in other species. Genes are linked to public microarray data in ArrayExpress. We have mapped developmental stages between the species to be able to compare developmental time phases. We store the largest collection of gene expression patterns available to date in an individual resource, reflecting 16 505 annotated genes. 4DXpress will be an invaluable tool for developmental as well as for computational biologists interested in gene regulation and evolution. 4DXpress is available at http://ani.embl.de/4DXpress.
Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families.
Shah, P.K., Tripathi, L.P., Jensen, L.J., Gahnim, M., Mason, C., Furlong, E.E., Rodrigues, V., White, K.P., Bork, P. & Sowdhamini, R.
Gene. 2008 Jan 15;407(1-2):199-215. Epub 2007 Oct 15.
Systematically annotating function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein-protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and utilizes correlation co-efficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied on Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were utilized to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify quality of our annotations using information from RNAi screens and other evidence types. Utilization of such multi-fold approaches results in 10-fold increase of function annotation for Drosophila serine proteases and demonstrates value in increasing annotations in multiple genomes.
Temporal ChIP-on-chip reveals Biniou as a universal regulator of the visceral muscle transcriptional network.
Jakobsen, J.S., Braun, M., Astorga, J., Gustafson, E.H., Sandmann, T., Karzynski, M., Carlsson, P. & Furlong, E.E.
Genes Dev. 2007 Oct 1;21(19):2448-60.
Smooth muscle plays a prominent role in many fundamental processes and diseases, yet our understanding of the transcriptional network regulating its development is very limited. The FoxF transcription factors are essential for visceral smooth muscle development in diverse species, although their direct regulatory role remains elusive. We present a transcriptional map of Biniou (a FoxF transcription factor) and Bagpipe (an Nkx factor) activity, as a first step to deciphering the developmental program regulating Drosophila visceral muscle development. A time course of chromatin immunoprecipitation followed by microarray analysis (ChIP-on-chip) experiments and expression profiling of mutant embryos reveal a dynamic map of in vivo bound enhancers and direct target genes. While Biniou is broadly expressed, it regulates enhancers driving temporally and spatially restricted expression. In vivo reporter assays indicate that the timing of Biniou binding is a key trigger for the time span of enhancer activity. Although bagpipe and biniou mutants phenocopy each other, their regulatory potential is quite different. This network architecture was not apparent from genetic studies, and highlights Biniou as a universal regulator in all visceral muscle, regardless of its developmental origin or subsequent function. The regulatory connection of a number of Biniou target genes is conserved in mice, suggesting an ancient wiring of this developmental program.
CoCo: a web application to display, store and curate ChIP-on-chip data integrated with diverse types of gene expression data.
Girardot, C., Sklyar, O., Grosz, S., Huber, W. & Furlong, E.E.
Bioinformatics. 2007 Mar 15;23(6):771-3. Epub 2007 Jan 17.
MOTIVATION: CoCo, ChIP-on-Chip online, is an open-source web application that supports the annotation and curation of regulatory regions and associated target genes discovered in ChIP-on-chip experiments. CoCo integrates ChIP-on-chip results with diverse types of gene expression data (expression profiling, in situ hybridization) and displays them within a genomic context. Regulatory relationships between the transcription factor-bound regions and putative target genes can be stored and expanded throughout different sessions. AVAILABILITY: http://furlonglab.embl.de/methods/tools/coco.
A core transcriptional network for early mesoderm development in Drosophila melanogaster.
Sandmann, T., Girardot, C., Brehme, M., Tongprasit, W., Stolc, V. & Furlong, E.E.
Genes Dev. 2007 Feb 15;21(4):436-49.
Embryogenesis is controlled by large gene-regulatory networks, which generate spatially and temporally refined patterns of gene expression. Here, we report the characteristics of the regulatory network orchestrating early mesodermal development in the fruitfly Drosophila, where the transcription factor Twist is both necessary and sufficient to drive development. Through the integration of chromatin immunoprecipitation followed by microarray analysis (ChIP-on-chip) experiments during discrete time periods with computational approaches, we identified >2000 Twist-bound cis-regulatory modules (CRMs) and almost 500 direct target genes. Unexpectedly, Twist regulates an almost complete cassette of genes required for cell proliferation in addition to genes essential for morphogenesis and cell migration. Twist targets almost 25% of all annotated Drosophila transcription factors, which may represent the entire set of regulators necessary for the early development of this system. By combining in vivo binding data from Twist, Mef2, Tinman, and Dorsal we have constructed an initial transcriptional network of early mesoderm development. The network topology reveals extensive combinatorial binding, feed-forward regulation, and complex logical outputs as prevalent features. In addition to binary activation and repression, we suggest that Twist binds to almost all mesodermal CRMs to provide the competence to integrate inputs from more specialized transcription factors.
Identification of tightly regulated groups of genes during Drosophila melanogaster embryogenesis.
Hooper, S.D., Boue, S., Krause, R., Jensen, L.J., Mason, C.E., Ghanim, M., White, K.P., Furlong, E.E. & Bork, P.
Mol Syst Biol. 2007;3:72. Epub 2007 Jan 16.
Time-series analysis of whole-genome expression data during Drosophila melanogaster development indicates that up to 86% of its genes change their relative transcript level during embryogenesis. By applying conservative filtering criteria and requiring 'sharp' transcript changes, we identified 1534 maternal genes, 792 transient zygotic genes, and 1053 genes whose transcript levels increase during embryogenesis. Each of these three categories is dominated by groups of genes where all transcript levels increase and/or decrease at similar times, suggesting a common mode of regulation. For example, 34% of the transiently expressed genes fall into three groups, with increased transcript levels between 2.5-12, 11-20, and 15-20 h of development, respectively. We highlight common and distinctive functional features of these expression groups and identify a coupling between downregulation of transcript levels and targeted protein degradation. By mapping the groups to the protein network, we also predict and experimentally confirm new functional associations.
Mes2, a MADF-containing transcription factor essential for Drosophila development.
Zimmermann, G., Furlong, E.E., Suyama, K. & Scott, M.P.
Dev Dyn. 2006 Dec;235(12):3387-95.
The development of the Drosophila mesoderm is initiated by the basic helix-loop-helix transcription factor twist. We identified a gene encoding a putative transcription factor, mes2, in a screen for essential mesoderm-expressed genes that function downstream of twist. Mes2 protein belongs to a family of 48 Drosophila proteins containing MADF domains. MADF domains exist in worms, flies, and fish. Mes2 is a nuclear protein first produced in trunk and head mesoderm during late gastrulation. At later embryonic stages, mes2 is expressed in glia of the central and peripheral nervous systems, and in tissues derived from the head mesoderm. We have identified a null mutation of mes2 that leads to developmental arrest in first instar larvae. Increased production of Mes2 in multiple embryonic and larval tissues almost always causes lethality. The ubiquitous or epidermal misexpression of mes2 in the embryo causes a dramatic loss of epidermal integrity resulting in the failure of dorsal closure. Our data show that the precise regulation of mes2 expression is critical for normal development in Drosophila and implicate Mes2 in the regulation of essential target genes.
Genomics and development: Taking developmental biology to new heights.
Spitz, F. & Furlong, E.E.
Dev Cell. 2006 Oct;11(4):451-7.
The 2006 Arolla meeting brought together scientists from around the globe to discuss how genomic scale analyses can enhance progress in understanding developmental biology.
A temporal map of transcription factor activity: mef2 directly regulates target genes at all stages of muscle development.
Sandmann, T., Jensen, L.J., Jakobsen, J.S., Karzynski, M.M., Eichenlaub, M.P., Bork, P. & Furlong, E.E.
Dev Cell. 2006 Jun;10(6):797-807.
Dissecting components of key transcriptional networks is essential for understanding complex developmental processes and phenotypes. Genetic studies have highlighted the role of members of the Mef2 family of transcription factors as essential regulators in myogenesis from flies to man. To understand how these transcription factors control diverse processes in muscle development, we have combined chromatin immunoprecipitation analysis with gene expression profiling to obtain a temporal map of Mef2 activity during Drosophila embryonic development. This global approach revealed three temporal patterns of Mef2 enhancer binding, providing a glimpse of dynamic enhancer use within the context of a developing embryo. Our results provide mechanistic insight into the regulation of Mef2's activity at the level of DNA binding and suggest cooperativity with the bHLH protein Twist. The number and diversity of new direct target genes indicates a much broader role for Mef2, at all stages of myogenesis, than previously anticipated.
Developmental control of nuclear size and shape by Kugelkern and Kurzkern.
Brandt, A., Papagiannouli, F., Wagner, N., Wilsch-Brauninger, M., Braun, M., Furlong, E.E., Loserth, S., Wenzl, C., Pilot, F., Vogt, N., Lecuit, T., Krohne, G. & Grosshans, J.
Curr Biol. 2006 Mar 21;16(6):543-52. Epub 2006 Feb 2.
BACKGROUND: The shape of a nucleus depends on the nuclear lamina, which is tightly associated with the inner nuclear membrane and on the interaction with the cytoskeleton. However, the mechanisms connecting the differentiation state of a cell to the shape changes of its nucleus are not well understood. We investigated this question in early Drosophila embryos, where the nuclear shape changes from spherical to ellipsoidal together with a 2.5-fold increase in nuclear length during cellularization. RESULTS: We identified two genes, kugelkern and kurzkern, required for nuclear elongation. In kugelkern- and kurzkern-depleted embryos, the nuclei reach only half the length of the wild-type nuclei at the end of cellularization. The reduced nuclear size affects chromocenter formation as marked by Heterochromatin protein 1 and expression of a specific set of genes, including early zygotic genes. kugelkern contains a putative coiled-coil domain in the N-terminal half of the protein, a nuclear localization signal (NLS), and a C-terminal CxxM-motif. The carboxyterminal CxxM motif is required for the targeting of Kugelkern to the inner nuclear membrane, where it colocalizes with lamins. Depending on the farnesylation motif, expression of kugelkern in Drosophila embryos or Xenopus cells induces overproliferation of nuclear membrane. CONCLUSIONS: Kugelkern is so far the first nuclear protein, except for lamins, that contains a farnesylation site. Our findings suggest that Kugelkern is a rate-determining factor for nuclear size increase. We propose that association of farnesylated Kugelkern with the inner nuclear membrane induces expansion of nuclear surface area, allowing nuclear growth.
ChIP-on-chip protocol for genome-wide analysis of transcription factor binding in Drosophila melanogaster embryos.
Sandmann, T., Jakobsen, J.S. & Furlong, E.E.
Nat Protoc. 2006;1(6):2839-55.
This protocol describes a method to detect in vivo associations between proteins and DNA in developing Drosophila embryos. It combines formaldehyde crosslinking and immunoprecipitation of protein-bound sequences with genome-wide analysis using microarrays. After crosslinking, nuclei are enriched using differential centrifugation and the chromatin is sheared by sonication. Antibodies specifically recognizing wild-type protein or, alternatively, a genetically encoded epitope tag are used to enrich for specifically bound DNA sequences. After purification and polymerase chain reaction-based amplification, the samples are fluorescently labeled and hybridized to genomic tiling microarrays. This protocol has been successfully used to study different tissue-specific transcription factors, and is generally applicable to in vivo analysis of any DNA-binding proteins in Drosophila embryos. The full protocol, including the collection of embryos and the collection of raw microarray data, can be completed within 10 days.
A functional genomics approach to identify new regulators of Wnt signaling.
Dev Cell 2005 May;8(5):624-6.
A recent study used a genome-wide RNAi screen in Drosophila cells to identify 238 candidate regulators of the Wnt-signaling pathway, most of which had not been previously connected to Wnt signaling. Supporting in vivo studies are in progress. The fact that such an impressive number of potential modulators had eluded detection in genetic screens underscores the potential of applying new, high-throughput approaches to old problems.
Myofilin, a protein in the thick filaments of insect muscle.
Qiu, F., Brendel, S., Cunha, P.M., Astola, N., Song, B., Furlong, E.E., Leonard, K.R. & Bullard, B.
J Cell Sci 2005 Apr 1;118(Pt 7):1527-36. Epub 2005 Mar 15.
Thick filaments in striated muscle are myosin polymers with a length and diameter that depend on the fibre type. In invertebrates, the length of the thick filaments varies widely in different muscles and additional proteins control filament assembly. Thick filaments in asynchronous insect flight muscle have an extremely regular structure, which is likely to be essential for the oscillatory contraction of these muscles. The factors controlling the assembly of thick filaments in insect flight muscle are not known. We previously identified a thick filament core protein, zeelin 1, in Lethocerus flight and non-flight muscles. This has been sequenced, and the corresponding proteins in Drosophila and Anopheles have been identified. The protein has been re-named myofilin. Zeelin 2, which is on the outside of Lethocerus flight muscle thick filaments, has been sequenced and because of the similarity to Drosophila flightin, is re-named flightin. In Drosophila flight muscle, myofilin has a molecular weight of 20 kDa and is one of five isoforms produced from a single gene. In situ hybridisation of Drosophila embryos showed that myofilin RNA is first expressed late in embryogenesis at stage 15, a little later than myosin. Antibody to myofilin labelled the entire A-band, except for the H-zone, in cryosections of flight and non-flight muscle. The periodicity of myofilin in Drosophila flight muscle thick filaments was found to be 30 nm by measuring the spacing of gold particles in labelled cryosections; this is about twice the 14.5 nm spacing of myosin molecules. The molar ratio of myofilin to myosin in indirect flight muscle is 1:2, which is the same as that of flightin. We propose a model for the association of these proteins in thick filaments, which is consistent with the periodicity and stoichiometry. Myofilin is probably needed for filament assembly in all muscles, and flightin for stability of flight muscle thick filaments in adult flies.
Creation of a minimal tiling path of genomic clones for Drosophila: provision of a common resource.
Hollich, V., Johnson, E., Furlong, E.E., Beckmann, B., Carlson, J., Celniker, S.E. & Hoheisel, J.D.
Biotechniques 2004 Aug;37(2):282-4.
On the basis of shotgun subclone libraries used in the sequencing of the Drosophila melanogaster genome, a minimal tiling path of subclones across much of the genome was determined. About 320,000 shotgun clones for chromosomes X(12-20), 2R, 2L, 3R, and 4 were available from the Berkeley Drosophila Genome Project. The clone inserts have an average length of 3.4 kb and are amenable to standard PCR amplification. The resulting tiling path covers 86.2% of chromosome X(12-20), 86.2% of chromosomal arm 2R, 79.0% of 2L, 89.6% of 3R, and 80.5% of chromosome 4. In total, the 25,135 clones represent 76.7 Mb--equivalent to about 67% of the genome--and would be suitable for producing a microarray on a single slide.
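As an illustration of what selecting a minimal tiling path involves, the sketch below uses the standard greedy interval-cover approach: from the current end of coverage, always take the overlapping clone that reaches furthest, and start a new contig where there is a gap. The abstract does not describe the selection procedure actually used, so the algorithm choice, the function names and the toy coordinates are assumptions.

```python
def minimal_tiling_path(clones):
    """Greedy tiling path over (start, end) clone intervals.

    Clones fully contained in already-covered sequence are skipped;
    uncovered gaps between clones are left as gaps.
    """
    ordered = sorted(clones)
    path = []
    covered_to = None
    i = 0
    while i < len(ordered):
        if covered_to is None or ordered[i][0] > covered_to:
            reach_from = ordered[i][0]  # gap: restart at the next clone
        else:
            reach_from = covered_to
        # Among clones starting at or before reach_from, keep the longest reach.
        best = None
        while i < len(ordered) and ordered[i][0] <= reach_from:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if covered_to is None or best[1] > covered_to:
            path.append(best)
            covered_to = best[1]
    return path


def covered_length(path):
    """Number of bases covered by a tiling path, counting overlaps once."""
    total = 0
    prev_end = None
    for start, end in path:
        if prev_end is None or start > prev_end:
            total += end - start
        elif end > prev_end:
            total += end - prev_end
        prev_end = end if prev_end is None else max(prev_end, end)
    return total


# Hypothetical clone coordinates (bp) on a 12,400 bp region.
clones = [(0, 3400), (1000, 4400), (3000, 6400), (3200, 6000), (9000, 12400)]
path = minimal_tiling_path(clones)
coverage = covered_length(path) / 12400
print(path, round(coverage, 3))
```

Dividing the covered length by the region length gives a per-chromosome coverage fraction of the kind reported in the abstract.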
Integrating transcriptional and signalling networks during muscle development.
Curr Opin Genet Dev 2004 Aug;14(4):343-50.
A fundamental aspect of developmental decisions is the ability of groups of cells to obtain the competence to respond to different signalling inputs. This information is often integrated with intrinsic transcriptional networks to produce diverse developmental outcomes. Studies in Drosophila are starting to reveal a detailed picture of the regulatory circuits controlling the subdivision of the dorsal mesoderm, which gives rise to diverse muscle types including cardioblasts, pericardial cells, body wall muscle and gut muscle. The combination of a common set of mesoderm autonomous transcription factors (e.g. Tinman and Twist) and spatially restricted inductive signals (e.g. Dpp and Wg) subdivide the dorsal mesoderm into different competence domains. The integration of additional signalling inputs with localised repression within these competence domains results in diverse transcriptional responses within neighbouring cells, which in turn generates muscle diversity.
Obtaining a Global View of Drosophila Muscle Development.
Book chapter. In "Muscle Development in Drosophila" by Landes Bioscience Publishing, 2004.
Notch and Ras signaling pathway effector genes expressed in fusion competent and founder cells during Drosophila myogenesis.
Artero, R., Furlong, E.E., Beckett, K., Scott, M.P. & Baylies, M.
Development 2003 Dec;130(25):6257-72.
Drosophila muscles originate from the fusion of two types of myoblasts, founder cells (FCs) and fusion-competent myoblasts (FCMs). To better understand muscle diversity and morphogenesis, we performed a large-scale gene expression analysis to identify genes differentially expressed in FCs and FCMs. We employed embryos derived from Toll10b mutants to obtain primarily muscle-forming mesoderm, and expressed activated forms of Ras or Notch to induce FC or FCM fate, respectively. The transcripts present in embryos of each genotype were compared by hybridization to cDNA microarrays. Among the 83 genes differentially expressed, we found genes known to be enriched in FCs or FCMs, such as heartless or hibris, previously characterized genes with unknown roles in muscle development, and predicted genes of unknown function. Our studies of newly identified genes revealed new patterns of gene expression restricted to one of the two types of myoblasts, and also striking muscle phenotypes. Whereas genes such as phyllopod play a crucial role during specification of particular muscles, others such as tartan are necessary for normal muscle morphogenesis.
Gene expression during the life cycle of Drosophila melanogaster.
Arbeitman, M.N., Furlong, E.E., Imam, F., Johnson, E., Null, B.H., Baker, B.S., Krasnow, M.A., Scott, M.P., Davis, R.W. & White, K.P.
Science 2002 Sep 27;297(5590):2270-5.
Molecular genetic studies of Drosophila melanogaster have led to profound advances in understanding the regulation of development. Here we report gene expression patterns for nearly one-third of all Drosophila genes during a complete time course of development. Mutations that eliminate eye or germline tissue were used to further analyze tissue-specific gene expression programs. These studies define major characteristics of the transcriptional programs that underlie the life cycle, compare development in males and females, and show that large-scale gene expression data collected from whole animals can be used to identify genes expressed in particular tissues and organs or genes involved in specific biological and biochemical processes.
Patterns of gene expression during Drosophila mesoderm development.
Furlong, E.E., Andersen, E.C., Null, B., White, K.P. & Scott, M.P.
Science 2001 Aug 31;293(5535):1629-33.
The transcription factor Twist initiates Drosophila mesoderm development, resulting in the formation of heart, somatic muscle, and other cell types. Using a Drosophila embryo sorter, we isolated enough homozygous twist mutant embryos to perform DNA microarray experiments. Transcription profiles of twist loss-of-function embryos, embryos with ubiquitous twist expression, and wild-type embryos were compared at different developmental stages. The results implicate hundreds of genes, many with vertebrate homologs, in stage-specific processes in mesoderm development. One such gene, gleeful, related to the vertebrate Gli genes, is essential for somatic muscle development and sufficient to cause neural cells to express a muscle marker.
Automated sorting of live transgenic embryos.
Furlong, E.E., Profitt, D. & Scott, M.P.
Nat Biotechnol 2001 Feb;19(2):153-6.
The vast selection of Drosophila mutants is an extraordinary resource for exploring molecular events underlying development and disease. We have designed and constructed an instrument that automatically separates Drosophila embryos of one genotype from a larger population of embryos, based on a fluorescent protein marker. This instrument can also sort embryos from other species, such as Caenorhabditis elegans. The machine sorts 15 living Drosophila embryos per second with more than 99% accuracy. Sorting living embryos will solve longstanding problems, including (1) the need for large quantities of RNA from homozygous mutant embryos to use in DNA microarray or gene-chip experiments, (2) the need for large amounts of protein extract from homozygous mutant embryos for biochemical studies, for example to determine whether a multiprotein complex forms or localizes correctly in vivo when one component is missing, and (3) the need for rapid genetic screening for gene expression changes in living embryos using a fluorescent protein reporter.
Expression of a 74-kDa nuclear factor 1 (NF1) protein is induced in mouse mammary gland involution. Involution-enhanced occupation of a twin NF1 binding element in the testosterone-repressed prostate message-2/clusterin promoter.
Furlong, E.E., Keon, N.K., Thornton, F.D., Rein, T. & Martin, F.
J Biol Chem. 1996 Nov 22;271(47):29688-97.
Testosterone repressed prostate message-2 (TRPM-2)/clusterin gene expression is rapidly induced in early involution of the mouse mammary gland, after weaning, and in the rat ventral prostate, after castration. A search for involution-enhanced DNaseI footprints in the proximal mouse TRPM-2/clusterin gene promoter led to the identification and characterization (by DNase I footprinting and EMSA) of a twin nuclear factor 1 (NF1) binding element at -356/-309, relative to the proposed transcription start site; nuclear extracts from 2-day involuting mouse mammary gland showed an enhanced footprint over the proximal NF1 element; extracts from involuting prostate showed enhanced occupancy of both NF1 binding elements. Subsequent EMSA and Western analysis led to the detection of a 74-kDa NF1 protein whose expression is triggered in early involution in the mouse mammary gland; such an induced protein is not found in the involuting rat ventral prostate. This protein was not found in lactation where three other NF1 proteins of 114, 68, and 46 kDa were detected. Reiteration of the epithelial cell apoptosis associated with early mammary gland involution, in vitro, in a primary cell culture system, triggered the appearance of the 74-kDa NF1. Overlaying the cells with laminin-rich extracellular matrix suppressed the apoptosis and the expression of the 74-kDa NF1 and, in the presence of lactogenic hormones, initiated milk protein gene expression and the expression of two of the lactation-associated NF1 proteins (68 and 46 kDa). This study, thus, identifies for the first time the occurrence of a switch in expression of different members of the family of NF1 transcription factors as mammary epithelial cells move from the differentiated to the involution/apoptotic state, and it is likely that the involution-specific 74-kDa NF1 accounts for the enhanced NF1 footprint detected on the TRPM-2/clusterin promoter with extracts of mouse mammary gland.
YY1 and NF1 both activate the human p53 promoter by alternatively binding to a composite element, and YY1 and E1A cooperate to amplify p53 promoter activity.
Furlong, E.E., Rein, T. & Martin, F.
Mol Cell Biol. 1996 Oct;16(10):5933-45.
A novel transcription factor binding element in the human p53 gene promoter has been characterized. It lies about 100 bp upstream of the major reported start site for human p53 gene transcription. On the basis of DNase I footprinting studies, electromobility shift assay patterns, sequence specificity of binding, the binding pattern of purified transcription factors, effects of specific antibodies, and methylation interference analysis we have identified the site as a composite element which can bind both YY1 and NF1 in an independent and mutually exclusive manner. The site is conserved in the human, rat, and mouse p53 promoters. The occupancy of the site varies in a tissue-specific manner. It binds principally YY1 in nuclear extracts of rat testis and spleen and NF1 in extracts of liver and prostate. This may facilitate tissue-specific control of p53 gene expression. When HeLa cells were transiently transfected with human p53 promoter-chloramphenicol acetyltransferase reporter constructs, a mutation in this composite element which disabled YY1 and NF1 binding caused a mean 64% reduction in basal p53 promoter activity. From mutations which selectively impaired YY1 or NF1 binding and the overexpression of YY1 or NF1 in HeLa cells we concluded that both YY1 and NF1 function as activators when bound to this site. In transient cotransfections E1A could induce the activity of the p53 promoter to a high level; 12S E1A was threefold as efficient as 13S E1A in this activity, and YY1 bound to the composite element was shown to mediate 55% of this induction. Overexpressed YY1 was shown to be able to synergistically activate the p53 promoter with E1A when not specifically bound to DNA. Deletion of an N-terminal domain of E1A, known to be required for direct E1A-YY1 interaction and E1A effects mediated through transcriptional activator p300, blocked the E1A induction of p53 promoter activity.