The Stegle group develops and applies statistical and machine learning methods for deciphering molecular variation across individuals, space, and time.

Previous and current research

Our interest lies in computational approaches for unravelling molecular and phenotypic variation. How do genetic background and environment jointly shape phenotypic traits or cause diseases? How are genetic and external factors integrated at different molecular levels, and how variable are these molecular readouts between individual cells?

We use statistical inference and machine learning as our main tools to address these questions. The methods we develop allow us to exploit large and high-dimensional omics data to identify disease signatures and pinpoint causal drivers. Our group combines principles from classical statistics, machine learning, and causal reasoning. One of our current research aims is to develop improved association tests for genome-wide association studies (GWAS) that scale to millions of samples while accounting for technical and biological confounding factors. A second aim is developing approaches for dimensionality reduction that allow the integration of data across omics modalities, profiled across space and time (see figure).

Our methodological research aims tie in with experimental collaborations. In this context, we develop methods to fully exploit large-scale datasets obtained using the most recent profiling technologies. Via international collaborations and networks, we have established population-scale genetic and molecular data resources that allow us to unravel gene regulatory dependencies with unprecedented resolution in pluripotent cells and in cancer. We are also actively advancing multi-omics profiling methods in single cells and applying spatial omics to a variety of questions in fundamental biology, as well as in disease contexts.

Future projects and goals

We will continue to develop innovative computational methods to analyse data from high-throughput genetic and molecular profiling studies. We are particularly interested in following up recent opportunities to assay single cells in large population cohorts, thereby identifying genotype–phenotype relationships with single-cell resolution. Access to single-cell profiling datasets from hundreds, and soon tens of thousands, of individuals will create new possibilities to link disease variants to cell state transitions. These data will also open up opportunities to study how genetic risk factors affect cellular ecosystems in human tissues. A second area for future activities is the development of genetic perturbation approaches in human induced pluripotent stem cells, as well as in non-human model systems. In particular, our ERC Synergy project DECODE will pioneer the systematic perturbation of hundreds of key regulators in vivo, combined with single-cell profiling to comprehensively phenotype such mutants.

Figure 1: Illustration of a statistical method for integrating multiple omics datasets. Multi-omics factor analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples or cells.
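
To make the idea of shared axes of variation across assays concrete, the sketch below recovers joint factors from two simulated data views measured on the same samples. It is not MOFA itself: MOFA fits a Bayesian factor model, whereas this stand-in simply runs a singular value decomposition on the scaled, concatenated views. The function names (`scale_view`, `joint_factors`, `variance_explained`), the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def scale_view(X):
    """Standardise each feature, then weight the view so its total variance is 1."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / Xc.std(axis=0)
    return Xc / np.sqrt(X.shape[1])


def joint_factors(views, n_factors):
    """PCA on concatenated views (an illustrative stand-in, not the MOFA model)."""
    scaled = [scale_view(X) for X in views]
    joint = np.hstack(scaled)
    U, S, Vt = np.linalg.svd(joint, full_matrices=False)
    Z = U[:, :n_factors] * S[:n_factors]               # factor values per sample
    splits = np.cumsum([X.shape[1] for X in scaled])[:-1]
    W = np.split(Vt[:n_factors].T, splits, axis=0)     # loadings, one block per view
    return Z, W, scaled


def variance_explained(Z, W, scaled):
    """Fraction of each view's variance captured by each factor."""
    r2 = np.zeros((len(scaled), Z.shape[1]))
    for m, (Xm, Wm) in enumerate(zip(scaled, W)):
        total = np.sum(Xm ** 2)
        for k in range(Z.shape[1]):
            recon = np.outer(Z[:, k], Wm[:, k])
            r2[m, k] = 1.0 - np.sum((Xm - recon) ** 2) / total
    return r2


# Simulated example (hypothetical data): factor 0 is shared by both views,
# factor 1 is present in view A only.
rng = np.random.default_rng(0)
n_samples = 200
z_true = rng.normal(size=(n_samples, 2))
view_a = z_true @ rng.normal(size=(2, 30)) + 0.5 * rng.normal(size=(n_samples, 30))
view_b = z_true[:, :1] @ rng.normal(size=(1, 20)) + 0.5 * rng.normal(size=(n_samples, 20))

Z, W, scaled = joint_factors([view_a, view_b], n_factors=2)
r2 = variance_explained(Z, W, scaled)
print(np.round(r2, 2))  # rows: views, columns: factors
```

The per-view variance table is the useful output here: a factor that explains variance in both views reflects variation shared across assays, while a factor confined to one view reflects variation specific to that assay.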