Figure 1: Automated multivariate phenotyping of cells by combinatorial RNAi and automated image analysis.

Figure 2: Epistatic interactions between genetic or drug perturbations are mapped from high-throughput microscopy data by multivariate phenotyping and vector space modelling.

The Huber group aims to understand inter-individual differences through large-scale statistical modelling and by integrating multiple levels of genomic and molecular information from individuals with their phenotypic variation in health and disease.

Previous and current research

A central challenge of biomedicine is to understand how biological systems that underlie healthy life and disease react to variations in their make-up (e.g. through genetic variation) or their environment (e.g. through drugs). We aim to understand systems through large-scale data acquisition and quantitative modelling of phenotypes and molecular profiles, systematic perturbations (e.g. drug or RNAi screens) and computational analysis of non-linear, epistatic interaction networks.

The group brings together researchers from quantitative disciplines – mathematics, statistics, physics and computer science – and from different fields of biology and medicine. We develop methods needed to master big datasets, and we address questions in personal genomics and molecular medicine. We employ statistics – the science and art of reasoning based on uncertain, incomplete and noisy data – and machine learning, which help humans to discover patterns in data, understand mechanisms, and act upon predictive and causal relationships.

Genomics and other molecular profiling technologies have resulted in an increasingly detailed, biology-based understanding of human disease. The next challenge is using this knowledge to engineer treatments and cures. To this end, we integrate observational data – such as from large-scale sequencing and molecular profiling of human samples – with interventional data – such as from systematic genetic or chemical screens – to reconstruct a fuller picture of the underlying causal relationships and actionable intervention points. A fascinating example is our precision oncology project with T. Zenz at the National Center for Tumour Diseases, a collaboration on the genotype-specific vulnerability and resistance of tumours to targeted drugs.

As we engage with new data types, our aim is to develop high-quality computational and statistical methods of wide applicability. We consider the release and maintenance of scientific software an integral part of doing science in this area, and we contribute to the Bioconductor Project, an open-source software collaboration that provides tools for the analysis and study of high-throughput genomic data. An example is our DESeq2 package for analysing count data from high-throughput sequencing.
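To give a flavour of what analysing sequencing count data involves, the sketch below illustrates one early step: estimating a per-sample scaling factor with the median-of-ratios idea, so that samples sequenced to different depths become comparable. It is a simplified, standard-library Python illustration and not the DESeq2 implementation (which is an R package). The function name `size_factors` and the toy counts are hypothetical, and dropping genes with a zero count in any sample is an assumption made here to keep the example short.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample by the median-of-ratios idea.

    counts: list of genes, each a list of raw counts (one entry per sample).
    Hypothetical helper for illustration only.
    """
    n_samples = len(counts[0])
    # Assumption: keep only genes with non-zero counts in every sample,
    # so that the log geometric mean of each gene is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")

    # Log of the geometric mean across samples, one value per usable gene.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # The median is robust to a minority of strongly changing genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # dropped: zero count in sample 1
]
factors = size_factors(toy_counts)
normalised = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

Dividing each raw count by its sample's factor, as in `normalised`, puts the samples on a common scale before any statistical testing.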

Future projects and goals

We aim to enable the exploitation of new data types and new types of experiments and studies by developing the computational techniques needed to turn raw data into biological insight.

  • Precision oncology: genome sequencing and other levels of molecular profiling are increasingly used in clinical cancer care, but translation of these data into actions remains a bottleneck. We work with clinical researchers to develop predictive assays and algorithms.
  • Translational statistics: many powerful mathematical and computational ideas exist but are difficult to access. We aim to translate them into practical methods and software that make a real difference to biomedical researchers.
  • Transcriptomics, gene regulation and 3D nuclear organisation.
  • Quantitative proteomics and in vivo drug-target mapping.
  • Single-cell and single-molecule data modelling.
  • High-throughput multidimensional phenotyping: mapping gene-gene and gene-drug interactions through computational image analysis of cell and tissue microscopy, machine learning and mathematical modelling.
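
The last item above, mapping gene-gene and gene-drug interactions from multidimensional phenotypes, can be illustrated with a minimal Python sketch. It assumes that each condition is summarised as a vector of image-derived features on a log scale, and that an interaction is scored as the deviation of the double perturbation from the sum of the two single-perturbation effects. This additive model, the function names and the toy feature values are all assumptions chosen for illustration. The text does not specify which model the group uses.

```python
import math


def interaction_vector(control, single_a, single_b, double_ab):
    """Per-feature deviation of a double perturbation from additivity.

    All arguments are equal-length lists of log-scale phenotype features.
    Hypothetical helper; the additive expectation is an assumption.
    """
    # Effect of each perturbation relative to the unperturbed control.
    effect_a = [x - c for x, c in zip(single_a, control)]
    effect_b = [x - c for x, c in zip(single_b, control)]
    effect_ab = [x - c for x, c in zip(double_ab, control)]
    # Without interaction, the double effect would equal the sum of the singles.
    return [ab - (a + b) for ab, a, b in zip(effect_ab, effect_a, effect_b)]


def norm(vector):
    """Euclidean length, used here as an overall interaction strength."""
    return math.sqrt(sum(x * x for x in vector))


# Toy features (for example log cell number, log nuclear area, log intensity).
control = [1.0, 2.0, 0.5]
single_a = [0.5, 2.0, 0.5]   # perturbation A changes feature 1 only
single_b = [1.0, 1.5, 0.5]   # perturbation B changes feature 2 only
double_ab = [0.5, 1.5, 1.5]  # A and B together also change feature 3

pi = interaction_vector(control, single_a, single_b, double_ab)
strength = norm(pi)
```

In this toy case the first two features combine additively, so the interaction vector `pi` is non-zero only in the third feature, which neither single perturbation affects on its own. Keeping the full vector rather than a single number is what lets interactions be compared by direction as well as by strength.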