EMBL HPC

Photo by Vasiliki Karyoti (IT Services)

The EMBL IT infrastructures are at the heart of the laboratory’s Digital Biology programme. They face the massive challenge of handling vast, exponentially growing quantities of research data generated by high-throughput experiments. EMBL operates IT centres at all of its sites, each dedicated to site-specific requirements. The two biggest IT centres are at EMBL Heidelberg, Germany, and at EMBL-EBI in Hinxton, UK.

At EMBL Heidelberg the IT infrastructure is designed to support data-driven science involving key technologies such as genome sequencing, large-scale imaging, computational biology and modelling. This ensures that the enormous amounts of data generated in Heidelberg can be securely stored, analysed and shared, whether within the laboratory or across large international research consortia.

The EMBL IT Services in Heidelberg operate a highly resilient dual-data-centre infrastructure that builds on state-of-the-art technology and can be scaled flexibly in capacity and performance according to demand. With these strategies in place, the centre is designed to support an exponentially growing data footprint, both in data storage and in the high-performance computing (HPC) and cloud-based capacity needed for downstream analysis. IT Services operate a multi-tiered data storage platform with a current capacity of 40 PB, presently doubling every 18 months. The platform builds on high-end disk, flash and tape technology and efficiently supports the different needs of EMBL’s scientists in terms of I/O performance, robustness, durability and cost.
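To make the quoted growth rate concrete, the short sketch below projects storage capacity as C(t) = 40 PB × 2^(t / 18), with t in months. It is only an illustration under a loud assumption: that the present doubling time of 18 months stays constant. That is an extrapolation, not a capacity plan, and the function name is hypothetical.

```python
# Hypothetical projection of storage capacity.
# ASSUMPTION: the quoted doubling time of 18 months stays constant;
# this is an extrapolation for illustration, not a capacity plan.
CURRENT_CAPACITY_PB = 40.0  # "current capacity of 40 PB"
DOUBLING_MONTHS = 18.0      # "presently doubling every 18 months"


def projected_capacity_pb(months_ahead: float) -> float:
    """Capacity in PB after `months_ahead` months of exponential growth."""
    return CURRENT_CAPACITY_PB * 2 ** (months_ahead / DOUBLING_MONTHS)


if __name__ == "__main__":
    for months in (0, 18, 36, 60):
        print(f"{months:>3} months: {projected_capacity_pb(months):7.1f} PB")
```

Under this assumption, capacity would reach 80 PB after 18 months and 160 PB after three years. Exponential growth of this kind is why the platform mixes disk, flash and tape tiers with very different costs per petabyte.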

Scientific data analysis at EMBL relies to a large extent on access to high-performance computing. An increasing number of users, presently about 40% of all EMBL scientists, use HPC as part of their data analysis workflows. In 2017 these users consumed about 100 billion CPU seconds, spread across more than 18 million compute jobs, on one of the key number-crunching resources: EMBL Heidelberg’s HPC cluster. This facility primarily supports the needs of local researchers but also offers capacity to the wider EMBL scientific community. Based on the latest Intel and AMD technology, it presently provides access to more than 14,000 CPU cores and a total of 90 TB of RAM. The cluster also integrates GPU compute power based on recent Nvidia Pascal hardware, adding about 330 TFLOPS of floating-point performance. The underlying ultra-fast parallel file system is designed to support the massive I/O needs of data-intensive bioinformatics and computational biology HPC workloads.
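The 2017 usage figures can be turned into two rough indicators: the average CPU time per job, and the number of cores that would have had to run non-stop all year to deliver that much CPU time. The sketch below does this arithmetic. It uses only the rounded figures quoted above, and the helper names are hypothetical. Because the job count is a lower bound ("more than 18 million"), the per-job figure is an upper bound.

```python
# Back-of-the-envelope figures derived from the rounded 2017 usage numbers.
CPU_SECONDS_2017 = 100e9          # "about 100 billion CPU seconds"
JOBS_2017 = 18e6                  # "more than 18 million compute jobs" (lower bound)
SECONDS_PER_YEAR = 365 * 24 * 3600


def mean_cpu_seconds_per_job(cpu_seconds: float, jobs: float) -> float:
    """Average CPU time per job; an upper bound if `jobs` is a lower bound."""
    return cpu_seconds / jobs


def equivalent_busy_cores(cpu_seconds: float, wall_seconds: float) -> float:
    """Cores that would have to run continuously to consume `cpu_seconds`."""
    return cpu_seconds / wall_seconds


if __name__ == "__main__":
    per_job = mean_cpu_seconds_per_job(CPU_SECONDS_2017, JOBS_2017)
    cores = equivalent_busy_cores(CPU_SECONDS_2017, SECONDS_PER_YEAR)
    print(f"mean per job: {per_job:,.0f} CPU s ({per_job / 3600:.2f} CPU h)")
    print(f"equivalent cores busy all year: {cores:,.0f}")
```

This gives at most roughly 5,600 CPU seconds (about 1.5 CPU-hours) per job, which points to a workload dominated by many short jobs. The total corresponds to about 3,200 cores kept busy around the clock for the whole year.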

EMBL Storage cloud

Photo by Vasiliki Karyoti (IT Services)

IT virtualisation and cloud technology are vital to the on-demand delivery of IT services at EMBL. With completely virtualised data centres at EMBL Heidelberg, Rome and Barcelona, IT Services can provide large-scale, robust, scalable, flexible and cost-effective IT services to EMBL users. The IT team operates a growing number of virtual machines (VMs) on this platform, providing core IT services as well as specialised servers hosted for scientific and non-scientific groups across EMBL. In addition, there are dedicated cloud areas such as the EMBL 3D Cloud, which IT Services designed to support EMBL Scientific Groups and Core Facilities involved in big-data image analysis. Using cloud-based GPU power, it offers remote visualisation and image analysis capacity on the scale of high-end graphics workstations, while fully exploiting the performance of the data centre infrastructure, for example its I/O performance. The EMBL Science Cloud, currently being implemented in collaboration with the IT team at EMBL-EBI in Hinxton, is an infrastructure-as-a-service (IaaS) solution designed for internal use. Beyond providing cloud services internally, IT Services also support the integration of commercial and public cloud services to broaden their portfolio of innovative IT services.

IT Services operate resilient high-speed network connectivity to multiple Internet providers. Individual network links provide up to 10 Gb/s of bandwidth. This supports, for example, fast access to EMBL-EBI resources or the sharing of large-scale scientific data across international collaborations. These strong network links also allow users at other EMBL sites to access key IT resources in Heidelberg, and give EMBL users access to internal and external cloud services. In addition, state-of-the-art software tools are in place to move large volumes of data efficiently, making full use of the bandwidth of these network links.
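To give a sense of what "up to 10 Gb/s" means for scientific data volumes, the sketch below computes idealised transfer times. It assumes the full line rate is available to a single transfer and ignores protocol overhead, so real transfers will take longer. The function name is hypothetical, and sizes use decimal units (1 TB = 10^12 bytes).

```python
# Idealised transfer-time estimate at a given line rate.
# ASSUMPTION: full link bandwidth is available and protocol overhead is
# ignored, so these are lower bounds on real transfer times.
BITS_PER_BYTE = 8


def transfer_seconds(data_bytes: float, link_gbit_per_s: float) -> float:
    """Minimum seconds to move `data_bytes` over a link of the given Gb/s."""
    return data_bytes * BITS_PER_BYTE / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_tb = 1e12  # decimal terabyte
    secs = transfer_seconds(one_tb, 10.0)
    print(f"1 TB over 10 Gb/s: {secs:.0f} s (~{secs / 60:.1f} min)")
    one_pb = 1e15  # decimal petabyte
    print(f"1 PB over 10 Gb/s: {transfer_seconds(one_pb, 10.0) / 86400:.1f} days")
```

Even at full line rate, one terabyte needs at least 800 seconds (about 13 minutes), and a petabyte needs more than nine days on a single link. This is why tools that keep the links saturated matter for large-scale data sharing.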

EMBL IT Services look forward to an exciting future, implementing innovative technology and best practice in response to the ever-growing demands on IT delivery for EMBL’s science. Our team of life science IT professionals is highly committed to working with scientists and experts at EMBL and beyond. Together we aim to provide future IT solutions for EMBL science and to take Digital Biology at EMBL to the next level.