It was a global endeavour to rival the space race: billions of dollars invested, a massive international effort of singular scientific, medical, industrial and societal significance. The nail-bitingly close and bitterly fought battle culminated in the publication, on 15 and 16 February 2001, of two scientific papers, one in Nature and the other in Science, describing the first draft sequence of the human genome.
As the European hub for molecular biology, EMBL is at the forefront of innovation in life science research; genomic science plays a central part in its past, present and future.
Genome sequences, likened to a genetic blueprint, encode an organism’s entire hereditary information. By decoding this information, biologists can find hidden links between genetic makeup and observable characteristics such as shape or morphology, development and behaviour. It was widely held that sequencing the human genome would uncover all that was needed to understand how the human body works, what makes us human, and why we differ from each other.
Genome sequencing, not of humans but of model organisms such as yeast and fruit fly, was in full swing by the late 1990s. Lars Steinmetz, Joint Head of EMBL’s Genome Biology Unit, reveals: “It was becoming clear that this information was useful; it provided a different way to approach biological questions – instead of focusing on specific mechanisms of isolated components, you can determine how general these mechanisms are across the entire genome.”
Less is more
The headline-grabbing discovery of the publications was the number of genes in the human genome. “We completely missed the mark,” says Lars, “the field as a whole predicted a much larger number than there actually is.” Estimates varied from 30,000 to 100,000 genes, but the actual figure is 21,000 – barely more than a nematode worm. But the picture is more complex than that.
“We had also assumed that most of the genome was ‘junk DNA’ and in fact this doesn’t turn out to be the case,” adds Lars. Non-coding DNA – stretches of the molecule that don’t produce proteins and were once considered ‘junk’ – actually has a function. “Over 70% of the genome is transcribed into RNA; lots of this is obviously non-coding RNAs and the race is still on to figure out what these RNAs do.”
“The human genome sequence provided a blueprint of all the protein-coding genes in the human genome for the first time,” reveals Jan Ellenberg, Head of the Cell Biology and Biophysics Unit at EMBL Heidelberg, “this changed how we go about studying protein function.” Working within Mitocheck, a European research consortium, Jan has helped exploit sequence knowledge to systematically interrogate all 21,000 protein-coding genes one after another, identifying which are involved in cell division.
This is just one of countless advances in scientific ingenuity and technology over the past decade, sparked by the first draft sequence. Vladimir Benes, head of EMBL’s Genomics Core Facility, is keenly aware of the rapid technological development: “The older systems could analyse a maximum of 384 samples over a day, today we are deciphering the sequence of 400 million fragments in one go.” These next-generation ‘massively-parallel’ sequencers can achieve in just a few days what once took over a decade to accomplish.
Data, data everywhere
“It is one thing to generate the data but another to make it into useful pieces of information,” says Vladimir. Fortunately, Paul Flicek and his colleagues at EMBL-EBI are rising to the challenge by helping to curate, catalogue and give access to the masses of genetic data being generated globally – not just gene sequences, but outcomes of studies into everything from gene regulation to protein function. Via integrated resources such as Ensembl, “we bring all this information together in a genomic context,” says Paul.
This expertise is coming into its own in the internationally collaborative 1000 Genomes Project: “While the genome sequence is what makes us human, it’s the differences in the genome sequences between all of us that are really the interesting part,” says Paul. “Since the human genome was finished, one of the big efforts has been to try to understand and catalogue human variation.” Identifying disease ‘hot spots’ by studying variation across populations could help scientists better understand how genetics influence human health.
Indeed, of all the technological and scientific advances in the wake of the first sequence, many see the development of ‘personalised medicine’ as the most tangible. “Ten years from now we will be at a stage where we will routinely sequence individuals’ genomes and use this information for health decisions,” says Lars. While diagnosis and drug discovery have yet to be revolutionised by genome science, it’s no longer a case of ‘if’, but ‘when’.
The next decade of the genomic age holds much promise: “Technology is opening up enormous horizons in front of researchers,” says Vladimir. From widening the scope of model organisms to uncovering the inner workings of cells, for molecular biologists the human genome sequence has untold potential as a final frontier for exploratory science.