The roots of EMBL-EBI lie in the world's first nucleotide sequence database, the EMBL Nucleotide Sequence Data Library (now EMBL-Bank, part of the European Nucleotide Archive), which was established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany. The original goal was to establish a central database of DNA sequences, rather than have scientists submit sequences to journals.
What began as a modest task of abstracting information from scientific literature soon grew into a major database activity, with researchers submitting their data directly and an ever-increasing demand for highly skilled informaticians to manage it all. High-profile genome projects brought more attention to the project, and the commercial sector began to see the relevance of public data. The EMBL Nucleotide Sequence Data Library needed financial security to ensure its long-term viability.
EMBL-EBI is established in the UK
In 1992, the EMBL Council voted to establish the EMBL-European Bioinformatics Institute (EMBL-EBI) and locate it on the Wellcome Trust Genome Campus in Hinxton, UK, where it would be in close proximity to the major sequencing efforts at the Wellcome Trust Sanger Institute. The transition of two major bioinformatics services from Heidelberg to Hinxton began in 1992, and in September 1994 EMBL-EBI was firmly established in the UK.
The European Nucleotide Archive and the protein sequence resource UniProt (then known as Swiss-Prot–TrEMBL) were the original EMBL-EBI databases. Since then, EMBL-EBI has played a major part in the bioinformatics revolution: we now provide the world’s most comprehensive range of molecular databases and offer an extensive user training programme. Our basic research programme has grown substantially and remains closely tied to the evolution of our resources.
Researchers today depend on access to large data sets of many different types, spanning genes, proteins and the behaviour of small molecules. Breakthrough methods such as DNA sequencing have changed the face of research in such a short time that they are considered disruptive technologies: they are so much better than what came before that it is difficult for researchers to adapt.
A veritable flood of data is coming out of life science experiments every day, which is good news for researchers but poses some fascinating challenges. Consider that the amount of data produced is doubling twice as quickly as computer storage and processing power (roughly every 9 months, compared with every 18 months for storage capacity), and that this rate is increasing. It is all too easy to take for granted that data generated in publicly funded experiments will be stored, managed and kept freely available for researchers to query.
Bioinformatics makes it possible to collect, store and add value to these data so that researchers in many fields can retrieve and analyse them efficiently. The EMBL-European Bioinformatics Institute is one of very few places in the world that have the resources and expertise to fulfil this important task.
Why bioinformatics matters
Biological data are the bedrock of life science research. Here are a few examples of how they can be used in beneficial ways:
- Understanding plant genomes helps us identify which species will be most tolerant to drought, salt and pests while still providing optimum nutrition. EMBL-EBI hosts Ensembl Genomes, a service that lets researchers access and compare genome-scale data from agriculturally relevant species.
- If we can identify patterns of genes that are active in different tumours, we can diagnose and treat cancers earlier.
- Methicillin-resistant Staphylococcus aureus (MRSA) infection is a global problem. Small variations in DNA sequence can help track transmission, and this technology can help identify the source of new outbreaks.
- Resistance is growing to the one medicine used to treat schistosomiasis, a parasitic infection. Studying the schistosome genome will help identify the targets of existing drugs.
- In order to develop new drugs, researchers need to identify targets and build on previous research by a vast number of individual R&D efforts. ChEMBL provides a freely available catalogue of bioactive, drug-like small molecules and the tools to explore them.
- Short sections of DNA, called barcodes, are used to identify an organism. The Barcode of Life Initiative uses the European Nucleotide Archive to implement DNA barcoding as a global standard for identifying species, which will have applications in the protection of endangered species, in sustaining natural resources through pest control, and in food labelling.
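To make the MRSA example in the list above more concrete, the following minimal sketch counts the positions at which two aligned DNA sequences differ. It is an illustration only: the sequences are invented toy data, the function name count_differences is hypothetical, and real outbreak analyses rely on whole-genome alignment and phylogenetic methods rather than a simple count like this.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Invented toy sequences standing in for bacterial isolates (not real data).
isolates = {
    "isolate_1": "ATGGCGTACGTTAGC",
    "isolate_2": "ATGGCGTACGTTAGC",
    "isolate_3": "ATGGCATACGTTGGC",
}

reference = isolates["isolate_1"]
for name, sequence in isolates.items():
    # Fewer differences suggest a closer relationship between isolates.
    print(name, count_differences(reference, sequence))
```

In this toy example, isolate_2 is identical to isolate_1 while isolate_3 differs at two positions, so the first two would be the more plausible candidates for a direct transmission link.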
Quick facts and figures
What is the European Bioinformatics Institute?
- The European Bioinformatics Institute (EMBL-EBI) is Europe's hub for big data in biology.
- EMBL-EBI was established in 1994 on the Genome Campus near Cambridge in the UK, and is part of the European Molecular Biology Laboratory (EMBL).
- We have over 500 members of staff, including PhD students, postdocs, senior scientists, software developers, scientific data curators, grants officers, user experience analysts and many others.
What is bioinformatics?
Bioinformatics is the application of computer technology to the storage, management and analysis of data from life science experiments. One of the biggest challenges in biology today is analysing the massive volumes of data created in “high-throughput” experiments, for example DNA sequencing. Bioinformatics makes it possible to extract meaningful information from a sea of data. It provides the means to pull together many different kinds of information so that we can begin piecing together the great puzzle of how biological systems work.
Some basics about bioinformatics
- The storage capacity of computing hardware doubles every 18 months but new biological data are doubling every 9 months. These rates are increasing.
- The cost of sequencing has fallen dramatically, and the major bottleneck in life science research today is data analysis.
- Users of Europe’s biological databases range from clinical specialists to environmental researchers and computer scientists.
- The EMBL-EBI website is visited from approximately 11,000 unique IP addresses (web addresses) a day. That could represent many more people because, just like a phone number, an IP address might represent an individual or an entire organisation.
- The data storage capacity of EMBL-EBI is approximately 40 petabytes (1 PB = 1 × 10¹⁵ bytes).
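The doubling times quoted in the list above can be turned into a rough back-of-the-envelope comparison. The sketch below is illustrative only: it assumes simple exponential growth with fixed doubling times of 18 months for storage and 9 months for data, ignores the fact that these rates are themselves changing, and uses a hypothetical helper named growth_factor.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_time_months)


# Doubling times taken from the text; holding them constant is an assumption.
STORAGE_DOUBLING_MONTHS = 18
DATA_DOUBLING_MONTHS = 9

for years in (3, 6):
    months = 12 * years
    storage = growth_factor(months, STORAGE_DOUBLING_MONTHS)
    data = growth_factor(months, DATA_DOUBLING_MONTHS)
    print(f"{years} years: storage x{storage:.0f}, data x{data:.0f}, "
          f"gap x{data / storage:.0f}")
```

Under these assumptions, storage capacity grows 4-fold in three years while data volume grows 16-fold, and the gap between the two keeps widening over time.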
What is EMBL?
The European Molecular Biology Laboratory (EMBL) is at the forefront of innovation in life sciences research, technology development and transfer, and provides outstanding training and services to the scientific community in its member states. This publicly funded, non-profit institute is housed at five sites in Europe whose expertise covers the whole spectrum of molecular biology.
EMBL research units: Heidelberg, Germany (EMBL headquarters); EMBL-EBI in Hinxton, UK; Grenoble, France; Hamburg, Germany; and Monterotondo, Italy.
EMBL member states: Austria, Belgium, Croatia, Czech Republic, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom. Associate member states: Argentina and Australia.