EMBL-EBI makes a vast quantity of biological data freely available to the global scientific community through a number of online archives and analysis platforms. For all this data to be useful, users need to be able to find what they're looking for logically, quickly and easily. That requires advanced, bespoke search functionality, and this is where EBI Search comes in.
EBI Search was first conceived in 2006, during a discussion initiated by the then-directors of EMBL-EBI about the increasingly complex search requirements of the institute's growing data infrastructure. One of those present was Rodrigo Lopez, who still heads EMBL-EBI's Web Production team today ("I almost resigned!" he chuckles).
What they were describing was a massive undertaking: an engine capable of cross-referencing data sources in a way that no 'off-the-shelf' system could handle. It would need to be built in-house, from the ground up, and the effort and time required would be monumental… And so "EB-eye" (later renamed the more palatable "EBI Search") was born.
"This is not traditional IT," says Rodrigo. "EBI Search is very powerful, capable and scalable, in a way that most search engines are not." Most notably, it can cross-reference results: in addition to displaying the results that match your search criteria, it also shows how those results relate to each other and, consequently, what other data might be relevant.
This 'one-to-many' relationship is crucial to helping users discover new data relevant to their research. Rodrigo explains: "If we know there is a relationship between two points, A and B, we automatically infer a relationship between B and A, so you can search in a bidirectional way across the data. If B has a relationship with C, then there is also an inferred relationship between A and C. Imagine a galactic explosion of points in the sky that represent each of these relationships. Our search system actually shows that."
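The two inference rules Rodrigo describes (a link from A to B implies a link from B to A, and links A–B and B–C imply an A–C relationship) can be illustrated with a toy graph. The sketch below is a minimal illustration under those two assumptions only; it is not how EBI Search is actually implemented, and the function names `build_graph` and `related_entries` are hypothetical. It stores every cross-reference in both directions, then uses a breadth-first search to collect everything reachable from a starting entry.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Store each cross-reference in both directions (A->B implies B->A)."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def related_entries(graph, start):
    """Return every entry reachable from `start` by following links
    transitively (A-B and B-C imply A-C), excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# Hypothetical entries: A-B and B-C are linked; D-E form a separate cluster.
links = [("A", "B"), ("B", "C"), ("D", "E")]
graph = build_graph(links)
print(sorted(related_entries(graph, "A")))  # ['B', 'C']
print(sorted(related_entries(graph, "C")))  # ['A', 'B']
```

Searching from C finds A even though no link was ever recorded from C to A, which is the bidirectional, transitive behaviour described in the quote. A system working over billions of documents would presumably rely on far more specialised indexing than an in-memory search like this.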
EBI Search indexes over 3 billion documents, and the resulting graph of cross-references is thought to contain several trillion nodes, growing daily. The richness of the data presents its own problems too. It can be anything (XML, images, structured data, text files, JSON, etc.), and EBI Search must index it regardless. Equally, the data can range from tiny, such as a single string of letters representing a genetic sequence, all the way up to a collection of high-resolution images numbering many terabytes.
The value and importance of EBI Search were put to the test recently when EMBL-EBI, with funding from the European Commission, created the COVID-19 Data Portal, which is used by researchers all over the world studying the disease. After a massive cross-institute effort involving numerous teams and individuals (including the Web Production and Web Development teams from the Technical Services Cluster), the portal was operational in just two weeks. "Personally, I was very excited to take part in this project," says Youngmi Park, EBI Search Technical Project Lead. "We all understand the seriousness of the topic and the importance of the portal, and I have been very impressed by the collaboration between the teams."
Critical to the portal's success is the functionality of EBI Search, which enables all SARS-CoV-2 and COVID-19-related data to be indexed independently and accessed via the portal. The technological infrastructure that enables researchers everywhere to access, share and analyse these data is crucial in the race to understand the virus and identify viable treatments and vaccines. "The great thing about working here is that you're not just fitting hard drives or writing code," says Rodrigo. "What you're doing has the potential to save lives."