ResOps, daily adventures of DevOps in research

The rise of cloud computing has disrupted the way businesses build and purchase their IT infrastructure. Obtaining just the right amount of resources around the world with a few clicks (or API calls) unlocks completely new operational scenarios. Cloud computing, when done right, can solve availability and responsiveness problems in delivering services to a worldwide user base.

But how can Science take advantage of this disruptive technology?

Where everything started

In mid-2015, the Technology and Science Integration (TSI) team at EMBL-EBI started exploring how to bring Clouds into our routine IT activities. This included the compute and storage needed by our scientists, as well as the public-facing services our Service Teams provide to their global scientific communities. Running our pipelines on infrastructure owned by somebody else sparked a few discussions – especially around the sensitivity of the data. Is the data already in the public domain? Is it human data with controlled access? Or is it embargoed before public release? For these reasons, we decided to start by focusing on public data only!

One of our Data Centres. EMBL-EBI currently has ~120 PB of installed storage capacity and 40,000 CPUs available in our batch systems.

Working in collaboration with the Metagenomics team, led by Rob Finn, we packaged up their pipeline (currently running on our internal LSF cluster) for cloud environments. EMBL-EBI owns and operates an OpenStack-based cloud, called Embassy Cloud, and that was our first target. But it was only the beginning of our journey.

Automate, automate, automate

IT Services at EMBL-EBI use automation tools every day – with Puppet representing the de facto standard for IT configuration management. At TSI we knew we were facing a slightly different challenge – the hybrid cloud – which meant supporting multiple cloud providers via on-demand deployments. Removing idle resources is key if you wish to minimise the invoice you get at the end of the month!

When we started, the only tool able to interact with most clouds was Terraform, and we adopted it. We haven’t regretted it so far. Configuration management was next: how do we turn those 20 VMs we just provisioned into a compute cluster? Ansible offered an easy-to-adopt solution that only required a working SSH connection, and its learning curve is much gentler than, for example, Puppet’s. While we were happy to carry out the initial pipeline porting, long-term support was a different story. Once packaged, we handed the pipeline back to Rob Finn’s team – two years down the line, they’re now keen users of Ansible themselves.
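To give a flavour of what this looks like in practice, here’s a minimal, hypothetical Terraform sketch that provisions a handful of identical worker VMs on an OpenStack cloud such as Embassy Cloud. The image, flavour, network and key pair names are placeholders rather than our real configuration; Ansible would then take over via SSH, using the IP addresses Terraform outputs.

```hcl
# Minimal, illustrative Terraform sketch: provision a few identical worker VMs
# on an OpenStack cloud (e.g. Embassy Cloud). All names are placeholders.
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

provider "openstack" {
  # Credentials are normally picked up from the OS_* environment variables
  # set by sourcing your OpenStack RC file.
}

variable "worker_count" {
  default = 5
}

resource "openstack_compute_instance_v2" "worker" {
  count       = var.worker_count
  name        = "pipeline-worker-${count.index}"
  image_name  = "ubuntu-16.04"     # placeholder image
  flavor_name = "m1.large"         # placeholder flavour
  key_pair    = "deployment-key"   # placeholder key pair

  network {
    name = "private-network"       # placeholder network
  }
}

# Hand these IPs to Ansible (e.g. via a generated inventory) to turn the raw
# VMs into a compute cluster.
output "worker_ips" {
  value = openstack_compute_instance_v2.worker[*].access_ip_v4
}
```

From there, a single `terraform apply` brings the VMs up (and `terraform destroy` tears them down when they’d otherwise sit idle), while an `ansible-playbook` run against the resulting inventory turns them into a working cluster.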

In a matter of months, we started adopting the whole DevOps toolchain. To cut down deployment times, we used Packer, driven by Jenkins, to pre-bake our VM images. We started pre-baking the pipeline as well, reducing install time from 40 minutes to ~20 seconds. We embraced DevOps, but it quickly became clear that it wasn’t the usual DevOps – no web apps here – just research. And the term ‘ResOps’ was born.
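As a rough illustration of the pre-baking idea, the sketch below is a Packer template (shown in the newer HCL2 syntax; at the time our templates were JSON) that builds an OpenStack image with the pipeline already installed by an Ansible playbook. The base image, flavour, network UUID and playbook path are all placeholders, and it assumes a Packer setup where the OpenStack builder and Ansible provisioner are available.

```hcl
# Illustrative Packer sketch (HCL2 syntax): pre-bake a worker image on
# OpenStack so instances boot with the pipeline already installed.
# All names below are placeholders.
source "openstack" "worker" {
  image_name        = "pipeline-worker-prebaked"
  source_image_name = "ubuntu-16.04"   # placeholder base image
  flavor            = "m1.medium"      # placeholder flavour
  ssh_username      = "ubuntu"
  networks          = ["00000000-0000-0000-0000-000000000000"] # placeholder network UUID
}

build {
  sources = ["source.openstack.worker"]

  # Re-use the same Ansible playbook used for configuration management, so the
  # resulting image already contains the pipeline and its dependencies.
  provisioner "ansible" {
    playbook_file = "playbooks/pipeline.yml"   # hypothetical playbook
  }
}
```

Because the heavy installation happens once, at image-build time inside Jenkins, the VMs Terraform later provisions from that image only need a quick final configuration pass – which is where the 40-minutes-to-20-seconds saving comes from.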

ResOps, or DevOps for Research

Adapting the Metagenomics pipeline for deployment in the cloud was a very interesting – and error-prone – exercise. We learned a lot from it, and that knowledge applies all across scientific computing. But what exactly is ResOps?

From an IT point of view, we believe ResOps is the adoption of the DevOps toolchain to support research purposes.

Science has different aims – and needs – compared to the usual DevOps “clients”. However, both fields face the same challenge: a change in culture. Researchers moving to the cloud will need to understand that there will be a change in how their compute requirements are satisfied. This is also part of what we define as ResOps.

ResOps, the training

Cultural changes are usually difficult, unless there’s a clear need underpinning them. At EMBL-EBI this need came from optimising our infrastructure and our workloads as we adopted cloud computing. So why not share this knowledge with the wider scientific community?

ResOps training, Heidelberg, January 2018

May 2017 saw the first edition of our ResOps training, which was quickly followed by four further dates. Starting from the basics (we actually have a Cloud 101 talk!), the training walks attendees through all the tips and tricks we believe are important to keep in mind when deploying science workloads in clouds. However, it is the practicals that our attendees love the most: backed by the EMBL-EBI Embassy Cloud, they get to run Terraform and Ansible for real, provisioning VMs or a small compute cluster.

The course isn’t trying to sell the idea that moving pipelines to clouds is easy or quick. In most cases, it’s not. However, there’s a lot of jargon – and quite a few dark corners – around clouds, and that is what we’re trying to address through the ResOps course. At a recent training, one of the attendees, just before leaving the room, said to one of our trainers:

“I never thought I would be using clouds for my needs, but after today I’m definitely going to try to use them”

An anonymous attendee

And that’s exactly what we’re aiming for with the training: providing the basic tools scientists need to sail the Cloud. If you’re interested in the training, the material used in our last edition is freely available here.

The next steps

A lot has changed since 2015 in undertaking scientific computing in public or private clouds. First of all, there’s more evidence-based knowledge out there. People have tried (and failed) to build or port their workloads for cloud environments, and there’s quite a lot to learn from those experiences. At the same time, many cloud providers have rolled out native services to help bridge the gap between their offering and scientific computing: AWS Batch, GCP Pipelines and Azure Batch are good examples of this.

On our side, the research side, there are ongoing efforts to address the (many) remaining challenges. GA4GH, for example, is defining a set of standards for distributed computing on geographically dispersed genomics datasets without compromising the ethical constraints under which the data may have been collected. At the European level, the EC-funded project HNSciCloud is teaming up with commercial providers to develop solutions that lower the barriers to scientific computing in the cloud. At the same time, the EC-funded EOSCpilot and EOSC-Hub projects are laying the foundations of what the European Open Science Cloud (EOSC) is going to be.

In such a quickly changing landscape, there are always more questions than answers. However, this is going to improve over the next few years, not least thanks to the efforts of the projects above. So, stay tuned 🙂

Written by Dario Vianello, Cloud Bioinformatics Application Architect
