{"id":349,"date":"2018-02-08T08:53:17","date_gmt":"2018-02-08T08:53:17","guid":{"rendered":"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/?p=349"},"modified":"2020-10-01T09:31:45","modified_gmt":"2020-10-01T09:31:45","slug":"resops-daily-adventures-of-devops-in-research","status":"publish","type":"post","link":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/blog\/2018\/02\/resops-daily-adventures-of-devops-in-research\/","title":{"rendered":"ResOps, daily adventures of DevOps in research"},"content":{"rendered":"\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2018\/02\/background-2462436_1920-1200x313.jpg\" alt=\"network links\" width=\"1200\" height=\"313\"><\/p>\n\n\n\n<p>The rise of cloud computing has disrupted the way businesses build and purchase their IT infrastructure. Obtaining <em>just<\/em> the right amount of resources around the world with a few clicks (or API calls) unlocks completely new operational scenarios. Cloud computing, when done right, can solve availability and responsiveness problems in delivering services to a worldwide user base.<\/p>\n\n\n\n<p><em>But how can <strong>Science<\/strong> take advantage of this disruptive technology?<\/em><\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Where everything started<\/h3>\n\n\n\n<p>In mid-2015, the <a href=\"https:\/\/www.ebi.ac.uk\/about\/people\/steven-newhouse\">Technology and Science Integration team<\/a> at <a href=\"https:\/\/www.ebi.ac.uk\/\">EMBL-EBI<\/a> started exploring how to bring Clouds into our routine IT activities. This included the compute and storage needed by our scientists, as well as the public-facing services our <a href=\"https:\/\/www.ebi.ac.uk\/about\/people\">Service Teams<\/a> provide to their global scientific communities. 
Running our pipelines on infrastructure <em>owned by somebody else<\/em> sparked a few discussions &#8211; especially around the sensitivity of the data. Is the data already in the public domain? Does it relate to human data under controlled access? Or is it embargoed before release to the public? For these reasons, we decided to start by focusing just on public data!<\/p>\n\n\n\n<figure class=\"vf-figure wp-block-image alignnone wp-image-932 size-large\"><img decoding=\"async\" class=\"vf-figure__image\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2018\/02\/Data_Centre_EMBL-EBI_Gyron-1-1024x255.jpg\" alt=\"EMBL-EBI Data Centre\" class=\"wp-image-932\"\/><figcaption class=\"vf-figure__caption\"><em>One of our Data Centres &#8211; EMBL-EBI currently has ~120 PB of installed storage capacity and a total of 40,000 CPUs available in our batch systems.<\/em><\/figcaption><\/figure>\n\n\n\n<p>Working in collaboration with the <a href=\"https:\/\/www.ebi.ac.uk\/metagenomics\/\">Metagenomics team<\/a>, led by <a href=\"https:\/\/www.ebi.ac.uk\/about\/people\/rob-finn\">Rob Finn<\/a>, we <em>packaged up<\/em> their pipeline (currently running on our internal LSF cluster) for cloud environments. EMBL-EBI owns and operates an <a href=\"https:\/\/www.openstack.org\/\">OpenStack<\/a>-based cloud, called <a href=\"http:\/\/www.embassycloud.org\/\">Embassy Cloud<\/a>, and that was our first target. 
But it was only the beginning of our journey.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Automate, automate, automate<\/h3>\n\n\n\n<h3 class=\"wp-block-heading\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2018\/02\/analytics-3088958_1920-1-1024x235.jpg\" alt=\"logs\" width=\"1024\" height=\"235\"><\/h3>\n\n\n\n<p>IT Services at EMBL-EBI use automation tools every day &#8211; with <a href=\"https:\/\/puppet.com\/\">Puppet<\/a> representing the <em>de facto<\/em> standard for IT configuration management. In the Technology and Science Integration (TSI) team, we knew we were facing a slightly different challenge &#8211; the <em>hybrid cloud<\/em> &#8211; which implied supporting multiple cloud providers via on-demand deployments. Removing idle resources is key if you wish to minimise the invoice you get at the end of the month!<\/p>\n\n\n\n<p>When we started, the only tool able to interact with most clouds was <a href=\"https:\/\/www.terraform.io\/\">Terraform<\/a>, and we adopted it. We haven\u2019t regretted it so far. Configuration management was next: <em>how do we turn those 20 VMs we just provisioned into a compute cluster?<\/em> <a href=\"https:\/\/www.ansible.com\/\">Ansible<\/a> offered an easy-to-adopt solution that only required a working SSH connection. Also, <a href=\"https:\/\/www.ansible.com\/\">Ansible<\/a>\u2019s learning curve is much gentler than that of, for example, <a href=\"https:\/\/puppet.com\/\">Puppet<\/a>. While we were happy to carry out the initial pipeline porting, long-term support was a different story. Once packaged, we handed the pipeline back to <a href=\"https:\/\/www.ebi.ac.uk\/metagenomics\/\">Rob Finn&#8217;s team<\/a> &#8211; two years down the line, they\u2019re now keen users of <a href=\"https:\/\/www.ansible.com\/\">Ansible<\/a> themselves.<\/p>\n\n\n\n<p>In a matter of months, we started adopting the whole of the DevOps toolchain. 
To cut deployment times, we used <a href=\"https:\/\/www.packer.io\/\">Packer<\/a> to pre-bake images of our VMs via <a href=\"https:\/\/jenkins.io\/\">Jenkins<\/a>. We started pre-baking the pipeline as well, reducing install time from 40 minutes to ~20 seconds. We embraced DevOps, but it quickly became clear that it wasn\u2019t the usual DevOps &#8211; no web apps here &#8211; just research. And the term \u2018ResOps\u2019 was born.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ResOps, or DevOps for Research<\/h3>\n\n\n\n<p>Adapting the Metagenomics pipeline to be deployed in the cloud was a very interesting &#8211; <em>and error-prone<\/em> &#8211; exercise. We learned a lot from it, and this knowledge can be used all across scientific computing. But what exactly is ResOps?<\/p>\n\n\n\n<p>From an IT point of view, we believe <em><strong>ResOps<\/strong> <strong>is the adoption of the DevOps toolchain to support research purposes<\/strong><\/em><strong>.<\/strong><\/p>\n\n\n\n<p>Science has different aims &#8211; and <em>needs<\/em> &#8211; compared to the normal DevOps \u201cclients\u201d. However, both fields face the same challenge: a <em>change in culture<\/em>. Researchers moving to the cloud will need to understand that there will be a change in how their compute requirements are satisfied. This is also part of what we define as ResOps.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ResOps, the training<\/h3>\n\n\n\n<p>Cultural changes are usually difficult, unless there\u2019s a clear need underpinning them. At EMBL-EBI this need came from optimising our infrastructure and our workloads to adopt cloud computing. 
Why not share this knowledge with the wider scientific community?<\/p>\n\n\n\n<figure class=\"vf-figure wp-block-image alignnone wp-image-940 size-large\"><img decoding=\"async\" class=\"vf-figure__image\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2018\/02\/IMG_5502_Cropped-1024x330.jpg\" alt=\"ResOps training, Heidelberg, January 2018\" class=\"wp-image-940\"\/><figcaption class=\"vf-figure__caption\"><em>ResOps training, Heidelberg, January 2018<\/em><\/figcaption><\/figure>\n\n\n\n<p>May 2017 saw the <em>first<\/em> edition of our <a href=\"http:\/\/bit.ly\/resops_may2017\">ResOps training<\/a>, which was quickly followed by <em>four other dates<\/em>. Starting from the basics (we actually have a <em>Cloud 101<\/em> talk!), the training walks attendees through all the tips and tricks we believe are important to keep in mind when deploying science workloads in clouds. However, it is the practicals that our attendees love the most. Backed by the EMBL-EBI Embassy Cloud, they get to run Terraform and Ansible for real, provisioning VMs or a small compute cluster.<\/p>\n\n\n\n<p>The course isn\u2019t trying to sell the idea that <em>moving pipelines to clouds is easy or quick<\/em>. In most cases, it&#8217;s not. However, there\u2019s a lot of jargon and dark spots around clouds and that is what we\u2019re trying to address through the ResOps course. At a recent training, one of the attendees, just before leaving the room, said to one of our trainers:<\/p>\n\n\n\n<blockquote class=\"vf-blockquote\"><p><em>&#8220;I never thought I would be using clouds for my needs, but after today I\u2019m definitely going to try to use them&#8221;<\/em><\/p><p><cite><em>An anonymous attendee<\/em><\/cite><\/p><\/blockquote>\n\n\n\n<p>And that\u2019s exactly what we\u2019re aiming for with the training: <em>provide the basic tools scientists need to sail the Cloud<\/em>. 
If you\u2019re interested in the training, the material used in our last edition is freely available <a href=\"http:\/\/bit.ly\/resops_jan2018\">here<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The next steps<\/h3>\n\n\n\n<p>A lot has changed since 2015 in how scientific computing is undertaken in public or private clouds. First of all, there\u2019s more evidence-based knowledge out there. People tried (and failed) to build or port their workloads for cloud environments and <em>there\u2019s quite a lot to learn from those experiences<\/em>. At the same time, many cloud providers have rolled out native services to help bridge the gap between their offerings and scientific computing: <a href=\"https:\/\/aws.amazon.com\/batch\/\">AWS Batch<\/a>, <a href=\"https:\/\/cloud.google.com\/genomics\/v1alpha2\/pipelines\">GCP Pipelines<\/a> and <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/batch\/\">Azure Batch<\/a> are good examples of this.<\/p>\n\n\n\n<p>On our side, the research side, there are ongoing efforts to address the (many) <em>remaining challenges<\/em>. <a href=\"https:\/\/www.ga4gh.org\/\">GA4GH<\/a>, for example, is defining a set of standards for distributed computing on geographically dispersed genomics datasets without compromising the ethical constraints under which the data may have been collected. At the European level, the EC-funded project <a href=\"http:\/\/www.hnscicloud.eu\/\">HNSciCloud<\/a> is teaming up with commercial providers to develop solutions to lower the barriers for scientific computing in the cloud. At the same time, the EC-funded <a href=\"http:\/\/eoscpilot.eu\/\">EOSCpilot<\/a> and <a href=\"http:\/\/www.eosc-hub.eu\/\">EOSC-Hub<\/a> projects are laying the foundations of what the <a href=\"https:\/\/ec.europa.eu\/research\/openscience\/index.cfm?pg=open-science-cloud\">European Open Science Cloud<\/a> (EOSC) is going to be.<\/p>\n\n\n\n<p>In such a quickly changing landscape, 
<em>there are always more questions than answers<\/em>. However, this is going to improve in the next few years, also thanks to the efforts of the projects above. <em>So, stay tuned <\/em>\ud83d\ude42<\/p>\n\n\n\n<p class=\"has-text-align-right\"><em>Written by Dario Vianello<\/em>, <em>Cloud Bioinformatics Application Architect<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The rise of cloud computing has disrupted the way businesses build and purchase their IT infrastructure. Obtaining just the right amount of resources around the world with a few clicks (or API calls) unlocks completely new operational scenarios. Cloud computing, when done right, can solve&hellip;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[374,2069],"tags":[252,2058,2070,2060],"embl_taxonomy":[],"class_list":["post-349","post","type-post","status-publish","format-standard","hentry","category-cloud","category-resops","tag-cloud","tag-cloud-computing","tag-devops","tag-resops"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-includes\/images\/media\/default.svg","_links":{"self":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/comments?post=349"}],"version-history":[{"count":4,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/349\/revisions"}],"predecessor-version":[{"id":354,"href":"
https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/349\/revisions\/354"}],"wp:attachment":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/media?parent=349"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/categories?post=349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/tags?post=349"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/embl_taxonomy?post=349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}