{"id":367,"date":"2017-09-29T10:42:21","date_gmt":"2017-09-29T10:42:21","guid":{"rendered":"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/?p=367"},"modified":"2022-06-25T20:21:46","modified_gmt":"2022-06-25T20:21:46","slug":"web-analytics-how-we-collect-data","status":"publish","type":"post","link":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/blog\/2017\/09\/web-analytics-how-we-collect-data\/","title":{"rendered":"Web analytics: how we collect data"},"content":{"rendered":"\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered  size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"708\" class=\"vf-figure__image\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-1024x708.jpg\" alt=\"\" class=\"wp-image-368\" srcset=\"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-1024x708.jpg 1024w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-300x207.jpg 300w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-768x531.jpg 768w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-1536x1061.jpg 1536w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/pexels-negative-space-97080-2048x1415.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>In this post on web analytics, we will share our experience and explore various aspects&nbsp;of how we&nbsp;in the Web Development team address client-side data collection and analysis. 
We will also provide a set of recommendations that help tackle common issues and present some use-cases that highlight specific implementation challenges and successes.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Understanding the context<\/h3>\n\n\n\n<blockquote class=\"vf-blockquote\"><p><em>&#8220;Often numbers don&#8217;t speak as loudly as they should because you are missing one simple ingredient: context.&#8221;<\/em><\/p><p><cite><em>Avinash Kaushik, author of Web Analytics 2.0<\/em><\/cite><\/p><\/blockquote>\n\n\n\n<p>Analytics data do not answer questions per se, but are relevant when they stand at the intersection of <a href=\"https:\/\/www.inc.com\/guides\/google\/201108\/secret-to-web-analytics.html\">business objectives, strategic goals and contextual metrics and KPIs<\/a>. More precisely, they <em>become<\/em>&nbsp;relevant when they map to outcomes: when they fit the organisation&#8217;s overall objectives and specific goals, are accompanied by accurate measurements and performance targets, and serve identifiable user segments.<\/p>\n\n\n\n<p>A common problem with data collection is posing the right question about data analysis and interpretation: what does it mean for our organisation to have one or two million hits per month? Is that too many? Too few? Just enough? Also, when do collected data start to provide most of their value? In the future, today or in the past? And who benefits from all of that data when they are not translated into meaningful sources of knowledge?&nbsp;More importantly, the same numbers might signify different things to different people. 
Our focus is to put numbers in the context of the problem we are trying to solve, or the outcome we want to generate for the users of our services.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How EMBL-EBI collects data<\/h3>\n\n\n\n<p>As a general rule, data collection at the EMBL-EBI is performed in two ways:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1. Server-side log analysis<\/h4>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/07\/apache-logs-640x294.png\" alt=\"\" class=\"wp-image-442\"\/><\/figure><\/div>\n\n\n\n<p>Server-side logs are managed by the EMBL-EBI&nbsp;<a href=\"http:\/\/www.ebi.ac.uk\/about\/people\/rodrigo-lopez\">Web Production team<\/a>. Their main focus regarding data collection is around baseline statistics across all of EMBL-EBI&#8217;s online presence.&nbsp;This exercise of data gathering adds to a collection of other indicators of general use and impact. These are used for various executive, administrative and grant supporting purposes.&nbsp;The web log reports produced at EMBL-EBI are based on the logs generated by the web load-balancers. These record the traffic going to the EMBL-EBI hosted domains and associated IP addresses.<\/p>\n\n\n\n<p>The solutions adopted are the result of architectural decisions, based around the following criteria:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Sustainability: how the system is future-proof<\/li><li>Scalability: how the system scales<\/li><li>Performance: how the system responds to changes or requests<\/li><li>Cost: how the system is financially viable<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">2. 
Client-side tracking<\/h4>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/07\/google-piwik-640x219.png\" alt=\"\" class=\"wp-image-454\"\/><\/figure><\/div>\n\n\n\n<p>We in the <a href=\"https:\/\/www.ebi.ac.uk\/about\/people\/jonathan-hickford\">Web Development team<\/a> manage EMBL-EBI&#8217;s corporate website and capture data only for what falls under the homepage and most of the \/about, \/services, \/research and \/training sections.&nbsp;It is important to note that, given EMBL-EBI&#8217;s&nbsp;huge online presence, it would be challenging to cover all of the organisation&#8217;s web ecosystem; this has to do both with costs (e.g.&nbsp;<a href=\"https:\/\/developers.google.com\/analytics\/devguides\/collection\/ios\/v3\/limits-quotas\">Google Analytics charges after a website receives 10 million hits per month<\/a>) and with purpose (we want to collect data that we will be able to process and analyse for our projects).<\/p>\n\n\n\n<p>Currently, to perform client-side tracking we use <a href=\"https:\/\/www.google.com\/analytics\/#?modal_active=none\">Google Analytics<\/a> and\/or <a href=\"https:\/\/piwik.org\/\">Piwik<\/a>. 
Each of these solutions has pros and cons. To understand which one was right for us, we ran a&nbsp;side-by-side test in the Web Development team, which will be the topic of one of our upcoming posts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Our considerations on the adopted solutions<\/h3>\n\n\n\n<p>Here is a quick reference guide on how we assessed the adopted solutions at EMBL-EBI:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Approach<\/strong><\/td><td><strong>Summary<\/strong><\/td><td><strong>Usage<\/strong><\/td><\/tr><tr><td>Server-side<\/td><td>They are consistently implemented across all EMBL-EBI hosted services, i.e. the system is tightly federated.<br><br>The Web Production team owns the implementation and exposes the data for all to analyse.<br><br>If a service is hosted at EMBL-EBI on EMBL-EBI servers, then data is being captured. Service teams don\u2019t need to do anything for this to work; it\u2019s on by default.<br><br>They are sustainable and scalable: the Web Production team is committed to supporting the system and ensuring it grows and remains performant as&nbsp;EMBL-EBI grows.<br><br>They build analytical and statistically significant power on mid- to long-term cycles.<\/td><td>Comparing general trends over long periods, e.g. this year vs last year.<br><br>It\u2019s the established source of EMBL-EBI statistics for long-term trend analysis and baseline usage metrics, e.g. 
used by the Directors Office for producing annual reports.<\/td><\/tr><tr><td>Client-side<\/td><td>They are additive: extra data on top of the server-side analytics, giving richer detail about how users interact with a given service.<br><br>Each service team owns the implementation for their service, and its ongoing maintenance.<br><br>Many different implementations and technologies are in use across services, and each can be highly customised to a team&#8217;s needs.<br><br>You need context to understand if a change in data is due to a change in user behaviour, a change to the analytics implementation, or a change to the service itself.<br><br>They provide most of their analytical value through fast-paced, incremental short-term iterations that deliver fresh data.<\/td><td>Getting additional detail about how users behave within a service, e.g. their journey through a service and what specific features they interact with.<br><br>Measuring against goals you have defined as a team.<br><br>Testing hypotheses about changes to a service, for example A\/B testing to see if a change to the service has the desired effect.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Opportunities for learning<\/h3>\n\n\n\n<p>Client-side tracking in the Web Development team presents a great opportunity for understanding service utilisation and user behaviour, offering meaningful insights on how users interact with pages: where did the user scroll to, what did they hover on, which of two links to the same HTML page did the user click?<\/p>\n\n\n\n<p>We don&#8217;t have answers to these questions upfront and believe that an effective analytics strategy requires a continuous and open dialogue:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>with the data: because we need to query the system with the right questions to retrieve the most relevant data<\/li><li>with the people: because we need alignment and a shared understanding in achieving common 
strategic goals and showing impact for the services we design<\/li><\/ul>\n\n\n\n<p>We have run projects using both Google Analytics and Piwik, and we will share some of our interesting&nbsp;experiences and the insights we gathered in the following posts of this series. To start, here is the most common question we are asked.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why don&#8217;t we have one single EMBL-EBI Google Analytics account?<\/h3>\n\n\n\n<p>There are two main reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>Cost<\/strong>. The EMBL-EBI site and the services running on it are very popular, and take us over the free limit. If you consider that many of the larger services run on other domains, e.g. UniProt, EuropePMC, Ensembl etc., the total volume of traffic would be very large.<\/li><li><strong>Flexibility for teams<\/strong> to tailor analytics approaches. Across such a large and varied portfolio of services it would be very restrictive if all teams had to use the same analytics system configured in the same way.<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">How we evolve<\/h3>\n\n\n\n<p>We in the Web Development team are transitioning to support our design and development decisions with statistically significant data that capture how users interact with EMBL-EBI services. This series of articles aims to share how we overcome challenges, achieve successes and improve how we deliver value through what we do.<\/p>\n\n\n\n<p>Do you have similar challenges? Are you planning to use analytics on your next project? 
Please share your comment and\/or stories with the Web Development team at es-wwwdev[at]ebi.ac.uk,&nbsp;and if you are at the Genome Campus you are welcome to come visit us at the EBI East Wing, room A2-125.<\/p>\n\n\n\n<p><a href=\"https:\/\/twitter.com\/ebi_web_ux\/\">@ebi_web_ux<\/a><\/p>\n\n\n\n<p>wwwdev[at]ebi.ac.uk<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post on web analytics, we will share our experience and explore various aspects&nbsp;of how we&nbsp;in the Web Development team address client-side data collection and analysis. We will also provide a set of recommendations that help tackle common issues and present some use-cases that&hellip;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[377],"tags":[2077,2078,2079,2080],"embl_taxonomy":[],"class_list":["post-367","post","type-post","status-publish","format-standard","hentry","category-web-development","tag-analytics","tag-data","tag-logs","tag-metrics"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-includes\/images\/media\/default.svg","_links":{"self":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/367","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/comments?post=367"}],"version-history":[{"count":4,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/367\/revisions"}],"predecessor-version":[{"id":1898,"href":"https:\/\/www.ebi.ac.uk\/abo
ut\/teams\/its\/wp-json\/wp\/v2\/posts\/367\/revisions\/1898"}],"wp:attachment":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/media?parent=367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/categories?post=367"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/tags?post=367"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/embl_taxonomy?post=367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}