{"id":373,"date":"2017-09-11T14:00:00","date_gmt":"2017-09-11T14:00:00","guid":{"rendered":"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/?p=373"},"modified":"2020-10-01T14:22:00","modified_gmt":"2020-10-01T14:22:00","slug":"live-dashboard-with-kafka-streams-api","status":"publish","type":"post","link":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/blog\/2017\/09\/live-dashboard-with-kafka-streams-api\/","title":{"rendered":"Live dashboard with Kafka Streams API"},"content":{"rendered":"\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered  size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"640\" class=\"vf-figure__image\" src=\"https:\/\/www.ebi.ac.uk\/about\/clusters\/technical-services\/wp-content\/uploads\/2020\/10\/apache-1024x640.jpg\" alt=\"\" class=\"wp-image-374\" srcset=\"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/apache-1024x640.jpg 1024w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/apache-300x188.jpg 300w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/apache-768x480.jpg 768w, https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2020\/10\/apache.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>This post describes how to build a prototype dashboard that displays, in real time, statistics computed with the new Apache Kafka Streams API.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>Apache Kafka is a well-known distributed messaging system. It is often used together with Apache Spark to compute statistics from logs. However, since version 0.10.0, it has shipped with its own client library for processing streams. 
In this post I will show how to build a pipeline that uses the new API.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Use case<\/h3>\n\n\n\n<p>At EMBL-EBI we provide public access to biological data resources. These resources can be accessed via a REST API. On our side, a Traffic Manager (TM) distributes queries to separate computational LSF clusters. Within each cluster, requests are queued while they wait for execution. Our objective is to configure the TM so that each user waits a similar amount of time for their job to be executed. We have prior information on each cluster&#8217;s performance, but we still need a real-time monitoring system to adjust the TM ratios dynamically when hard-to-predict events occur.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Quick start<\/h3>\n\n\n\n<p>Please follow the instructions on GitHub: <a href=\"https:\/\/github.com\/ebi-wp\/kafka-streams-api-websockets\">kafka-streams-api-websockets<\/a> (feel free to comment on this example on GitHub).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture<\/h3>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/Screen-Shot-2017-09-12-at-11.26.44-640x278.png\" alt=\"\" class=\"wp-image-536\"\/><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Java 8+<\/li><li>Apache Kafka 0.11<\/li><li>Apache Kafka Streams API<\/li><li>Spring Boot<\/li><li>STOMP<\/li><li>Chart.js<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Result<\/h3>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/dashboard-1024x653.png\" alt=\"\" 
class=\"wp-image-512\"\/><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Detailed description<\/h3>\n\n\n\n<p>The description is split into five steps.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 1: Setup Kafka<\/h4>\n\n\n\n<p>Please download and uncompress the Kafka binaries from <a href=\"https:\/\/kafka.apache.org\/downloads\">https:\/\/kafka.apache.org\/downloads<\/a>, for example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>wget http:\/\/mirror.vorboss.net\/apache\/kafka\/0.11.0.0\/kafka_2.11-0.11.0.0.tgz\ntar -xzf kafka_2.11-0.11.0.0.tgz\ncd kafka_2.11-0.11.0.0<\/code><\/pre>\n\n\n\n<p>Now you need to start ZooKeeper, which coordinates the Kafka brokers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/zookeeper-server-start.sh config\/zookeeper.properties<\/code><\/pre>\n\n\n\n<p>Once ZooKeeper is running, you can start the Kafka server:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/kafka-server-start.sh config\/server.properties<\/code><\/pre>\n\n\n\n<p>In our example we will use two topics. 
Let&#8217;s create them:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic data-in\nbin\/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic data-out<\/code><\/pre>\n\n\n\n<p>We can confirm that the topics were created by running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/kafka-topics.sh --list --zookeeper localhost:2181<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Step 2: Generate dummy logs<\/h4>\n\n\n\n<p>To run the dummy log generator, we need to clone the example sources and package them:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/sajmmon\/kafka-streaming-websockets.git\ncd kafka-streaming-websockets\nmvn clean package<\/code><\/pre>\n\n\n\n<p>Now we run the generator, which writes logs to the Kafka topic <em>data-in<\/em>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>java -cp target\/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.produce.KafkaExampleProducer<\/code><\/pre>\n\n\n\n<p>We can preview the <em>data-in<\/em> topic with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/kafka-console-consumer.sh \\\n--topic data-in \\\n--new-consumer --bootstrap-server localhost:9092 \\\n--property print.key=true<\/code><\/pre>\n\n\n\n<p>The sample output looks like this:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/Screen-Shot-2017-09-11-at-16.24.19-640x354.png\" alt=\"\" class=\"wp-image-522\"\/><\/figure><\/div>\n\n\n\n<p>The first column is the message&#8217;s key, which represents the cluster name. 
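<\/p>\n\n\n\n<p>Under the hood, the generator simply pairs a cluster name with a random waiting time. As a rough, self-contained sketch (the cluster names and the waiting-time range here are illustrative assumptions, and the actual Kafka producer call is omitted), it emits key\/value pairs like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import java.util.Random;\n\n\/\/ Hedged sketch of a dummy log generator: a real producer would send\n\/\/ each key\/value pair to the data-in topic instead of printing it.\npublic class DummyLogSketch {\n    public static void main(String[] args) {\n        String[] clusters = {\"cluster-1\", \"cluster-2\", \"cluster-3\"}; \/\/ hypothetical names\n        Random rnd = new Random();\n        for (int i = 0; i &lt; 5; i++) {\n            String key = clusters[rnd.nextInt(clusters.length)]; \/\/ key: cluster name\n            int waitingTimeMs = rnd.nextInt(1000);               \/\/ value: waiting time\n            System.out.println(key + \" \" + waitingTimeMs);\n        }\n    }\n}<\/code><\/pre>\n\n\n\n<p>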
The second column is the message&#8217;s value, which represents the waiting time of one job.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 3: Aggregate logs<\/h4>\n\n\n\n<p>Now we can run the actual Kafka Streams client, which computes the average waiting time for each cluster over 60-second windows. A new window starts every 10 seconds, so at any time we have six overlapping windows being updated.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/Screen-Shot-2017-09-12-at-11.27.01-640x208.png\" alt=\"\" class=\"wp-image-538\"\/><\/figure><\/div>\n\n\n\n<p>The client reads data from the topic <em>data-in<\/em> and writes its output to the topic <em>data-out<\/em>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>java -cp target\/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.streaming.KafkaStreamingMain<\/code><\/pre>\n\n\n\n<p>We can preview the content of the <em>data-out<\/em> topic in a terminal with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/kafka-console-consumer.sh \\\n--topic data-out \\\n--new-consumer --bootstrap-server localhost:9092 \\\n--property print.key=true<\/code><\/pre>\n\n\n\n<p>The sample output looks like this:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/Screen-Shot-2017-09-11-at-16.32.03-640x361.png\" alt=\"\" class=\"wp-image-524\"\/><\/figure><\/div>\n\n\n\n<p>In the first column we have the cluster name and the window&#8217;s start timestamp. 
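<\/p>\n\n\n\n<p>The six-windows figure follows from simple arithmetic: with a 60-second window that advances every 10 seconds, each event falls into 60 \/ 10 = 6 overlapping windows. A minimal, Kafka-free sketch of this (the event timestamp is an arbitrary example):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public class HoppingWindowDemo {\n    public static void main(String[] args) {\n        long sizeMs = 60_000L;    \/\/ window size: 60 s\n        long advanceMs = 10_000L; \/\/ a new window starts every 10 s\n        long t = 125_000L;        \/\/ an arbitrary event timestamp (ms)\n        \/\/ earliest window start s such that [s, s + sizeMs) still contains t\n        long firstStart = Math.max(0, (t \/ advanceMs) * advanceMs - sizeMs + advanceMs);\n        int windows = 0;\n        for (long s = firstStart; s &lt;= t; s += advanceMs) {\n            windows++;            \/\/ window [s, s + sizeMs) contains t\n        }\n        System.out.println(\"event falls into \" + windows + \" windows\"); \/\/ 6\n    }\n}<\/code><\/pre>\n\n\n\n<p>In the Streams API itself, this shape of window is declared with a hopping <code>TimeWindows<\/code> definition (size 60 s, advance 10 s).<\/p>\n\n\n\n<p>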
In the second column we have the number of jobs and the average waiting time.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 4: Run embedded Tomcat<\/h4>\n\n\n\n<p>We use Spring Boot to listen to the Kafka topic <em>data-out<\/em>, expose the data through a WebSocket endpoint, and host the HTML\/JS files that make up the live dashboard.<\/p>\n\n\n\n<p>To start the Spring Boot application:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>java -cp target\/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.Application<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Step 5: Visit dashboard website<\/h4>\n\n\n\n<p>The dashboard should now be available at localhost:8080. To start listening to data in the <em>data-out<\/em> topic, click &#8220;Connect&#8221;.<\/p>\n\n\n\n<p>Data points are then added to the chart in real time:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"vf-figure  | vf-figure--align vf-figure--align-centered \"><img decoding=\"async\" class=\"vf-figure__image\" src=\"http:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-content\/uploads\/2017\/09\/Screen-Shot-2017-09-12-at-11.32.15-640x361.png\" alt=\"\" class=\"wp-image-540\"\/><\/figure><\/div>\n\n\n\n<p class=\"has-text-align-right\"><em>Written by Szymon Chojnacki.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post describes how to build a prototype dashboard that displays, in real time, statistics from the new Apache Kafka Streams API. Summary Apache Kafka is a well known distributed messaging system. It is usually used together with Apache Spark to compute statistics from logs. 
However, since&hellip;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[377],"tags":[2086,2081,2082,2083,2087,2084,2085],"embl_taxonomy":[],"class_list":["post-373","post","type-post","status-publish","format-standard","hentry","category-web-development","tag-api","tag-chartjs","tag-java","tag-kafka","tag-restful-api","tag-spring-boot","tag-websockets"],"acf":[],"embl_taxonomy_terms":[],"featured_image_src":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-includes\/images\/media\/default.svg","_links":{"self":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/comments?post=373"}],"version-history":[{"count":8,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/373\/revisions"}],"predecessor-version":[{"id":383,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/posts\/373\/revisions\/383"}],"wp:attachment":[{"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/media?parent=373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/categories?post=373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/tags?post=373"},{"taxonomy":"embl_taxonomy","embeddable":true,"href":"https:\/\/www.ebi.ac.uk\/about\/teams\/its\/wp-json\/wp\/v2\/embl_taxonomy?post=373"}],"curies":[{"na
me":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}