
Live dashboard with Kafka Streams API

This post describes how to build a prototype dashboard that displays, in real time, statistics computed with the new Apache Kafka Streams API.

Summary

Apache Kafka is a well-known distributed messaging system. It is often used together with Apache Spark to compute statistics from logs. However, since version 0.10.0 Kafka has shipped its own client library for stream processing: the Kafka Streams API. In this post I will show how to build a pipeline that uses this new API.

Use case

At EMBL-EBI we provide public access to biological data resources, which can be accessed via a REST API. On our side, a Traffic Manager (TM) distributes queries to separate computational LSF clusters. Within each cluster, requests are queued and wait for execution. Our objective is to configure the TM in such a way that each user waits a similar amount of time for their job to be executed. We have prior information on each cluster’s performance, but we still need a real-time monitoring system so that we can adjust the TM ratios dynamically when hard-to-predict events occur.

Quick start

Please follow the instructions on GitHub: kafka-streams-api-websockets (feel free to comment on the example there).

Architecture

Tools

  • Java 8+
  • Apache Kafka 0.11
  • Apache Kafka Streams API
  • Spring Boot
  • STOMP
  • Chart.js

Result

Detailed description

The description is split into five steps.

Step 1: Setup Kafka

Please download and uncompress the Kafka binaries from https://kafka.apache.org/downloads, for example:

wget http://mirror.vorboss.net/apache/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz
tar -xzf kafka_2.11-0.11.0.0.tgz
cd kafka_2.11-0.11.0.0

Now you need to start ZooKeeper, which coordinates the Kafka brokers:

bin/zookeeper-server-start.sh config/zookeeper.properties

Once ZooKeeper is running you can start the Kafka server:

bin/kafka-server-start.sh config/server.properties

In our example we will use two topics, so let’s create them:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic data-in
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic data-out

We can confirm that the topics were created by listing them:

bin/kafka-topics.sh --list --zookeeper localhost:2181

Step 2: Generate dummy logs

In order to run the dummy log generator, we need to clone the example sources and package them:

git clone https://github.com/sajmmon/kafka-streaming-websockets.git
cd kafka-streaming-websockets
mvn clean package

Now we run the generator, which writes logs to the Kafka topic data-in:

java -cp target/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.produce.KafkaExampleProducer
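
Under the hood, the generator is essentially a plain Kafka producer. The sketch below shows the general idea, assuming String keys and values; the class name, cluster names and message rate are illustrative assumptions, and the actual KafkaExampleProducer in the repository may differ in detail.

import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DummyLogProducer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Hypothetical cluster names; key = cluster, value = waiting time of one job.
        String[] clusters = {"cluster-1", "cluster-2", "cluster-3"};
        Random random = new Random();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                String cluster = clusters[random.nextInt(clusters.length)];
                String waitingTime = String.valueOf(random.nextInt(100));
                producer.send(new ProducerRecord<>("data-in", cluster, waitingTime));
                Thread.sleep(100); // throttle to roughly ten messages per second
            }
        }
    }
}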

We can preview the data-in topic with:

bin/kafka-console-consumer.sh \
--topic data-in \
--new-consumer --bootstrap-server localhost:9092 \
--property print.key=true

In the sample output, the first column is the message’s key, which represents the cluster name; the second column is the message’s value, which represents the waiting time of one job.

Step 3: Aggregate logs

Now we can run the actual Kafka Streams client, which computes the average waiting time for each cluster over 60-second windows. A new window is started every 10 seconds, so at any moment six overlapping windows are being updated.

The client reads data from the data-in topic and writes its output to the data-out topic:

java -cp target/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.streaming.KafkaStreamingMain
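
To illustrate what such a client can look like, here is a minimal sketch of a hopping-window aggregation written against the Kafka 0.11 Streams DSL. It accumulates a "count:sum" pair per cluster and window, then emits the job count and average waiting time. The application id, store name and string-based accumulator are assumptions of this sketch; the actual KafkaStreamingMain class may be structured differently.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WaitingTimeAverages {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "waiting-time-averages");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> jobs = builder.stream(Serdes.String(), Serdes.String(), "data-in");

        jobs.groupByKey(Serdes.String(), Serdes.String())
            // 60-second windows, a new one starting every 10 seconds:
            // at any moment six windows overlap.
            .aggregate(
                () -> "0:0",
                (cluster, waitingTime, acc) -> {
                    String[] parts = acc.split(":");
                    long count = Long.parseLong(parts[0]) + 1;
                    double sum = Double.parseDouble(parts[1]) + Double.parseDouble(waitingTime);
                    return count + ":" + sum;
                },
                TimeWindows.of(60_000L).advanceBy(10_000L),
                Serdes.String(),
                "waiting-time-store")
            .toStream()
            // Key: cluster name and window start; value: job count and average.
            .map((Windowed<String> window, String acc) -> {
                String[] parts = acc.split(":");
                long count = Long.parseLong(parts[0]);
                double average = Double.parseDouble(parts[1]) / count;
                return new KeyValue<>(window.key() + "@" + window.window().start(),
                        count + " " + average);
            })
            .to(Serdes.String(), Serdes.String(), "data-out");

        new KafkaStreams(builder, props).start();
    }
}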

We can preview the content of the data-out topic in a terminal with:

bin/kafka-console-consumer.sh \
--topic data-out \
--new-consumer --bootstrap-server localhost:9092 \
--property print.key=true

In the sample output, the first column contains the cluster name together with the window’s start timestamp; the second column contains the number of jobs and the average waiting time.

Step 4: Run embedded Tomcat

We use Spring Boot to listen to the Kafka topic data-out, make the data available through a websocket endpoint, and host the HTML/JS files that make up the live dashboard.

To start the Spring Boot application:

java -cp target/shade-kafka-streaming-websockets-0.1.0.jar uk.ac.ebi.Application
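
As a rough sketch of how these pieces can fit together, the configuration below enables a STOMP endpoint over SockJS, and a background component forwards records from data-out to subscribed dashboard clients. The endpoint and destination names (/stats, /topic/stats) and class names are assumptions, not necessarily those used in the repository.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.stereotype.Component;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

// Expose a STOMP endpoint that the dashboard's JavaScript client can connect to.
@Configuration
@EnableWebSocketMessageBroker
class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableSimpleBroker("/topic");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/stats").withSockJS(); // endpoint name is an assumption
    }
}

// Consume data-out on a background thread and push each record to clients.
@Component
class KafkaToWebSocketForwarder implements CommandLineRunner {

    private final SimpMessagingTemplate template;

    KafkaToWebSocketForwarder(SimpMessagingTemplate template) {
        this.template = template;
    }

    @Override
    public void run(String... args) {
        new Thread(this::pollLoop).start();
    }

    private void pollLoop() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "dashboard");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("data-out"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                    template.convertAndSend("/topic/stats", record.key() + " " + record.value());
                }
            }
        }
    }
}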

Step 5: Visit dashboard website

The dashboard should now be available at localhost:8080. To start listening to data from the data-out topic, click “Connect”.

Data points are added to the chart as they arrive.

Written by Szymon Chojnacki.
