Apache Kafka FAQ

09 August, 2019 by Jack Vamvas

These are some common questions I've received while working with Apache Kafka. I'll keep adding more detail.

What is Apache Kafka?

Apache Kafka is a distributed, partitioned, replicated commit log service. It can also be described as a publish-subscribe messaging system for distributed applications.
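
To make the "commit log" idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Kafka's implementation: the class name CommitLog and the sample records are made up for illustration. It shows that records are only ever appended, each record gets an offset, and several subscribers can read the same log independently.

```python
class CommitLog:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


log = CommitLog()
first = log.append("page_view:/home")
second = log.append("page_view:/jobs")

# Two independent subscribers read the same log from different offsets.
subscriber_a = log.read_from(0)
subscriber_b = log.read_from(second)
```

Reading does not remove anything from the log, which is what lets many subscribers share one feed.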

What was the original purpose of Apache Kafka?

Apache Kafka was developed within LinkedIn, which invested in building a single, unified, distributed pub-sub data pipeline. Prior to Kafka, LinkedIn maintained multiple data pipelines, for example for InMail messaging, site events (e.g. site views) and other operational data. Kafka was developed to unify the effort of maintaining and scaling those multiple data pipelines.

Describe some common use cases for Apache Kafka

Activity tracking - think of a publish-subscribe feed for event processing or monitoring. This was the original use case.

Database updates - and downstream processing. For example, a profile update on a web site may require other applications to be notified.

Aggregating metrics - pulling metrics from different source logs and placing them in a central repository.

Stream processing - this has great potential. For example, process multiple daily news feeds from different sources, then aggregate, enrich and publish them.

Messaging - think of it as an alternative to RabbitMQ or ActiveMQ.

What is some common terminology used when discussing Kafka?

Cluster - Kafka is run as a cluster on one or more servers

Broker - The Cluster Servers are known as Brokers

Topic - A Kafka cluster stores streams of records in categories called topics

Partitions - Each topic is split into partitions. Each Kafka broker has a unique ID and hosts some of the topic partitions

Producers - Sources writing to topics

Consumers - Read from Topics 

Connectors - Link Kafka to existing data systems

Stream processors - Transform an input stream into an output stream
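
The terms above fit together roughly as in the following sketch. It is a toy in-memory model, assuming a hypothetical Topic class and a CRC32-based key hash picked only because it is deterministic; real Kafka clients use their own partitioner. A producer writes keyed records to a topic, the key decides the partition, and a consumer reads a partition from an offset.

```python
import zlib


class Topic:
    """Toy topic: a named set of partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the client's real hash function.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Producer side: append a record, return (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition, offset):
        """Consumer side: read records of one partition from an offset."""
        return self.partitions[partition][offset:]


topic = Topic("profile-updates", num_partitions=3)
p1, o1 = topic.produce("user-42", "changed email")
p2, o2 = topic.produce("user-42", "changed photo")
records = topic.consume(p1, 0)
```

Records with the same key land in the same partition, so their relative order is preserved for the consumer of that partition.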


Is Apache Kafka an ETL tool? Streaming v Batch Processing

Rather than defining (or not defining) Apache Kafka as an ETL tool, I try to consider the context. An example of the context would be differentiating between a microservices\messaging\streaming framework and batch processing.

An example of batch processing is wrapping a series of tasks such as: receive file > parse > validate > cleanse > organize > aggregate in an SSIS job (or another type of ETL tool) and executing it on a schedule, i.e. non-continuous data.

An example of a microservices\messaging\streaming framework is continuous data. The key to continuous data is the ability to pass the data through a type of messaging server which manages the continuous data flow. Apache Kafka is an example of this type of messaging server.
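
The difference between the two styles can be sketched in a few lines of Python. This is only an illustration with made-up function names (clean, run_batch, run_stream) and no Kafka involved: the batch job needs the whole input before it produces a result, while the streaming version handles each record as it arrives and emits a running result.

```python
def clean(line):
    """Shared transformation step used by both styles."""
    return line.strip().lower()


def run_batch(lines):
    """Batch: all input is available up front; one result at the end."""
    cleaned = [clean(line) for line in lines if line.strip()]
    return len(cleaned)


def run_stream(lines):
    """Streaming: process records one at a time, emitting a running count."""
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip empty records
        count += 1
        yield clean(line), count


sample = [" A ", "", "b"]
batch_result = run_batch(sample)
stream_results = list(run_stream(iter(sample)))
```

In the streaming case the input could be an endless iterator; results are available as soon as each record has been handled.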


What protocols are supported by Kafka?

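Kafka clients and brokers communicate using Kafka's own binary protocol over TCP. For the message payload, which Kafka treats as bytes, JSON and Avro are common serialization formats.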
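
Kafka stores message values as bytes, so a format such as JSON has to be serialized before sending and deserialized after reading. A minimal sketch using only the Python standard library follows; the function names and the sample event are assumptions for illustration, and a real client library would normally be configured with serializers like these.

```python
import json


def serialize(event):
    """Turn a dict into UTF-8 encoded JSON bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Turn UTF-8 encoded JSON bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


event = {"user_id": 42, "action": "profile_update"}
payload = serialize(event)
restored = deserialize(payload)
```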

Do you have a diagram giving a very high level view of Kafka?

This diagram gives you a flavour of the possibilities.


What is ZooKeeper?

ZooKeeper manages distributed processes. At the time of writing, it is a compulsory part of an Apache Kafka cluster. Kafka brokers use ZooKeeper for cluster controller election and cluster membership management.
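
Controller election can be pictured as a race to create a single ephemeral node: the first broker to create it becomes the controller, and when that broker's session ends the node disappears and the remaining brokers race again. The sketch below is a toy in-memory stand-in, assuming a hypothetical ToyZooKeeper class; it does not use the real ZooKeeper client API.

```python
class ToyZooKeeper:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self.znodes = {}  # path -> owning broker id

    def create_ephemeral(self, path, owner):
        """Create the node if absent; return True only for the winner."""
        if path in self.znodes:
            return False
        self.znodes[path] = owner
        return True

    def session_closed(self, owner):
        """Ephemeral nodes vanish when their owner's session ends."""
        self.znodes = {p: o for p, o in self.znodes.items() if o != owner}


def elect_controller(zk, broker_ids):
    """Every broker tries to create /controller; the first one wins."""
    for broker_id in broker_ids:
        zk.create_ephemeral("/controller", broker_id)
    return zk.znodes["/controller"]


zk = ToyZooKeeper()
first_controller = elect_controller(zk, [2, 1, 3])
zk.session_closed(first_controller)  # the controller broker goes away
second_controller = elect_controller(zk, [1, 3])
```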

What are Kafka Brokers?

Kafka Brokers are the main messaging and storage parts of Apache Kafka. Apache Kafka has a concept called topics, which is another way of saying "message streams". Topics are separated into partitions, and the partitions are replicated across brokers for high availability. The Kafka cluster is managed by the Kafka broker servers.
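
To illustrate how partitions and their replicas might be spread over brokers, here is a simplified round-robin sketch. It is not Kafka's actual assignment algorithm, and the function name assign_replicas is made up for illustration. The first broker in each list plays the role of the leader for that partition and the rest hold copies.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Return {partition: [broker ids]} using a simple round-robin layout."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the broker count")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            broker_ids[(partition + replica) % len(broker_ids)]
            for replica in range(replication_factor)
        ]
    return assignment


assignment = assign_replicas([1, 2, 3], num_partitions=3, replication_factor=2)
# assignment == {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

With this layout, losing any single broker still leaves one copy of every partition on another broker.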

What is a Kafka Connect worker?

The Kafka Connect worker allows Kafka to integrate with external systems. The Kafka Connect API permits configuring connectors that continually pull from some source data system into Kafka or push from Kafka into some sink data system. 
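
As an illustration of connector configuration, the sketch below builds the JSON body that could be sent to a Connect worker's REST API (a POST to its /connectors endpoint) to create a source connector. It uses the FileStreamSource example connector from Apache Kafka; the connector name, file path and topic name are hypothetical, and nothing is actually sent.

```python
import json


def file_source_connector(name, path, topic):
    """Build a connector definition that reads lines of a file into a topic."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,    # hypothetical source file
            "topic": topic,  # hypothetical destination topic
        },
    }


definition = file_source_connector("demo-file-source", "/tmp/input.txt", "demo-lines")
request_body = json.dumps(definition, indent=2)
```

A sink connector is configured the same way, with a sink connector class and the topics to read from.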

Are connections between Kafka and Producers encrypted by default?

By default, Apache Kafka communicates in plaintext, which means that all data is sent in the clear. To encrypt communication, it is recommended to configure the Apache Kafka components in your deployment to use SSL encryption. The term SSL can be confusing, as TLS has replaced SSL; Kafka's configuration still uses the SSL name. Encrypting data in transit helps protect against man-in-the-middle attacks. Once the data is at rest on the broker, you'll need to consider whether some form of encryption on disk is required.
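
On the client side, switching from plaintext to encrypted connections is mostly a matter of configuration. The sketch below renders a small client properties fragment using the standard Kafka client setting names security.protocol, ssl.truststore.location and ssl.truststore.password. The helper function, the truststore path and the password are placeholders for illustration, and the broker needs a matching SSL listener, which is not shown.

```python
def ssl_client_properties(truststore_path, truststore_password):
    """Render client settings that enable encrypted connections to brokers."""
    props = {
        "security.protocol": "SSL",
        "ssl.truststore.location": truststore_path,      # placeholder path
        "ssl.truststore.password": truststore_password,  # placeholder secret
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


client_properties = ssl_client_properties(
    "/etc/kafka/client.truststore.jks", "changeit"
)
```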

Do you have links to some in-depth Kafka related blog posts?

A Brief History of Kafka, LinkedIn’s Messaging Platform

How We’re Improving and Advancing Kafka at LinkedIn




Author: Jack Vamvas (http://www.dba-ninja.com)

