
Top Kafka Interview Questions and Answers

Updated Sep 25, 2019

What is Kafka?

Kafka is open-source software originally developed by LinkedIn and now maintained by the Apache Software Foundation. It is commonly used in real-time analytics for storing, analysing and streaming data. The software is written in Java and Scala. Major companies use Kafka for purposes such as analysing stocks and deciding which to promote or invest in.

What are the important components of Kafka?

The most important components of Kafka are as follows:

  • Consumers
  • Topics
  • Clusters
  • Brokers
  • Consumer groups
  • Followers
  • Replicas
  • Partitions
  • Producers
  • Leaders

What are some advantages of using Kafka?

Some advantages of using Kafka are as follows:

  • It can handle large volumes of data without high-powered hardware. Throughput depends on hardware and message size, but even a modest cluster can sustain far more than a thousand messages per second
  • It has a distributed architecture that makes it very scalable, with features like partitioning and replication
  • It has very low latency while keeping throughput high. Writing and reading messages are both very fast
  • It is very durable: messages are persisted on disk and replicated across brokers

How is Kafka different from other messaging systems?

In other messaging systems, subscribers pull messages from the end of a queue. While a message is being pulled, the queue typically offers some level of transactional guarantee. After being processed, the message is removed from the queue.

But in Kafka, messages published to topics are persisted. They are not removed even after a consumer receives them. Many consumers can therefore run their own logic on the same events. This is not possible in queue-based messaging systems, where a message is gone once it has been consumed.
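The difference can be illustrated with a small in-memory sketch. It is only a model, not Kafka code: the `TopicLog` class and the consumer names are made up for the example. The queue loses each message when it is read, while the log keeps its records so that two independent readers see the same events.

```python
from collections import deque

# Traditional queue: reading a message removes it.
queue = deque(["order-1", "order-2"])
first_reader = [queue.popleft() for _ in range(len(queue))]
second_reader = list(queue)  # nothing is left for a second subscriber


class TopicLog:
    """Hypothetical stand-in for a Kafka topic with a single partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the record just written

    def read_from(self, offset):
        # Reading does not delete anything; the caller tracks its own offset.
        return self.records[offset:]


log = TopicLog()
for value in ["order-1", "order-2"]:
    log.append(value)

billing_view = log.read_from(0)   # one consumer reads everything
shipping_view = log.read_from(0)  # another consumer reads the same records
```

Here `second_reader` ends up empty, while `billing_view` and `shipping_view` both contain the two orders.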

What is ZooKeeper in Kafka?

ZooKeeper keeps track of the status of the Kafka cluster nodes, as well as topics and partitions. It handles service discovery for the brokers that form a cluster and notifies Kafka of topology changes, for example when a topic is created or removed, or when a broker joins the cluster or dies. Access control lists are also stored in ZooKeeper. Newer Kafka releases can also run without ZooKeeper, using the built-in KRaft mode.

What is a broker in Kafka?

A broker in Kafka is also called a Kafka node or Kafka server. It is responsible for receiving messages from producers and keeping them on disk, where each message is identified by a unique offset within its partition. Consumers fetch messages from brokers by topic, partition and offset. Brokers share information with each other through ZooKeeper to form a cluster.

Brokers also help balance load in Kafka, since partitions are spread across the brokers in a cluster.

What is the maximum message size a Kafka server can receive? 

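By default, about 1 MB (1000000 bytes; the exact default varies slightly between versions). The limit is set by the broker's message.max.bytes setting and can be changed.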

Explain Producer and Consumer in Kafka.

Producers are applications that send data to brokers by writing records to topics. They discover brokers automatically, so when a new broker joins the cluster, producers start sending messages to it as well. Depending on the acks setting, a producer can send messages very fast without waiting for any acknowledgement, or wait for the broker to confirm each write.

Consumers are applications that fetch data from topics. Because messages are retained for a fixed time rather than deleted once read, each consumer has to keep track of what it has already read. It does so with a per-partition offset that marks its position in the partition.

What is ISR in Kafka?

ISR in Kafka stands for in-sync replica, a replica that is fully caught up with the partition leader. By default, the broker hosting the partition leader handles all writes and reads for the partition, and Kafka replicates the writes to the followers. If the partition leader fails, one of the in-sync replicas is selected as the new leader.
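As a rough illustration of the failover rule, the sketch below picks a new leader from the in-sync set. The function name `elect_leader` and the broker ids are assumptions made for the example, and the real controller logic in Kafka is considerably more involved.

```python
def elect_leader(replicas, isr, failed_leader):
    """Return the first in-sync replica (in replica-list order) that is not
    the failed leader, or None if no in-sync replica is available."""
    for broker_id in replicas:
        if broker_id != failed_leader and broker_id in isr:
            return broker_id
    return None


replicas = [1, 2, 3]  # brokers hosting the partition; broker 1 is the leader
isr = {1, 3}          # broker 2 has fallen behind and dropped out of the ISR

new_leader = elect_leader(replicas, isr, failed_leader=1)
```

In this example broker 2 is skipped because it is not in sync, so broker 3 takes over.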

What is a Kafka partitioning key?

In Kafka, when a producer sends a message to a broker, the partition is determined by applying a hash function to the message key. So, messages with the same key go to the same partition.

Producers therefore route keyed records by their key. For records that do not have a key, a round-robin strategy is used to spread them across partitions. Either way, it is the producer that selects the partition.
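A minimal sketch of this selection logic follows. The `SketchPartitioner` class is hypothetical, and `zlib.crc32` is used only as a convenient stand-in hash; Kafka's own Java client uses a murmur2 hash, so the partition numbers here will not match a real cluster.

```python
import zlib
from itertools import count


class SketchPartitioner:
    """Illustrative partitioner: hash for keyed records, round robin otherwise."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()  # drives round robin for keyless records

    def partition(self, key):
        if key is None:
            # No key: rotate through the partitions in order.
            return next(self._counter) % self.num_partitions
        # Keyed record: the same key always hashes to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
a = partitioner.partition("user-42")
b = partitioner.partition("user-42")
keyless = [partitioner.partition(None) for _ in range(4)]
```

Both keyed calls return the same partition, and the four keyless records cycle through partitions 0, 1, 2 and back to 0.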

What is a topic in Apache Kafka?

In Kafka, messages are published to and stored in a named category called a topic. To send a message, a producer writes it to a topic; to read it, a consumer reads it from that topic. Topic names are unique across the Kafka cluster. Consumers pull data from a topic; the broker does not push it to them.

Explain the Kafka offset.

In a Kafka partition, an offset is the position of a record. Consumers use offsets to track which record they will receive next. There are two offset types: the current offset and the committed offset. The current offset marks the records already sent to the consumer, so the same records are not sent to that consumer again.

The committed offset marks the records the consumer has finished processing. It prevents the same records from being sent again to a new consumer when a partition is rebalanced.
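The two offsets can be sketched with a toy consumer over an in-memory list. The `SketchConsumer` class and its method names are assumptions for illustration and only loosely mirror the poll and commit steps of a real consumer.

```python
class SketchConsumer:
    """Toy consumer that tracks a current and a committed offset."""

    def __init__(self, log, committed=0):
        self.log = log
        self.committed = committed  # position recorded as fully processed
        self.current = committed    # position of the next record to fetch

    def poll(self, max_records):
        batch = self.log[self.current:self.current + max_records]
        self.current += len(batch)  # avoid handing out these records again
        return batch

    def commit(self):
        self.committed = self.current


log = ["a", "b", "c", "d", "e"]

first = SketchConsumer(log)
batch1 = first.poll(2)  # fetches a, b
first.commit()          # a, b are now marked as processed
batch2 = first.poll(2)  # fetches c, d, but they are never committed

# After a rebalance, a replacement consumer resumes from the committed offset.
replacement = SketchConsumer(log, committed=first.committed)
resumed = replacement.poll(5)
```

The replacement starts at offset 2, so it receives c, d and e: the committed records a and b are not delivered again, while the uncommitted c and d are.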

What is the retention period in Kafka?

A Kafka cluster keeps published records, whether or not they have been consumed, for a configurable retention period. For example, if the retention period is set to 3 days, a record is available for consumption for 3 days after being published. After that period it is discarded to free up space.
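The 3-day example can be worked through in a short sketch. The `apply_retention` function and the sample records are hypothetical, and the model is simplified: Kafka removes whole log segments once they are past retention rather than filtering individual records.

```python
from datetime import datetime, timedelta


def apply_retention(records, now, retention):
    """Keep only the (timestamp, value) records inside the retention window."""
    cutoff = now - retention
    return [(ts, value) for ts, value in records if ts >= cutoff]


now = datetime(2019, 9, 25, 12, 0)
records = [
    (now - timedelta(days=4), "old"),      # published 4 days ago
    (now - timedelta(days=2), "recent"),   # published 2 days ago
    (now - timedelta(hours=1), "new"),     # published 1 hour ago
]

kept = apply_retention(records, now, retention=timedelta(days=3))
kept_values = [value for _, value in kept]
```

With a 3-day retention period only the record published 4 days ago is dropped.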

What is the default retention period for a Kafka topic?

The default retention period for a Kafka topic is 7 days, i.e. 168 hours (broker setting log.retention.hours=168).

