Kafka in a Nutshell
INTRODUCTION
o Kafka works like a messaging system.
o It is a distributed platform.
o A cluster is made up of more than one Kafka server.
o In a production environment, a Kafka deployment is referred to as a Kafka cluster.
o Each Kafka server is referred to as a broker.
Architecture

Kafka is Fault Tolerant
Fault tolerance is the ability of a system to continue operating without interruption when one or more of its components fail.
Replication Factor
It determines how many copies (replicas) of each partition are kept in the cluster, each on a different broker. With a replication factor of N, every message is stored on N brokers.
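To make the replication factor concrete, here is a small pure-Python sketch that needs no Kafka installation. It assumes a simplified round-robin placement of replicas over brokers. The names `assign_replicas` and `available_partitions` are hypothetical, and real Kafka placement is more involved than this.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place `replication_factor` copies of each partition on distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        # Simplified round-robin placement (an assumption, not Kafka's algorithm).
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement


def available_partitions(placement, failed_brokers):
    """Return partitions that still have at least one replica on a live broker."""
    failed = set(failed_brokers)
    return [
        partition
        for partition, replicas in placement.items()
        if any(broker not in failed for broker in replicas)
    ]


# Three partitions, three brokers, two copies of every partition.
placement = assign_replicas(num_partitions=3, brokers=[0, 1, 2], replication_factor=2)
```

With a replication factor of 2, losing any single broker still leaves every partition readable, while losing two brokers can make a partition unavailable.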
Kafka is a Scalable System
You can add new brokers and increase the number of consumers.
Throughput
Kafka can handle on the order of 1 million messages per second, depending on hardware, message size, and configuration.
Zookeeper
It is a distributed, open-source configuration and synchronization service. Kafka uses it to coordinate the brokers in a cluster.
Install Kafka & Zookeeper
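Get the download link from the Apache Kafka downloads page (kafka.apache.org/downloads). The Kafka distribution ships with the Zookeeper scripts used below.
Steps to install
1) wget <download-link>
2) tar -xvzf <downloaded-tar-file>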
Changes in Config Files
In the Kafka configuration file, config/server.properties, set:
advertised.listeners=PLAINTEXT://<IP_ADDRESS>:9092
zookeeper.connect=localhost:2181
9092 is the default port for Kafka.
In the Zookeeper configuration file, config/zookeeper.properties, the client port defaults to:
clientPort=2181
Starting Zookeeper
Start Zookeeper first, since the Kafka broker connects to it on startup.
bin/zookeeper-server-start.sh config/zookeeper.properties
Starting Kafka
JMX_PORT=8004 bin/kafka-server-start.sh config/server.properties
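Once both processes are started, a quick way to confirm they are listening is to try a TCP connection to each port. The sketch below uses only the Python standard library. The helper names `is_port_open` and `check_services` are hypothetical, and the ports are the defaults mentioned above (2181 and 9092). A successful connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", ports=None):
    """Map each service name to whether its port accepts connections."""
    # Default ports are assumptions taken from the configuration shown earlier.
    ports = ports or {"zookeeper": 2181, "kafka": 9092}
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` on the machine running the broker should return a dictionary with `True` for both services when they are up.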
Install Kafka Manager
It is a graphical user interface for Kafka.
Pre-requisite: Install Java 11
Steps:
1) git clone https://github.com/yahoo/CMAK.git
CMAK (Cluster Manager for Apache Kafka)
2) ./sbt clean dist
3) Unzip the generated cmak archive
4) In conf/application.conf, set cmak.zkhosts="<Zookeeper-host>:2181"
5) bin/cmak -Dconfig.file=conf/application.conf -Dhttp.port=8080
What is a Kafka Topic?
o A topic is the Kafka component that producers connect to.
o Producers publish messages to topics in the Kafka cluster.
o A topic is multi-subscriber: more than one consumer can read from it.
o A topic can be considered a logical entity.
o In a Kafka cluster, a topic can be reached through any broker, while its partitions are distributed across the brokers.
Kafka Topic Partition
o A Kafka topic is divided into multiple partitions.
o A partition can be considered a linear data structure, like an array.
o Messages are actually published to a partition in the topic.
o Every partition has a partition number.
o Each message in a partition gets an increasing index called an offset.
o New messages are always appended at the rear end.
o Data is immutable once published.
o In a multi-broker Kafka cluster, the partitions of a topic are distributed across the whole cluster.
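The append-only behaviour described above can be pictured with a tiny in-memory model. This is only an illustration of offsets and immutability, not how Kafka stores data on disk, and the `Partition` class is hypothetical.

```python
class Partition:
    """A minimal in-memory stand-in for one topic partition (append-only log)."""

    def __init__(self, number):
        self.number = number  # the partition number within the topic
        self._log = []        # messages in arrival order

    def append(self, message):
        """Append at the rear end and return the offset assigned to the message."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Return the message stored at `offset`; stored messages never change."""
        if offset < 0:
            raise IndexError("offsets start at 0")
        return self._log[offset]


partition = Partition(number=0)
first = partition.append("signup: alice")   # gets offset 0
second = partition.append("signup: bob")    # gets offset 1
```

There is deliberately no method to update or delete an entry: a consumer that reads offset 0 again later gets the same message.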
Kafka Producer in Python
o The producer publishes messages to the available partitions of a topic.
o By default, a message without a key goes to one of the available partitions in random fashion.
o A producer can also select the partition in a topic where it wants to publish a message.
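The partition choices listed above can be sketched in plain Python. The `make_partitioner` helper is hypothetical, and the key-based branch (same key, same partition) is an addition not covered in the list. CRC32 is used for hashing keys only to keep the example within the standard library; Kafka clients use a murmur2 hash instead.

```python
import random
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition number for one message."""

    def choose(key=None, partition=None):
        if partition is not None:
            # The producer selected the partition itself.
            return partition
        if key is not None:
            # Same key -> same partition (CRC32 here; Kafka itself uses murmur2).
            return zlib.crc32(key) % num_partitions
        # No key and no explicit partition: pick an available partition at random.
        return random.randrange(num_partitions)

    return choose


choose = make_partitioner(num_partitions=3)
```

In kafka-python the corresponding choices are made through the `key` and `partition` arguments of `KafkaProducer.send`.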
Pre-Requisites:
pip install kafka-python
pip install Faker (for creating fake data)
Configuration Needed by Producer
· Bootstrap-Servers
· Topic
· Value-Serializer
Kafka Consumer in Python
o Consumers are the Kafka components that consume messages from a Kafka topic.
o Internally, a consumer consumes messages from the topic's partitions.
o A consumer normally belongs to a consumer group.
o If no group id is provided, some clients (such as the console consumer) assign a random group id; kafka-python instead disables group coordination and offset commits.
o A consumer group is a logical grouping of one or more consumers.
o To use group management, a consumer must register itself with a consumer group.
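One way to picture a consumer group is as a set of consumers that share the partitions of a topic between them. The sketch below assumes a plain round-robin split; the `assign_partitions` helper and the consumer names are hypothetical, and Kafka's real assignment strategies are more elaborate.

```python
def assign_partitions(partitions, consumers):
    """Split partitions among the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Hand out partitions one at a time, cycling through the consumers.
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


# Two consumers in the same group sharing a topic with three partitions.
group_a = assign_partitions(partitions=[0, 1, 2], consumers=["c1", "c2"])
```

If a group has more consumers than the topic has partitions, the extra consumers in this model receive no partitions and stay idle.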
Configuration Needed by Consumer
· Topic
· Bootstrap-Servers
· Group ID
· Value-Deserializer
Implementation in Python
producer.py
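A minimal sketch of what producer.py could look like, assuming kafka-python is installed and a broker is reachable at localhost:9092. The helper names and the broker address are assumptions made for illustration. It wires together the three settings listed earlier: bootstrap servers, topic, and value serializer.

```python
import json


def json_serializer(data):
    """Value serializer: turn a Python dict into UTF-8 encoded JSON bytes."""
    return json.dumps(data).encode("utf-8")


def build_producer(bootstrap_servers="localhost:9092"):
    """Create a KafkaProducer; needs a reachable broker (address is an assumption)."""
    # Imported lazily so the helpers in this file work without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=[bootstrap_servers],
        value_serializer=json_serializer,
    )


def publish(producer, topic, records):
    """Send every record to `topic`, flush the producer, and return the count sent."""
    count = 0
    for record in records:
        producer.send(topic, record)
        count += 1
    # flush() blocks until all buffered messages have been sent.
    producer.flush()
    return count
```

A typical call would be `publish(build_producer(), "registered_user", records)`, where `"registered_user"` is a hypothetical topic name and `records` is any iterable of dictionaries.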

data.py
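A minimal sketch of what data.py could look like, assuming the Faker package installed in the prerequisites. The function name `get_registered_user` and the chosen fields are illustrative assumptions. The optional `fake` argument lets a caller pass in its own generator.

```python
def get_registered_user(fake=None):
    """Return one fake user record as a dictionary."""
    if fake is None:
        # Imported lazily; requires `pip install Faker`.
        from faker import Faker

        fake = Faker()
    return {
        "name": fake.name(),
        "address": fake.address(),
        "created_at": fake.year(),
    }
```

Each call returns a fresh dictionary that a producer can serialize to JSON and publish.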

consumer.py
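A minimal sketch of what consumer.py could look like, assuming kafka-python is installed and a broker is reachable at localhost:9092. The helper names, the broker address, and the group id `"consumer-group-a"` are assumptions made for illustration. It uses the four settings listed earlier: topic, bootstrap servers, group id, and value deserializer.

```python
import json


def json_deserializer(raw):
    """Value deserializer: turn UTF-8 encoded JSON bytes back into a Python object."""
    return json.loads(raw.decode("utf-8"))


def build_consumer(topic, bootstrap_servers="localhost:9092", group_id="consumer-group-a"):
    """Create a KafkaConsumer subscribed to `topic` (broker address is an assumption)."""
    # Imported lazily so the helpers in this file work without kafka-python installed.
    from kafka import KafkaConsumer

    return KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        value_deserializer=json_deserializer,
        # Start from the oldest message when the group has no committed offset.
        auto_offset_reset="earliest",
    )


def consume(consumer, handle=print):
    """Pass the deserialized value of every received message to `handle`."""
    for message in consumer:
        handle(message.value)
```

A typical call would be `consume(build_consumer("registered_user"))`, where `"registered_user"` is a hypothetical topic name. With a real consumer the loop keeps running and waits for new messages.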
