Posted by Admin. Last updated: 2020-06-11
What is Kafka?

In Big Data, data is maintained in huge amounts. Big Data poses two major challenges: the first is that the data must be collected and maintained carefully, and the second is that it must be analyzed carefully. To overcome these challenges, a messaging system is required.

Messaging system:

A messaging system is responsible for transferring data between applications. The applications concentrate on collecting and maintaining the data, and do not need to worry about how the data is transported and organized.

In a messaging system, data is transferred in two ways:

  1. Point-to-point system
  2. Publish-subscribe system
Point-to-point system:

In a point-to-point messaging system, the source and destination are fixed before the data is sent, and the data travels securely between them. The disadvantage of this system is that all messages are sent through a queue sequentially. If an important message needs to be sent, there is no way to send it ahead of the messages already in the queue; every message has to wait for its turn. Moreover, there is no way to send a message to several destinations at once. To overcome these problems, the publish-subscribe method was introduced.
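The queueing behaviour described above can be illustrated with a small in-memory sketch. This is only a toy model, not a real messaging product; the class name `PointToPointQueue` and the message texts are assumptions made for this example.

```python
from collections import deque


class PointToPointQueue:
    """Toy FIFO channel between one fixed sender and one fixed receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # New messages always join the back of the queue.
        self._messages.append(message)

    def receive(self):
        # Messages leave strictly in the order in which they arrived.
        return self._messages.popleft()


channel = PointToPointQueue()
for text in ["routine-1", "routine-2", "URGENT"]:
    channel.send(text)

# The urgent message cannot overtake the two messages queued before it.
delivered = [channel.receive() for _ in range(3)]
print(delivered)
```

Because there is a single queue and a single receiver, the urgent message is delivered last, and no second destination can receive a copy of it.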

Figure: Message queue (point-to-point system)

Publish–Subscribe System: 

In a publish-subscribe system, the data senders are called publishers and the data receivers are called subscribers. One publisher can have multiple subscribers. A real-world example is Dish TV: the publisher is the Dish TV operator and the subscribers are the television users, who can subscribe to channels according to their needs.
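A minimal sketch of the publish-subscribe idea is given here, assuming an in-memory broker. The names `PubSubBroker`, `subscribe`, and `publish` are hypothetical and chosen for this example; they are not Kafka's API.

```python
class PubSubBroker:
    """Toy publish-subscribe broker: one publisher, many subscribers per topic."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives its own copy of the message.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = PubSubBroker()
alice_inbox, bob_inbox = [], []

# Like TV users choosing channels, each subscriber picks the topics it wants.
broker.subscribe("sports", alice_inbox.append)
broker.subscribe("sports", bob_inbox.append)
broker.subscribe("news", bob_inbox.append)

broker.publish("sports", "match starts at 7 pm")
broker.publish("news", "headlines at 9 pm")

print(alice_inbox)
print(bob_inbox)
```

Here one published message reaches every subscriber of its topic, which the point-to-point model cannot do.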

Figure: Publish-subscribe system


Kafka is a publish-subscribe messaging system developed at LinkedIn. It was open-sourced in 2011 and became a top-level Apache project in 2012, and it is often used together with Storm and Spark for stream analysis. The system is built on top of the ZooKeeper synchronization service. Kafka can handle large volumes of data at high speed and is responsible for transferring messages between applications, for both online and offline message consumption. Benchmarks published by LinkedIn engineers report about 2 million writes per second on a small cluster. Messages in Kafka are persisted on disk and replicated within the cluster, so that data is not lost when a broker fails. The major advantages of Kafka are low latency and high fault tolerance.

The architecture of Kafka can be explained with the following diagram. Before looking at how it works, let us go through the main components of the Kafka ecosystem:

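Producer :

A producer is responsible for sending data to the brokers. When a new broker enters the ecosystem, the producers discover it and start sending data to it. A producer can be configured not to wait for acknowledgements from the broker (the acks=0 setting), in which case it sends data as fast as the broker can handle. Stricter acknowledgement settings (acks=1 or acks=all) trade some throughput for stronger delivery guarantees.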

Broker :

Since the data handled in the ecosystem runs into terabytes, a Kafka cluster maintains multiple brokers. Each Kafka broker can handle hundreds of thousands of reads and writes per second. For each partition, one broker acts as the leader and the remaining brokers holding a copy act as followers. If the leader fails, one of the followers automatically becomes the new leader.
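The leader and follower behaviour can be sketched with a small simulation. This is a simplified model under stated assumptions: the class `PartitionReplicas` is hypothetical, and promoting the first surviving follower is only an illustrative rule, not the election procedure a real Kafka cluster uses.

```python
class PartitionReplicas:
    """Toy model of one leader and several followers for a single partition."""

    def __init__(self, broker_ids):
        self.alive = list(broker_ids)
        self.leader = self.alive[0]

    @property
    def followers(self):
        return [b for b in self.alive if b != self.leader]

    def fail(self, broker_id):
        # Remove the failed broker from the set of live replicas.
        self.alive.remove(broker_id)
        if broker_id == self.leader:
            # Assumption for this sketch: promote the first surviving follower.
            self.leader = self.alive[0] if self.alive else None


replicas = PartitionReplicas([1, 2, 3])
print(replicas.leader, replicas.followers)

replicas.fail(1)  # the current leader goes down
print(replicas.leader, replicas.followers)
```

After broker 1 fails, broker 2 takes over as leader and broker 3 remains a follower, so reads and writes for the partition can continue.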

Consumer :

The consumer is responsible for reading data from the broker. Since the broker does not keep track of which messages each consumer has read, the consumer tracks its own progress through the offset value. When a consumer acknowledges a particular offset, it means that it has received all the data up to that index. In the ZooKeeper-based setup described here, this offset is recorded through Apache ZooKeeper. The advantage for the consumer is that it can stop, rewind, or skip the flow of messages at any point simply by supplying an offset value.
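The offset mechanism can be sketched with an in-memory log. The classes `PartitionLog` and `Consumer` and their methods are assumptions made for this illustration; they mimic the idea of offsets and are not the Kafka client API.

```python
class PartitionLog:
    """Toy append-only log; the list index of a record is its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records):
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy consumer that tracks its own position and committed offset."""

    def __init__(self, log):
        self._log = log
        self.position = 0   # next offset to read
        self.committed = 0  # everything below this offset is acknowledged

    def poll(self, max_records=2):
        batch = self._log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        self.position = offset


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

consumer = Consumer(log)
first = consumer.poll()             # reads offsets 0 and 1
consumer.commit()                   # acknowledges everything below offset 2
consumer.seek(4)                    # skip ahead
skipped_to = consumer.poll()        # reads offset 4 only
consumer.seek(consumer.committed)   # rewind to the last acknowledged offset
replayed = consumer.poll()          # reads offsets 2 and 3
print(first, skipped_to, replayed)
```

The log itself never changes when it is read; only the consumer's position moves, which is why skipping and replaying are cheap.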



Zookeeper :

ZooKeeper is responsible for managing and coordinating the Kafka brokers. Its major role is to notify producers and consumers about the presence of a new broker or the failure of an existing broker in the ecosystem, so that they can start coordinating their work with another broker.
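The notification role can be illustrated with a small registry sketch. This does not use the real ZooKeeper API; the class `ClusterRegistry` and its methods are hypothetical names that only model the idea of announcing broker arrivals and failures to interested clients.

```python
class ClusterRegistry:
    """Toy membership registry that notifies watchers about broker changes."""

    def __init__(self):
        self._brokers = set()
        self._watchers = []

    def watch(self, callback):
        # Producers and consumers register a callback to hear about changes.
        self._watchers.append(callback)

    def _notify(self, event, broker_id):
        for callback in self._watchers:
            callback(event, broker_id)

    def broker_joined(self, broker_id):
        if broker_id not in self._brokers:
            self._brokers.add(broker_id)
            self._notify("joined", broker_id)

    def broker_failed(self, broker_id):
        if broker_id in self._brokers:
            self._brokers.remove(broker_id)
            self._notify("failed", broker_id)

    def live_brokers(self):
        return sorted(self._brokers)


registry = ClusterRegistry()
events = []
registry.watch(lambda event, broker_id: events.append((event, broker_id)))

registry.broker_joined(1)
registry.broker_joined(2)
registry.broker_failed(1)

print(events)
print(registry.live_brokers())
```

A client that receives the "failed" event for broker 1 can switch to broker 2, which is the kind of coordination described above.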

