In Big Data, data is maintained in huge volumes, and this brings two major challenges: the data must be maintained carefully, and it must be analyzed carefully. To address these challenges, a messaging system is required.

Learn more about this technology in this Big Data Hadoop Online Training overview.

Messaging system:

A messaging system is responsible for transferring data between applications. An application concentrates on producing and consuming the data; it does not need to bother about how the data is delivered and organized.

In a messaging system, data is transferred in two ways:

  1. Point-to-point system
  2. Publish-subscribe system

Point-to-point system:

In a point-to-point messaging system, the source and destination are fixed before the data is sent. The disadvantage of this system is that all messages are sent through a queue sequentially. There is no way to send a particular intermediate message out of turn, even if it is urgent; every message must wait for its turn. Moreover, a message cannot be delivered to multiple destinations at a time. To overcome these problems, the publish-subscribe system was introduced.
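The queue behavior described above can be sketched with Python's standard `queue.Queue`; this is a minimal illustration of point-to-point delivery, not a real messaging product:

```python
from queue import Queue

# Point-to-point sketch: one FIFO queue, messages consumed strictly
# in the order they were sent by a single receiver.
q = Queue()

for msg in ["order-1", "order-2", "urgent-alert"]:
    q.put(msg)

received = []
while not q.empty():
    received.append(q.get())  # "urgent-alert" cannot jump the queue

print(received)
```

Even though the last message is urgent, it is delivered only after the two messages ahead of it in the queue.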

(Diagram: message queue — OnlineITGuru)

Publish-Subscribe System:

In a publish-subscribe system, the data senders are called publishers and the data receivers are called subscribers. A single publisher can have multiple subscribers. A real-time example is Dish TV: the producer is the Dish TV operator, and the consumers are the television users, who subscribe to channels as per their needs.
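The one-publisher, many-subscribers idea can be sketched with a small in-memory broker; the class and method names below are illustrative, not a real Kafka API:

```python
from collections import defaultdict

# Minimal publish-subscribe sketch: subscribers register interest in
# a topic, and every subscriber gets its own copy of each message.
class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Unlike point-to-point, delivery fans out to all subscribers.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
user_a, user_b = [], []
broker.subscribe("sports", user_a.append)  # like subscribing to a TV channel
broker.subscribe("sports", user_b.append)
broker.publish("sports", "match highlights")
print(user_a, user_b)
```

Both subscribers receive the same message, which is exactly what the point-to-point model could not do.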

(Diagram: publish-subscribe system — OnlineITGuru)


Kafka is a publish-subscribe messaging system developed at LinkedIn (open-sourced in 2011) for stream analysis with Storm and Spark. It is built on top of the ZooKeeper synchronization service. Kafka can handle large volumes of data at great speed and is responsible for transferring messages between applications, for both online and offline message consumption; LinkedIn has reported throughput of around 2 million writes per second. Messages in Kafka are persisted on disk and replicated within the cluster, so they can be recovered at the time of failure. The major advantages of Kafka are low latency and high fault tolerance.


The architecture of Kafka can be explained with the following diagram:

Before going into how it works, let us look at some components of the Kafka ecosystem:


Producer:

A producer is responsible for transferring data to the brokers. When a new broker enters the ecosystem, all producers automatically start sending data to it. A producer does not wait for acknowledgements from the broker and sends data as fast as the broker can handle.
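The fire-and-forget behavior of a producer can be sketched as follows; this is an illustrative model, not the real Kafka client API:

```python
# Producer sketch: it appends messages to every broker it knows about
# and never waits for an acknowledgement before continuing.
class Producer:
    def __init__(self):
        self.brokers = []

    def add_broker(self, broker_log):
        # When a new broker joins, the producer starts sending to it too.
        self.brokers.append(broker_log)

    def send(self, message):
        for log in self.brokers:
            log.append(message)  # fire-and-forget: no acknowledgement awaited

producer = Producer()
broker_1, broker_2 = [], []
producer.add_broker(broker_1)
producer.send("event-1")
producer.add_broker(broker_2)  # a new broker enters the ecosystem
producer.send("event-2")
print(broker_1, broker_2)
```

Note how the second broker only receives messages sent after it joined, and the producer never checks whether either broker accepted anything.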

Broker:

Since the data handled in the ecosystem runs into terabytes, the ecosystem maintains multiple brokers. Each Kafka instance can handle hundreds of thousands of reads and writes per second. Among these brokers, one acts as the leader and the rest are followers. If the leader goes down, one of the followers automatically becomes the new leader.
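The leader-failover idea can be sketched in a few lines; this is a simplified illustration, not Kafka's actual controller or in-sync-replica logic:

```python
# Failover sketch: brokers form one leader plus followers; when the
# leader fails, a surviving follower is promoted to leader.
brokers = ["broker-1", "broker-2", "broker-3"]
leader = brokers[0]

def on_failure(failed, brokers, leader):
    alive = [b for b in brokers if b != failed]
    # If the failed node was the leader, promote the first survivor.
    return alive[0] if leader == failed else leader

leader = on_failure("broker-1", brokers, leader)
print(leader)
```

After "broker-1" fails, "broker-2" takes over as leader and the cluster keeps serving reads and writes.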

Consumer:

The consumer is responsible for pulling data from the broker. Since the broker does not acknowledge delivery to the producer, the consumer acknowledges what it has received from the broker through an offset value. When the consumer acknowledges an offset, it means it has received all the data up to that particular index, and this is tracked through Apache ZooKeeper. The advantage for the consumer is that it can stop or skip the flow of messages at any instant.
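Offset-based consumption can be sketched as follows; this is a simplified model (real Kafka tracks offsets per partition, and the `poll` helper here is hypothetical):

```python
# Offset sketch: the broker's log is an indexed sequence, and the
# consumer's committed offset marks how far it has read.
log = ["m0", "m1", "m2", "m3", "m4"]
committed_offset = 0

def poll(log, offset, max_records=2):
    # Return the next batch of messages starting at the given offset.
    return log[offset:offset + max_records]

batch = poll(log, committed_offset)
committed_offset += len(batch)  # acknowledge: everything up to here received

# The consumer can stop or skip at any instant by moving its offset.
committed_offset = 4            # skip ahead to the last message
print(poll(log, committed_offset))
```

Because progress is just a number, stopping, resuming, and skipping all reduce to setting the offset.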



ZooKeeper:

ZooKeeper is responsible for coordinating the actions between producers and consumers. Its major role is to notify them about the presence or absence of nodes and about data transmissions in the ecosystem.

Get in touch with OnlineITGuru for mastering the Big Data Hadoop Online Course .

Recommended Audience:

Software developers

ETL  developers

Project Managers

Team Leads

  • There are no prerequisites for learning Big Data Hadoop, but it helps to have some basic knowledge of Java concepts.
  • It is good to have knowledge of OOP concepts and Linux commands.
