The volume of data grows tremendously with each passing day. Traditional databases are not suited to handling such huge amounts of data, and this is where Big Data comes in.
Big data refers to data sets that are so large and complex that traditional data processing application software is inadequate to deal with them. The term has been in use since the 1990s. Its challenges include data storage, data analysis, querying, updating and information privacy.
Get in touch with OnlineITGuru for mastering the Big Data Hadoop Online Training in Hyderabad
So, what is Big Data Hadoop?
Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It was created by Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. Organizations can deploy Hadoop components and supporting software packages in their local data center. Hadoop is composed of numerous functional modules. At a minimum, it uses Hadoop Common as a kernel to provide the framework's essential libraries. Other components include the Hadoop Distributed File System (HDFS), which is capable of storing data across thousands of commodity servers to achieve high bandwidth between nodes.
Hadoop is the answer to the sheer volume of data we are experiencing. The Hadoop architecture centers on four modules: Hadoop YARN, the Hadoop Distributed File System, Hadoop Common and Hadoop MapReduce. HDFS provides high-throughput access to application data, and MapReduce provides YARN-based parallel processing of large data sets.
Hadoop supports a wide range of projects that complement and extend its basic capabilities. Complementary software packages include:
MapReduce is a Java-based system, built on a programming model introduced by Google, in which data is processed efficiently. It is responsible for breaking big data down into smaller jobs and for analyzing large data sets in parallel before reducing them. The working principle is that the map job sends a query for processing to various nodes in a cluster, and the reduce job then collects all the results and combines them into a single output value.
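The map and reduce steps described above can be illustrated with a small word count. This is only a sketch in plain Python that runs on one machine, not Hadoop's Java API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are made up for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different node;
    # here the map calls simply run one after another (an assumption of this sketch).
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

chunks = ["big data needs hadoop", "hadoop stores big data"]
counts = word_count(chunks)
print(counts)
```

Each chunk stands in for a block of input handled by one node. The reduce step is where the many partial results are collapsed into one value per key.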
Pig is a convenient tool developed by Yahoo for analyzing huge data sets efficiently and easily. An important feature of Pig programs is that their structure is open to considerable parallelization, which makes it easy to handle large data sets.
Sqoop is a tool used to transfer bulk data between Hadoop and structured data stores such as relational databases. It can also be used to export data from Hadoop to other external data stores. It parallelizes data transfer, allows imports, mitigates excessive loads, enables efficient data analysis and copies data quickly.
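One way to picture parallelized transfer is to split a table's key range into contiguous pieces so that each piece can be copied by a separate worker. The sketch below shows only that idea. It is not the tool's actual splitting algorithm, and the function name split_ranges and its parameters are assumptions made for illustration.

```python
def split_ranges(min_id, max_id, num_workers):
    """Divide the inclusive key range [min_id, max_id] into contiguous chunks,
    one per worker, with sizes that differ by at most one row."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_workers)
    ranges = []
    start = min_id
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than rows: leave this worker idle
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# Hypothetical table with ids 1..10 copied by 4 parallel workers.
ranges = split_ranges(1, 10, 4)
print(ranges)
```

Each worker would then run its own bounded query (for example, rows whose id falls within its range), so no single connection has to carry the whole table.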
Flume is a tool used to collect, aggregate and move huge amounts of streaming data into HDFS. The processes that run the data flow in Flume are known as agents, and the units of data that flow through Flume are known as events.
Hive was developed by Facebook and is built on top of Hadoop. It provides a simple language known as HiveQL, which is similar to SQL, for data summarization, querying and analysis.
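Because HiveQL resembles SQL, an ordinary SQL summarization query gives a feel for it. The sketch below runs against Python's built-in SQLite engine as a stand-in, not against Hive itself; the page_views table and its rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("IN", 120), ("US", 80), ("IN", 30), ("UK", 45)],
)

# Summarization: total views per country, largest first.
summary_query = """
    SELECT country, SUM(views) AS total_views
    FROM page_views
    GROUP BY country
    ORDER BY total_views DESC
"""
rows = conn.execute(summary_query).fetchall()
print(rows)
conn.close()
```

In Hive, a query of this shape would be compiled into jobs that run across the cluster rather than executed by a single local engine.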
Oozie is a workflow scheduler in which workflows are expressed as directed acyclic graphs. It runs in the Java servlet container Tomcat and uses a database to store all running workflow instances. Workflows in Oozie are executed based on data and time dependencies.
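To see why a directed acyclic graph suits workflow scheduling, consider the sketch below, which derives a valid run order from a set of dependencies using Python's standard graphlib module (Python 3.9 or later). It is not how Oozie workflows are actually defined, and the action names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "import_data": set(),
    "clean_data": {"import_data"},
    "build_report": {"clean_data"},
    "archive_raw": {"import_data"},
    "notify": {"build_report", "archive_raw"},
}

def execution_order(dag):
    """Return one valid run order in which every action follows its dependencies.
    Raises graphlib.CycleError if the graph contains a cycle."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
print(order)
```

Because the graph has no cycles, at least one such order always exists, and actions with no dependency between them (here clean_data and archive_raw) could run in parallel.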
ZooKeeper is an open source configuration, synchronization and naming registry service for large distributed systems. It is responsible for service synchronization, distributed configuration and providing a naming registry for distributed systems.
HBase is an open source, column-oriented database that uses HDFS for its underlying data storage. With the HBase NoSQL database, an enterprise can create large tables with millions of rows and columns on commodity hardware.
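A rough mental model for a table with millions of possible columns is a sparse, nested map in which each row stores only the columns it actually uses. The sketch below models that with plain Python dictionaries. It is not the HBase client API, and the put and get helpers, the "family:qualifier" column naming and the sample data are assumptions made for illustration.

```python
# row key -> {"family:qualifier" -> value}; rows keep only the columns they use.
table = {}

def put(row_key, column, value):
    """Store one cell, creating the row on first write."""
    table.setdefault(row_key, {})[column] = value

def get(row_key, column):
    """Return one cell, or None if the row or column is absent."""
    return table.get(row_key, {}).get(column)

put("user#1", "info:name", "Asha")
put("user#1", "info:city", "Hyderabad")
put("user#2", "info:name", "Ravi")
put("user#2", "stats:logins", 42)  # a column that user#1 never stores

print(get("user#2", "stats:logins"))
print(get("user#1", "stats:logins"))
```

Because absent cells take no space in this model, a table can be very wide without every row paying for every column.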
Recommended Audience: