![Big Data and Hadoop](https://onlineitguru.com/images/blog-default.png)
Data is growing tremendously day by day. Traditional databases are not suitable for handling such huge amounts of data, and that is where Big Data comes into the picture.
Big Data refers to data sets that are so large and complex that traditional data processing application software is inadequate to deal with them. The term has been in use since the 1990s. Its challenges include data storage, data analysis, querying, updating, and information privacy.
Get in touch with OnlineITGuru for mastering Big Data Hadoop through its Big Data Hadoop Online Training.
Let us first understand what Big Data Hadoop is.
Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It was created by Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. Organizations can deploy Hadoop components and supporting software packages in their local data centers. Hadoop is composed of numerous functional modules. At a minimum, it uses a kernel to provide the framework's essential libraries. Other components include the Hadoop Distributed File System (HDFS), which is capable of storing data across thousands of commodity servers to achieve high bandwidth between nodes.
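As a simple illustration of how an application talks to HDFS, here is a minimal Java sketch using the Hadoop FileSystem API. The local and HDFS paths (`/tmp/app.log`, `/data/logs`) are only placeholders, and the sketch assumes a configured Hadoop client with `core-site.xml` on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS (the NameNode address) from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths: copy a local log file into HDFS, then list the directory
        fs.copyFromLocalFile(new Path("/tmp/app.log"), new Path("/data/logs/app.log"));
        for (FileStatus status : fs.listStatus(new Path("/data/logs"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```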
Architecture:
Hadoop is the solution to the bulk of data that we are experiencing as Big Data. The Hadoop architecture gives importance to Hadoop YARN, the Hadoop Distributed File System (HDFS), Hadoop Common, and Hadoop MapReduce. As a matter of fact, HDFS in the Hadoop architecture provides high-throughput access to application data, while MapReduce provides YARN-based parallel processing of large data sets.
Generally, Hadoop supports a wide range of projects that complement and extend its basic capabilities. Complementary software packages include the following.
MapReduce:
MapReduce is a Java-based system, originally described by Google, in which data gets processed efficiently. It is responsible for breaking big data down into smaller jobs and for analyzing large data sets in parallel before reducing the results. The working principle behind MapReduce is that the Map job sends a query for processing to various nodes in a cluster, and the Reduce job collects all the results and combines them into a single output value.
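The classic word-count program illustrates this Map and Reduce split in Java. It is a minimal sketch of the standard introductory example: the Mapper emits (word, 1) pairs for every word in its input split, and the Reducer sums them into one count per word. It assumes the Hadoop MapReduce client libraries are available and that the input and output HDFS paths are passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map job: emit (word, 1) for every word in this node's split of the input
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce job: collect all counts for a word and combine them into a single value
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are supplied as command-line arguments
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```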
Apache Pig:
Pig is a convenient tool developed by Yahoo for analyzing huge data sets efficiently and easily. As a matter of fact, the important feature of Pig is that the structure of its scripts is open to considerable parallelization, which makes it easy to handle large data sets.
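Pig scripts are usually written in Pig Latin, but they can also be driven from Java. Below is a minimal sketch using the PigServer API to run a word-count style query in local mode; the input file `input.txt` and output directory `word_counts_out` are hypothetical, and a real cluster would use the MapReduce execution type instead.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCountExample {
    public static void main(String[] args) throws Exception {
        // Local mode for quick testing; ExecType.MAPREDUCE would run against a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Hypothetical input file with one word per line
        pig.registerQuery("lines = LOAD 'input.txt' AS (word:chararray);");
        pig.registerQuery("grouped = GROUP lines BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(lines);");

        // Store the result into a hypothetical output directory
        pig.store("counts", "word_counts_out");
    }
}
```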
Apache Sqoop:
Sqoop is a tool used to transfer bulk amounts of data between Hadoop and structured data stores such as relational databases. It is also used for exporting data from Hadoop to other external data stores. Specifically, it parallelizes data transfer, allows fast imports and exports, mitigates excessive loads on external systems, enables efficient data analysis, and copies data quickly.
Apache Flume:
Flume is a tool used to collect, aggregate, and move huge amounts of streaming data into HDFS. Equally important, the processes that run the data flow within Flume are known as agents, and the chunks of data that flow through Flume are known as events.
Apache Hive:
Hive was developed by Facebook. It is built on top of Hadoop and provides a simple language known as HiveQL, which is similar to SQL, for data summarization, querying, and analysis.
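Because HiveQL is so close to SQL, Hive tables can be queried from Java over JDBC. The following is a minimal sketch, assuming a HiveServer2 instance at localhost:10000, the Hive JDBC driver on the classpath, and a hypothetical `page_views` table; the credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver (requires hive-jdbc on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 endpoint; adjust host, port, and credentials
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // HiveQL reads very much like SQL; "page_views" is a hypothetical table
            ResultSet rs = stmt.executeQuery(
                "SELECT country, COUNT(*) AS views FROM page_views GROUP BY country");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```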
Apache Oozie:
Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs (DAGs). It runs in the Java servlet container Tomcat and makes use of a database to store all the running workflow instances. Workflows in Oozie are executed based on data and time dependencies.
Apache ZooKeeper:
ZooKeeper is an open-source configuration, synchronization, and naming registry service for large distributed systems. It is responsible for service synchronization, distributed configuration management, and providing a naming registry for distributed systems.
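A typical use of ZooKeeper is storing a small piece of shared configuration that every node in a distributed system can read. The sketch below uses the standard ZooKeeper Java client; the ensemble address `localhost:2181`, the znode path `/app-db-url`, and the stored value are all hypothetical.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZooKeeperConfigExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper ensemble address with a 5-second session timeout
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});

        String path = "/app-db-url";                          // hypothetical znode
        byte[] value = "jdbc:mysql://db:3306/app".getBytes(); // hypothetical config value

        // Create the znode if it does not exist yet, otherwise overwrite its data
        if (zk.exists(path, false) == null) {
            zk.create(path, value, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(path, value, -1);                      // -1 matches any version
        }

        // Any node in the cluster can now read the shared value back
        System.out.println(new String(zk.getData(path, false, null)));
        zk.close();
    }
}
```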
Apache HBase:
HBase is an open-source, column-oriented database that uses HDFS for its underlying storage of data. With the HBase NoSQL database, an enterprise can create large tables with millions of rows and columns on commodity hardware.
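A minimal sketch of the HBase Java client API is shown below: it writes one cell to a table and reads it back. It assumes an existing table named `users` with a column family `info` (both hypothetical names) and an `hbase-site.xml` on the classpath pointing at the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGetExample {
    public static void main(String[] args) throws Exception {
        // Reads ZooKeeper quorum and other settings from hbase-site.xml on the classpath
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("users"))) { // hypothetical table

            // Write one cell: row "user1", column family "info", qualifier "email"
            Put put = new Put(Bytes.toBytes("user1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                          Bytes.toBytes("user1@example.com"));
            table.put(put);

            // Read the same cell back
            Result result = table.get(new Get(Bytes.toBytes("user1")));
            byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}
```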
Recommended Audience:
- Software developers
- ETL developers
- Project managers
- Team leads

Prerequisites:
- There is no prior requirement to know any particular technology in order to start learning Big Data Hadoop, but some basic knowledge of Java concepts is helpful.
- It is good to have knowledge of OOP concepts and Linux commands.