Apache Pig is a tool and platform, generally used with Hadoop, for analyzing large data sets by representing them as data flows. It was developed at Yahoo in 2006. It has gone through several releases, and the latest version is 0.17, released in June 2017. Many data manipulations in Hadoop can be performed using Apache Pig. For writing data analysis programs, Pig provides a high-level language known as Pig Latin. Programmers write scripts in Pig Latin to analyze data with Pig. These scripts are internally converted into Map and Reduce tasks. Apache Pig contains a component known as the Pig Engine, which accepts Pig Latin scripts as input and converts them into MapReduce jobs. Pig enables data workers to write complex transformations without prior knowledge of Java. Pig can also invoke code written in languages such as Java, Jython and JRuby through its User Defined Functions (UDFs).
Pig works with data from many sources, both structured and unstructured, and stores the results in the Hadoop Distributed File System (HDFS). It is part of the Hadoop ecosystem, alongside technologies such as Hive, HBase and ZooKeeper, which fill functionality gaps in the framework. A major advantage of Pig is its multi-query approach, which reduces the number of times the data has to be scanned. It is said to reduce development time by almost 16 times.
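To make the multi-query idea concrete, here is a minimal Python sketch, not Pig itself. It assumes a small in-memory list stands in for a data set in HDFS; the names RECORDS, ScanCounter, separate_queries and multi_query are hypothetical and exist only for this illustration. It compares answering two questions with two passes over the data against answering both in one shared pass.

```python
from collections import Counter

# Hypothetical in-memory stand-in for a data set stored in HDFS.
RECORDS = [
    {"user": "ann", "url": "/home", "status": 200},
    {"user": "bob", "url": "/cart", "status": 500},
    {"user": "ann", "url": "/cart", "status": 200},
    {"user": "cho", "url": "/home", "status": 404},
]


class ScanCounter:
    """Wraps a data set and counts how many full passes are made over it."""

    def __init__(self, records):
        self._records = records
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._records)


def separate_queries(source):
    """Each question triggers its own full pass over the data."""
    hits_per_url = Counter(r["url"] for r in source.scan())
    errors = sum(1 for r in source.scan() if r["status"] >= 400)
    return hits_per_url, errors


def multi_query(source):
    """Both questions are answered during one shared pass."""
    hits_per_url = Counter()
    errors = 0
    for r in source.scan():
        hits_per_url[r["url"]] += 1
        if r["status"] >= 400:
            errors += 1
    return hits_per_url, errors


naive = ScanCounter(RECORDS)
shared = ScanCounter(RECORDS)
naive_result = separate_queries(naive)
shared_result = multi_query(shared)
print("scans with separate queries:", naive.scans)
print("scans with a shared pass:", shared.scans)
```

Both functions return the same answers; only the number of passes over the input differs, which is the saving the multi-query approach aims for on large data sets.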
To perform a particular task, programmers write a script in Pig Latin and run it through one of Pig's execution mechanisms. Once submitted, the script goes through a series of transformations to produce the desired output.
Pig has several components. The architecture of Pig is shown below. Let us discuss them in detail.
Parser: Pig scripts are first handled by the Parser. It checks the syntax of the script and performs type checking and other miscellaneous checks. The output of the Parser is a DAG (Directed Acyclic Graph) that represents the Pig Latin statements and logical operators.
Optimizer: The DAG produced by the Parser is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Compiler: The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution Engine: The execution engine submits the MapReduce jobs to Hadoop in sorted order. Finally, these jobs are executed on Hadoop to produce the desired results.
MapReduce: The MapReduce framework splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework also takes care of scheduling and monitoring the tasks, and re-executes any task that fails.
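The MapReduce step described above can be sketched in plain Python as a word count. This is only an illustration under simplifying assumptions: the chunks are processed one after another rather than in parallel, scheduling and task retries are not modelled, and all function names (split_into_chunks, map_task, shuffle, reduce_task, word_count) are hypothetical rather than part of Hadoop or Pig.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input into independent chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_outputs):
    """Group the values from all map tasks by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # Sequential here; on a cluster each chunk would go to its own map task.
    mapped = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


LINES = ["pig latin script", "pig engine", "map reduce jobs", "pig jobs"]
counts = word_count(LINES)
print(counts)
```

Because each map task only looks at its own chunk, the chunks can be processed independently, which is what allows the real framework to run them in parallel and to re-run a single failed task.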
UDFs: Pig provides the facility to create User Defined Functions in other programming languages such as Java and invoke them in Pig scripts.
Extensibility: In addition to the existing operators, users can develop their own functions to read, process and write data.
Rich set of operators: Operations such as join, sort and filter are performed using Pig's rich set of operators.
Effective handling: Pig handles all kinds of data, both structured and unstructured, and stores the results in HDFS.
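As an illustration of the UDF feature, here is a minimal sketch of a function that could serve as a Python (Jython) UDF. The function extract_domain and its schema string are hypothetical examples. The import of outputSchema from pig_util follows the pattern used for Pig's Python UDFs, but treat it as an assumption. The fallback decorator, including the output_schema attribute it sets, is not part of Pig and is there only so the file also runs as ordinary Python.

```python
# Under Pig, the outputSchema decorator is supplied by the Pig runtime.
# The fallback below is an assumption that lets this file run as plain Python.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema  # hypothetical attribute, for illustration
            return func
        return decorator


@outputSchema("domain:chararray")
def extract_domain(email):
    """Return the lower-cased domain part of an e-mail address, or None."""
    if email is None or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


print(extract_domain("Ann@Example.COM"))
```

In a Pig script, a file like this (say udfs.py, a hypothetical name) would typically be registered with a statement along the lines of REGISTER 'udfs.py' USING jython AS udfs; and then called as udfs.extract_domain(email) inside a FOREACH ... GENERATE statement.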
In comparison to SQL, Pig has the following advantages:
It declares execution plans.
It uses lazy evaluation.
It can store data at any point during a pipeline.
It uses Extract, Transform and Load (ETL).
MapReduce tasks can be written easily using the Pig Latin language.
Pig is typically used for processing time-sensitive data loads.
It is also used for processing huge data sources such as web logs.
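To make the lazy evaluation point from the comparison with SQL more concrete, here is a small Python sketch. It does not show how Pig is implemented: the Relation class and its filter, foreach and dump methods are hypothetical stand-ins that only borrow the names of Pig Latin operators. The idea shown is that each step merely records a plan, and no data is read until a result is actually requested.

```python
class Relation:
    """Hypothetical stand-in for a Pig relation: a recorded plan, not data."""

    def __init__(self, source, steps=()):
        self._source = source          # callable that loads the rows
        self._steps = tuple(steps)     # recorded (kind, function) pairs

    def filter(self, predicate):
        # Record the step; nothing is executed yet.
        return Relation(self._source, self._steps + (("filter", predicate),))

    def foreach(self, func):
        # Record the step; nothing is executed yet.
        return Relation(self._source, self._steps + (("foreach", func),))

    def dump(self):
        # Only now is the source read and the recorded plan applied.
        rows = iter(self._source())
        for kind, fn in self._steps:
            if kind == "filter":
                rows = filter(fn, rows)
            else:
                rows = map(fn, rows)
        return list(rows)


load_calls = []


def load_logs():
    """Hypothetical loader that records every time the data is read."""
    load_calls.append("read")
    return [("GET", 200), ("POST", 500), ("GET", 404)]


logs = Relation(load_logs)
errors = logs.filter(lambda row: row[1] >= 400)
codes = errors.foreach(lambda row: row[1])

calls_before_dump = len(load_calls)   # plan built, data not read yet
result = codes.dump()                 # execution happens here
calls_after_dump = len(load_calls)
print(calls_before_dump, result, calls_after_dump)
```

Since every intermediate relation is itself a plan, calling dump on errors instead of codes would materialize the data at that earlier point, which loosely mirrors being able to store data at any point in a pipeline.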
There are few prerequisites for learning Big Data Hadoop. It is good to have some knowledge of OOP concepts, but it is not mandatory. Our trainers will teach you if you are not familiar with those concepts.