Apache Pig is a tool and platform, generally used with Hadoop, for analyzing large data sets. It was developed at Yahoo in 2006. It has gone through various releases, and the latest version is 0.17, released in June 2017. Data manipulation operations in Hadoop can be performed using Apache Pig. For writing data analysis programs, Pig provides a high-level language known as Pig Latin. Programmers write scripts in Pig Latin to analyze data with Pig. Scripts written in Pig Latin are internally converted to Map and Reduce tasks. Apache Pig contains a component known as the Pig Engine, which accepts Pig Latin scripts as input and converts them into MapReduce jobs. Pig enables data workers to write complex transformations without prior knowledge of Java. Pig can invoke code in several languages, such as Java, Jython and JRuby, through its User Defined Functions (UDFs).
Pig works with data from many sources, both structured and unstructured, and stores the results in the Hadoop Distributed File System (HDFS). It is part of the Hadoop ecosystem, which includes Hive, HBase, ZooKeeper and other utilities that fill functionality gaps in the framework. A major advantage of Pig is its multi-query approach, which reduces the number of times the data has to be scanned. It is reported to reduce development time by almost 16 times.
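The multi-query idea can be illustrated outside Pig. The sketch below is plain Python, not Pig's implementation: the hypothetical `single_pass` function produces two separate results (a filtered subset and a per-user count) while reading the input once, where two independent queries would each scan it. The record layout and all names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical input: (user, status_code) records standing in for a data set.
RECORDS = [
    ("alice", 200),
    ("bob", 500),
    ("alice", 404),
    ("carol", 200),
    ("bob", 200),
]


def single_pass(records):
    """Compute two outputs while scanning the input only once."""
    errors = []                # output 1: records with status >= 400
    hits_per_user = Counter()  # output 2: number of records per user
    rows_read = 0              # how many records were read in total
    for user, status in records:
        rows_read += 1
        if status >= 400:
            errors.append((user, status))
        hits_per_user[user] += 1
    return errors, dict(hits_per_user), rows_read


errors, hits_per_user, rows_read = single_pass(RECORDS)
print(errors)
print(hits_per_user)
print(rows_read)
```

Each record is read once even though two results come out, which is the saving the multi-query approach aims for on much larger inputs.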
To perform a particular task, programmers write a script in the Pig Latin language and run it through one of the available execution mechanisms. Once submitted, the script goes through a series of transformations applied by the Pig framework to produce the desired output.
Pig has several components. The architecture of Pig is outlined below. Let us discuss the components in detail.
Parser: Pig scripts are first handled by the Parser. It checks the syntax of the script, does type checking and performs other miscellaneous checks. The output of the Parser is a DAG (Directed Acyclic Graph), which represents the Pig Latin statements and logical operators.
Optimizer: The output of the Parser is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Compiler: The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution Engine: The execution engine submits the MapReduce jobs to Hadoop in sorted order. Finally, these MapReduce jobs are executed on Hadoop to produce the desired results.
MapReduce: It usually splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework takes care of scheduling and monitoring the tasks and re-executes any task that fails.
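The map and reduce phases described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Hadoop or Pig code: the function names, the chunk size and the sample input are assumptions, and the chunks are processed one after another here, whereas a real cluster would run the map tasks in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce task: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    # Split the input into independent chunks; each could go to its own map task.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    return reduce_phase(shuffle(intermediate))


counts = word_count(["pig runs on hadoop", "pig latin", "hadoop stores data"])
print(counts)
```

A Pig Latin script that groups and counts records is compiled into jobs with this general shape, so the script author does not write the map and reduce functions by hand.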
UDFs: Pig lets you create User Defined Functions in other programming languages such as Java and invoke them in Pig scripts.
Extensibility: Building on the existing operators, users can develop their own functions to read, process and write data.
Rich set of operators: Operations such as join, sort and filter can be performed using its rich set of operators.
Effective handling: Pig handles all kinds of data, both structured and unstructured, and stores the results in HDFS.
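As a rough idea of what a user defined function can look like, here is a minimal sketch written in Python, the language used for Jython UDFs. The function `extract_domain` is hypothetical. Pig supplies an `outputSchema` decorator to Jython UDFs; the stand-in version defined here is an assumption added only so the file runs on its own, and it would be dropped when the file is run under Pig.

```python
def outputSchema(schema):
    """Stand-in for the decorator Pig provides to Jython UDFs (assumption:
    defined here only so the sketch runs outside Pig)."""
    def wrap(func):
        func.outputSchema = schema  # remember the declared return schema
        return func
    return wrap


@outputSchema("domain:chararray")
def extract_domain(url):
    """Hypothetical UDF: return the host part of a URL, or None for bad input."""
    if url is None:
        return None
    without_scheme = url.split("://", 1)[-1]   # drop "http://", "https://", ...
    host = without_scheme.split("/", 1)[0]     # keep everything before the path
    return host.lower() or None


print(extract_domain("https://Example.com/path/page.html"))
```

In a Pig script, a file like this would typically be registered with a statement such as `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.extract_domain(url)`. The file name and the alias are placeholders.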
In comparison to SQL, Pig has the following advantages:
It lets developers declare execution plans.
It uses lazy evaluation.
It can store data at any point during a pipeline.
It uses Extract, Transform and Load (ETL).
MapReduce tasks can be written easily using the Pig Latin language.
Pig is typically used for processing time-sensitive data loads.
It is also used for processing huge data sources such as web logs.
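The lazy evaluation mentioned in the comparison with SQL can be illustrated with Python generators applied to a few simplified web-log lines. This is an analogy, not Pig code: the log format, the function names and the sample data are assumptions. Defining the pipeline does no work; records are read and filtered only when a result is requested, comparable to the way a Pig script runs when it reaches a store or dump step.

```python
def load(lines):
    """Yield parsed (ip, status) records one at a time; nothing runs until consumed."""
    for line in lines:
        ip, status = line.split()
        yield ip, int(status)


def only_errors(records):
    """Lazily keep records whose status code is 400 or higher."""
    return (rec for rec in records if rec[1] >= 400)


# Hypothetical, simplified web-log lines in the form "<ip> <status>".
LOG_LINES = ["10.0.0.1 200", "10.0.0.2 404", "10.0.0.1 500", "10.0.0.3 200"]

# Building the pipeline does no work yet, much like defining relations in a script.
pipeline = only_errors(load(LOG_LINES))

# Work happens only when a result is requested.
result = list(pipeline)
print(result)
```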