Posted by Admin. Last updated 2020-06-15.

Big Data Interview Questions and Answers: Pig

1). What is Pig?

Apache Pig is a platform for analyzing large data sets. Users write their data operations in a scripting language called Pig Latin, and Pig converts those scripts into MapReduce jobs, which run on data stored in HDFS.

2). What are the elements of Pig?
Pig consists of three elements:
Pig Latin: a high-level scripting language. It does not require a schema, and it is translated to MapReduce jobs.
Pig Grunt Shell: an interactive shell for executing Pig commands.
Piggy Bank: a shared repository for user-defined functions (UDFs).
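As a rough illustration of how a Piggy Bank function might be used from a script, here is a minimal sketch. The jar path and the input file `names` are hypothetical, and the sketch assumes a piggybank.jar build that includes the string UDF `org.apache.pig.piggybank.evaluation.string.UPPER`.

```pig
-- Assumption: piggybank.jar has been built and sits at this hypothetical path.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input: one chararray field per record.
names = LOAD 'names' AS (name:chararray);

-- Call a Piggy Bank UDF by its fully qualified class name.
upper_names = FOREACH names GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(name);

DUMP upper_names;
```

The same statements can be typed one by one into the Grunt shell, which is a convenient way to try a UDF before putting it in a script.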
3). What are the data types in Pig?
The simple data types are int (4 bytes), float (4 bytes), double (8 bytes), long (8 bytes), chararray and bytearray.
4). What are the complex data types in Pig?
Tuple: (12.5,hello world,-2)
A tuple is an ordered set of fields. It is represented by fields separated by commas, all enclosed in parentheses.
Bag: {(12.5,hello world,-2),(2.87,bye world,10)}
A bag is an unordered collection of tuples. A bag is represented by tuples separated by commas, all enclosed in curly brackets. Tuples in a bag aren't required to have the same schema or even the same number of fields.
Map: [key#value]
A map is a set of key/value pairs. Keys must be unique and must be strings (chararray). The value can be any type.
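The three complex types can also appear in a load schema. The following is a sketch only: the file name `complex_data` and every field name are made up for illustration.

```pig
-- Hypothetical input file and field names, chosen only for illustration.
-- t is a tuple, b is a bag of tuples, m is a map with chararray keys.
A = LOAD 'complex_data' AS (
    t:tuple(x:double, s:chararray, n:int),
    b:bag{r:tuple(x:double, s:chararray, n:int)},
    m:map[]
);

-- t.x projects a field out of the tuple; m#'key' looks up a map value.
B = FOREACH A GENERATE t.x, m#'key';

-- FLATTEN turns each tuple in the bag into its own output record.
C = FOREACH A GENERATE FLATTEN(b);
```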
5). Is the Pig Latin language case-sensitive?
Pig Latin is case-sensitive in some places and not in others.
Keywords are not case-sensitive: Load is equivalent to load.
Relation names (aliases) are case-sensitive: A = load 'b' is not equivalent to a = load 'b'.
UDF names are also case-sensitive: count is not equivalent to COUNT.
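The rules above can be seen side by side in a short sketch (the input name 'b' is taken from the answer; the other aliases are illustrative):

```pig
-- Keywords are not case-sensitive: these two statements use the same operator.
A = LOAD 'b';
a = load 'b';

-- Aliases are case-sensitive: A and a are two separate relations.
G = GROUP A ALL;

-- Function names are case-sensitive: the built-in is COUNT, not count.
C = FOREACH G GENERATE COUNT(A);
```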
6). How is the 'load' keyword used in Pig scripts?

By default, load looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage. To load data from another source such as HBase, you would use the loader for that source, HBaseStorage.

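Example of the PigStorage loader:

A = LOAD '/user/731419/pig/pigtest.txt' USING PigStorage(',') AS (a:chararray, code:chararray, city:chararray, b:int);

Example of the HBaseStorage loader:

x = LOAD 'a' USING HBaseStorage();

(The HBase line is shorthand. A real script refers to the loader by its full class name, org.apache.pig.backend.hadoop.hbase.HBaseStorage, and passes it the columns to read.)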

If you don't specify a load function, Pig uses the built-in PigStorage. The load statement can also take an 'as' clause, which lets you specify the schema of the data you are loading.
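For comparison, here is a sketch of a load with neither a load function nor a schema. The file name 'input' and the field names given in the casts are assumptions for illustration.

```pig
-- Hypothetical tab-delimited input file.
-- No USING clause: PigStorage with the tab delimiter is used.
-- No AS clause: fields have no names and default to bytearray.
raw = LOAD 'input';

-- Without a schema, fields are addressed by position and can be cast explicitly.
typed = FOREACH raw GENERATE (chararray)$0 AS user, (long)$1 AS id;

-- Print the schema of the resulting relation.
DESCRIBE typed;
```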

7). What is the purpose of the 'dump' keyword in Pig?
dump displays the contents of a relation on the screen:

dump A;
8). What are the relational operations in Pig Latin?
They are:
a) foreach
b) order by
c) filter
d) group
e) distinct
f) join
g) limit
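Of the operators listed above, group and join are not covered by a question of their own, so here is a brief sketch of both. The relation 'orders' and its fields are hypothetical; 'input' reuses the schema from the foreach example.

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);

-- GROUP collects all records with the same key into one bag per key.
by_address = GROUP A BY address;
counts = FOREACH by_address GENERATE group AS address, COUNT(A) AS n;

-- Hypothetical second relation to join against.
B = LOAD 'orders' AS (id:long, amount:double);

-- JOIN matches records from both relations on the given keys (inner join).
joined = JOIN A BY id, B BY id;
```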
9). How do you use the 'foreach' operation in Pig scripts?
foreach takes a set of expressions and applies them to every record in the data:

A = load 'input' as (user:chararray, id:long, address:chararray);
B = foreach A generate user, id;

Positional references are preceded by a $ (dollar sign) and start from 0:

C = foreach A generate $0, $1;
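The expressions in a foreach are not limited to plain field references. As a small sketch using the same schema, with output field names invented for illustration:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);

-- Each expression is evaluated once per record; AS names the resulting field.
B = FOREACH A GENERATE
        UPPER(user)   AS user_upper,
        id + 1        AS next_id,
        SIZE(address) AS address_len;
```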
10). Why should we use 'filter' in Pig scripts?
filter selects tuples from a relation based on some condition.
Syntax: alias = FILTER alias BY expression;
FILTER is commonly used to select the data that you want or, conversely, to filter out (remove) the data you don't want.
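A sketch of both uses, keeping wanted records and removing unwanted ones. The threshold and the 'test' prefix are arbitrary values chosen for illustration.

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);

-- Keep only the records you want.
big_ids = FILTER A BY id > 100;

-- Remove the records you don't want: here, users whose name starts with 'test'.
real_users = FILTER A BY NOT (user MATCHES 'test.*');

-- Conditions can be combined with AND / OR.
usable = FILTER A BY id > 100 AND address IS NOT NULL;
```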
11). Why should we use the 'order by' keyword in Pig scripts?

The order statement sorts your data for you, producing a total order of your output data. The syntax of order is similar to group: you indicate a key or set of keys by which you wish to order your data.

input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;
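Ordering by more than one key, with an explicit direction per key, might look like the sketch below. The chararray types are an assumption; the original example declares no types.

```pig
input2 = LOAD 'daily' AS (exchanges:chararray, stocks:chararray);

-- Sort by exchanges in descending order, then by stocks in ascending order.
sorted = ORDER input2 BY exchanges DESC, stocks ASC;
```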

12). What is the use of 'distinct'?

distinct removes duplicate tuples in a relation.
Syntax: alias = DISTINCT alias;
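Because distinct compares whole tuples, a common pattern is to project the fields of interest first. A minimal sketch, reusing the schema from the foreach example:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);

-- DISTINCT works on entire tuples, so keep only the field to de-duplicate on.
users = FOREACH A GENERATE user;

-- One record per distinct user name.
uniq_users = DISTINCT users;
```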

13). Is it possible to display a limited number of results?

Yes. Sometimes you want to see only a limited number of results, and limit allows you to do this:

input2 = load 'daily' as (exchanges, stocks);
first10 = limit input2 10;
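On its own, limit does not promise which records you get. If a specific set is wanted, such as the first ten in sort order, limit can be applied after order, as in this sketch:

```pig
input2 = LOAD 'daily' AS (exchanges, stocks);

-- Sort first so that the limited result is well defined.
sorted = ORDER input2 BY exchanges;

-- Keep the first 10 records of the sorted relation.
top10 = LIMIT sorted 10;

DUMP top10;
```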

14). What is the difference between Pig and SQL?
Pig                          SQL
Procedural                   Declarative
Suited to OLAP workloads     Supports OLAP and OLTP workloads
15). What is the difference between MapReduce and Pig?
Logic: in MapReduce you need to write the entire logic for operations such as join, group, filter and sum yourself. In Pig, built-in functions are available.
Code size: in MapReduce the number of lines of code required is large even for simple functionality. In Pig, 10 lines of Pig Latin are roughly equal to 200 lines of Java.
Coding effort: in MapReduce the time and effort spent coding is high. In Pig, what took 4 hours to write in Java took about 15 minutes in Pig Latin (approximately).
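The difference in code size is easiest to see with the classic word count, which needs a mapper class, a reducer class and a driver in Java MapReduce. A sketch in Pig Latin follows; the file name 'input.txt' is hypothetical, and TextLoader is used so that each line is read as a single field.

```pig
-- Read each line of the (hypothetical) input file as one chararray.
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Split every line into words, one word per record.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words and count each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

DUMP counts;
```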
16). When should you use Pig Latin and when should you use Hive?
When the data is purely structured, we should go for Hive; when it is semi-structured, we can consider Pig.
17). What is a relation in Pig?
A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.
18). Why do we need MapReduce during Pig programming?

Pig is a high-level platform that makes many Hadoop data analysis tasks easier to carry out. The language used on this platform is Pig Latin. A program written in Pig Latin is like a query written in SQL: it needs an execution engine to run it. So, when a program is written in Pig Latin, the Pig compiler converts it into MapReduce jobs. Here, MapReduce acts as the execution engine.
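One way to see this translation for yourself is the EXPLAIN statement, which prints the plans Pig builds for a relation, including the MapReduce plan when Pig runs on the MapReduce engine. A small sketch, reusing the 'input' schema from earlier:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
by_address = GROUP A BY address;
counts = FOREACH by_address GENERATE group AS address, COUNT(A) AS n;

-- Print the logical, physical and execution plans for computing counts.
EXPLAIN counts;
```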

19). Does Pig give any warning when there is a type mismatch or missing field?
No, Pig will not stop to warn you about a missing field or a type mismatch. Any message it does produce is difficult to find in the log file, so you should not rely on it.
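Since such problems are easy to miss, one possible check is to look for nulls after loading. This is a sketch under the assumption that a missing field, or a value that cannot be converted to its declared type, is loaded as null. The file name is hypothetical; the schema is borrowed from the PigStorage example.

```pig
-- Hypothetical comma-delimited input with a declared schema.
A = LOAD 'pigtest.txt' USING PigStorage(',')
        AS (a:chararray, code:chararray, city:chararray, b:int);

-- Records whose last field is missing or is not a valid int show up here.
suspect = FILTER A BY b IS NULL;

DUMP suspect;
```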
20). What is the difference between Pig and Hive?
1. Users: Pig is used by researchers and programmers; Hive is used by data analysts.
2. Data: Pig handles semi-structured data; Hive handles structured data.
3. Language: Pig uses Pig Latin; Hive uses HiveQL.
4. Purpose: Pig is mainly for programming; Hive is mainly for creating reports.
5. Operations: with Hive you can join, order and sort data dynamically in an aggregated manner; Pig additionally provides the COGROUP feature for performing outer joins.
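To make the COGROUP point concrete, here is a minimal sketch. The relations 'owners' and 'pets' and their fields are hypothetical.

```pig
-- Hypothetical inputs.
owners = LOAD 'owners' AS (owner:chararray, pet:chararray);
pets = LOAD 'pets' AS (pet:chararray, age:int);

-- COGROUP yields one record per key, with one bag per input relation.
-- A key present in only one input gets an empty bag for the other,
-- which is the information an outer join needs.
C = COGROUP owners BY pet, pets BY pet;

-- Pig's JOIN also supports outer joins directly.
L = JOIN owners BY pet LEFT OUTER, pets BY pet;
```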