
Big Data Interview Questions and Answers: Pig

1). What is Pig?

Apache Pig is a platform for analyzing large data sets. Users write their data operations in a scripting language called Pig Latin, and Pig converts these scripts into MapReduce jobs that run on data stored in HDFS.

2). What are the elements of Pig?

Pig consists of three elements:
Pig Latin
    High-level scripting language
    Schema is optional
    Translated to MapReduce jobs
Grunt shell
    Interactive shell for executing Pig commands
Piggy Bank
    Shared repository of user-defined functions (UDFs)

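3). What are the data types in Pig?

Simple (scalar) data types and their sizes:
int: 4 bytes
long: 8 bytes
float: 4 bytes
double: 8 bytes
Pig also provides chararray (a string) and bytearray (raw bytes).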

4). What are the complex data types in Pig?

Tuple: (12.5,hello world,-2)
A tuple is an ordered set of fields. It is represented by fields separated by commas, all enclosed in parentheses.
Bag: {(12.5,hello world,-2),(2.87,bye world,10)}
A bag is an unordered collection of tuples. A bag is represented by tuples separated by commas, all enclosed in curly brackets. Tuples in a bag aren't required to have the same schema or even the same number of fields.
Map: [name#hello world,count#-2]
A map is a set of key/value pairs, written as key#value and enclosed in square brackets. Keys must be unique and must be strings (chararray). The value can be any type.
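
The three complex types can also be declared in a load schema. The following is a minimal sketch, assuming a hypothetical input file 'complex_data' and made-up field names (price, note, qty, city); it is meant only to show how each type is declared and accessed.

```pig
-- Hypothetical input file and field names, for illustration only.
A = LOAD 'complex_data' AS (
    t:tuple(price:double, note:chararray, qty:int),
    b:bag{r:tuple(price:double, note:chararray, qty:int)},
    m:map[]
);
-- t.price projects a field out of the tuple,
-- m#'city' looks up a key in the map,
-- COUNT(b) counts the tuples in the bag.
B = FOREACH A GENERATE t.price, m#'city', COUNT(b);
DUMP B;
```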

5). Is Pig Latin case-sensitive?

Pig Latin is partly case-sensitive. Keywords are not case-sensitive: for example, LOAD is equivalent to load.
Relation (alias) names are case-sensitive: A = load 'b' is not equivalent to a = load 'b'.
UDF names are also case-sensitive: count is not equivalent to COUNT.
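
The rules above can be sketched in a short script. The file name 'input' and the field name are assumptions made for the example.

```pig
A = LOAD 'input' AS (user:chararray);
B = group A all;                   -- keywords work in any case
C = FOREACH B GENERATE COUNT(A);   -- built-in function names are case-sensitive
-- D = FOREACH B GENERATE count(A);   -- would fail: no function named count
-- E = FOREACH b GENERATE COUNT(A);   -- would fail: alias b is not defined, only B
DUMP C;
```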

6). How is the 'load' keyword used in Pig scripts?

By default, load looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage. To load data from another source, name a different loader: for example, to load data from HBase, use the HBase loader HBaseStorage.

Example of the PigStorage loader:

A = LOAD '/user/731419/pig/pigtest.txt' USING PigStorage(',') AS (a:chararray, code:chararray, city:chararray, b:int);

Example of the HBaseStorage loader:

x = LOAD 'a' USING HBaseStorage();

If you don't specify a load function, Pig uses the built-in PigStorage. The load statement can also take an 'as' clause, which lets you specify the schema of the data you are loading.

7). What is the purpose of the 'dump' keyword in Pig?

dump displays the contents of a relation on the screen:
dump A;

8). What are the relational operations in Pig Latin?

They include:
a) foreach
b) order by
c) filter
d) distinct
e) limit

9). How do you use the 'foreach' operation in Pig scripts?

foreach takes a set of expressions and applies them to every record in the data:
A = load 'input' as (user:chararray, id:long, address:chararray);
B = foreach A generate user, id;
Positional references are preceded by a $ (dollar sign) and start from 0:
C = foreach A generate $0, $1;

10). Why should we use 'filter' in Pig scripts?

Selects tuples from a relation based on some condition.
Syntax: alias = FILTER alias BY expression;
FILTER is commonly used to select the data that you want; or, conversely, to filter out (remove) the data you don’t want.
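
As a minimal sketch of both uses (keeping what you want and removing what you don't), assuming the same hypothetical 'input' file and schema as in the foreach question:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
-- Keep only records with a large id and a known address.
kept = FILTER A BY id > 100L AND address IS NOT NULL;
-- Remove records whose user name starts with 'test'.
no_test = FILTER A BY NOT (user MATCHES 'test.*');
DUMP kept;
```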

11). Why should we use the 'order by' keyword in Pig scripts?

The order statement sorts your data for you, producing a total order of your output data. The syntax of order is similar to that of group: you indicate a key or set of keys by which you wish to order your data.
input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;

12). What is the use of 'distinct'?

Definition: Removes duplicate tuples in a relation.
Syntax: alias = DISTINCT alias;
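
Because DISTINCT compares whole tuples, a common pattern is to project the fields of interest first. A short sketch, assuming a hypothetical 'input' file and schema:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
-- Project only the user field, then drop duplicate tuples.
users = FOREACH A GENERATE user;
uniq_users = DISTINCT users;
DUMP uniq_users;
```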

13). Is it possible to display a limited number of results?

Yes. Sometimes you want to see only a limited number of results, and 'limit' allows you to do this:
input2 = load 'daily' as (exchanges, stocks);
first10 = limit input2 10;
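
limit on its own gives no guarantee about which records are returned. To get a "top N" result, it is usually combined with order, as in this sketch (the chararray field types are an assumption added for the example):

```pig
input2 = LOAD 'daily' AS (exchanges:chararray, stocks:chararray);
-- Sort first, then keep the first 10 records of the sorted relation.
sorted = ORDER input2 BY stocks DESC;
top10 = LIMIT sorted 10;
DUMP top10;
```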

14). What is the difference between Pig and SQL?

Pig is procedural; SQL is declarative.
Pig is suited to OLAP workloads; SQL supports both OLAP and OLTP workloads.

15). What is the difference between MapReduce and Pig?

Logic: in MapReduce you must write the entire logic for operations such as join, group, filter, and sum yourself; in Pig, built-in functions are available.
Code size: MapReduce needs many lines of code even for simple functionality; roughly 10 lines of Pig Latin equal 200 lines of Java.
Effort: coding time in MapReduce is high; what took about 4 hours to write in Java took about 15 minutes in Pig Latin.
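
To give a feel for the difference in code size, here is a word count, the classic MapReduce example, sketched in Pig Latin. The input and output paths are hypothetical.

```pig
-- Each line of the (hypothetical) input file becomes one record.
lines = LOAD 'input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of words; FLATTEN turns the bag into rows.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Grouping and counting use built-in operators and functions.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO 'wordcount_out';
```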

16). When should you use Pig Latin and when should you use Hive?

When the data is purely structured, use Hive; for semi-structured data, consider Pig.

17). What is a relation in Pig?

A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.

18). Why do we need MapReduce during Pig programming?

Pig is a high-level platform that makes many Hadoop data analysis tasks easier to execute. The language used on this platform is Pig Latin. A program written in Pig Latin is like a query written in SQL: it needs an execution engine to run it. So when a program is written in Pig Latin, the Pig compiler converts it into MapReduce jobs. Here, MapReduce acts as the execution engine.
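
One way to see this translation is the EXPLAIN command, which prints the plans Pig builds for a relation, including the MapReduce plan. A minimal sketch, assuming a hypothetical 'input' file and schema:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
B = GROUP A BY user;
C = FOREACH B GENERATE group, COUNT(A);
-- Prints the logical, physical, and MapReduce plans for C without running the job.
EXPLAIN C;
```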

19). Does Pig give any warning when there is a type mismatch or missing field?

No, Pig will not display a warning on the screen if a field is missing or a value does not match its declared type. Instead, Pig substitutes a null for that value. Any trace of the problem is left in the log file, where it is difficult to find.
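
Since a value that cannot be converted to its declared type becomes null, one way to surface such records is to filter for nulls. A minimal sketch, assuming a hypothetical comma-delimited file 'data.txt' with a name and an age column:

```pig
A = LOAD 'data.txt' USING PigStorage(',') AS (name:chararray, age:int);
-- Rows where age was missing or not a valid int end up with a null age.
bad = FILTER A BY age IS NULL;
DUMP bad;
```

Note that this also returns rows where the field was legitimately empty, so it narrows down the search rather than proving a type mismatch.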

20). What is the difference between Pig and Hive?

1. Users: Pig is used mainly by researchers and programmers; Hive is used mainly by data analysts.
2. Data: Pig handles semi-structured data; Hive works with structured data.
3. Language: Pig uses Pig Latin; Hive uses HiveQL.
4. Purpose: Pig is mainly for programming; Hive is mainly for creating reports.
5. Operations: with Hive you can join, order, and sort data dynamically in an aggregated manner; Pig also provides an additional COGROUP feature for performing outer joins.
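
The COGROUP feature mentioned above can be sketched as follows. The two input files and their fields are hypothetical.

```pig
owners = LOAD 'owners' AS (owner:chararray, pet:chararray);
vets = LOAD 'vets' AS (vet:chararray, pet:chararray);
-- One output tuple per distinct pet value: (group, {owners tuples}, {vets tuples}).
-- A key present in only one input gets an empty bag for the other,
-- which is what makes COGROUP a building block for outer joins.
both = COGROUP owners BY pet, vets BY pet;
DUMP both;
```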

