Big Data Interview Questions and Answers: Pig

1). What is Pig?
Apache Pig is a platform for analyzing large data sets. Users write their data operations in a scripting language called Pig Latin, and Pig converts the scripts into MapReduce jobs that run on data stored in HDFS.

2). What are the elements of Pig?
Pig consists of three elements:
Pig Latin: a high-level scripting language in which a schema is optional. Scripts are translated into MapReduce jobs.
Grunt shell: an interactive shell for executing Pig commands.
Piggy Bank: a shared repository of user-defined functions (UDFs).

3). What are the data types in Pig?
Pig's simple data types are:
int - 4 bytes
float - 4 bytes
double - 8 bytes
long - 8 bytes
chararray - a character string
bytearray - a byte array (blob)

4). What are the complex data types in Pig?
Tuple: (12.5,hello world,-2)
A tuple is an ordered set of fields. It is written as fields separated by commas, all enclosed in parentheses.
Bag: {(12.5,hello world,-2),(2.87,bye world,10)}
A bag is an unordered collection of tuples. It is written as tuples separated by commas, all enclosed in curly brackets. Tuples in a bag are not required to have the same schema or even the same number of fields.
Map: [key#value]
A map is a set of key/value pairs. Keys must be unique and must be strings (chararray). The value can be of any type.
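
As a rough sketch of how these types appear in a script, the load statement below declares one field of each complex type and then reads values out of them. The file name 'data' and all field names are made up for illustration.

```pig
-- Hypothetical input: each record holds a tuple, a bag and a map.
A = LOAD 'data' AS (
        t:tuple(x:double, s:chararray, n:int),
        b:bag{r:tuple(x:double, s:chararray, n:int)},
        m:map[]
    );

-- t.x reads one field of the tuple, m#'key' looks up a value in the map,
-- and FLATTEN(b) turns each tuple in the bag into its own output record.
B = FOREACH A GENERATE t.x AS tx, m#'key' AS v, FLATTEN(b);
DUMP B;
```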

5). Is Pig Latin case-sensitive?
Partly. Keywords are not case-sensitive: for example, Load is equivalent to load.
Relation names (aliases) and field names are case-sensitive: A = load 'b' is not equivalent to a = load 'b'.
UDF names are also case-sensitive: count is not equivalent to COUNT.
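
A short sketch of these rules in a script, reusing the input name 'b' from the example above:

```pig
A = load 'b';                      -- a lower-case keyword works
a = LOAD 'b';                      -- so does upper case, but 'a' is a separate alias from 'A'
G = GROUP A ALL;
C = FOREACH G GENERATE COUNT(A);   -- the built-in function must be written COUNT
-- Writing count(A) instead would be rejected, because no function named 'count' is defined.
```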

6). How is the 'load' keyword used in Pig scripts?
By default, load looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage. To read from another source, name a different loader: for example, to load data from HBase you would use the HBase loader, HBaseStorage.

Example of the PigStorage loader:

A = LOAD '/user/731419/pig/pigtest.txt' USING PigStorage(',') AS (a:chararray, code:chararray, city:chararray, b:int);

Example of the HBaseStorage loader:

-- 'cf:col1 cf:col2' is a placeholder column list; substitute the columns of your own table.
x = LOAD 'hbase://a' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1 cf:col2');

If you do not specify a load function, Pig uses the built-in PigStorage. The load statement can also take an 'as' clause, which lets you specify the schema of the data you are loading.

7). What is the purpose of the 'dump' keyword in Pig?
dump runs the script and displays the contents of a relation on the screen:
dump A;

8). What are the relational operations in Pig Latin?
They are:
a) foreach
b) order by
c) filter
d) group
e) distinct
f) join
g) limit
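
Two operators from this list, group and join, are not covered by the questions that follow. A minimal sketch of both is given here; the input names and fields are hypothetical.

```pig
-- Hypothetical inputs: daily trades and a lookup list of exchanges.
trades    = LOAD 'daily'     AS (exchange:chararray, stock:chararray, volume:long);
exchanges = LOAD 'exchanges' AS (exchange:chararray, country:chararray);

-- group: collect the trades of each exchange into one bag, then aggregate the bag.
by_exch = GROUP trades BY exchange;
totals  = FOREACH by_exch GENERATE group AS exchange, SUM(trades.volume) AS total_volume;

-- join: match each trade with the record for its exchange.
joined = JOIN trades BY exchange, exchanges BY exchange;
```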

9). How is the 'foreach' operation used in Pig scripts?
foreach takes a set of expressions and applies them to every record in the data:
A = load 'input' as (user:chararray, id:long, address:chararray);
B = foreach A generate user, id;
Positional references are preceded by a $ (dollar sign) and start from 0:
C = foreach A generate $0, $1;

10). Why should we use 'filter' in Pig scripts?
filter selects tuples from a relation based on some condition.
Syntax: alias = FILTER alias BY expression;
FILTER is commonly used to select the data that you want; or, conversely, to filter out (remove) the data you don’t want.
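
For instance, a sketch using the same hypothetical 'input' relation as the foreach example (the threshold 100 is arbitrary):

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);

-- Keep only the records you want: ids above 100 that have an address.
B = FILTER A BY id > 100L AND address IS NOT NULL;

-- Or remove the records you do not want by negating a condition.
C = FILTER A BY NOT (id > 100L);
```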

11). Why should we use the 'order by' keyword in Pig scripts?
The order statement sorts your data, producing a total order of your output data. The syntax of order is similar to that of group: you indicate a key or set of keys by which you wish to order your data.
input2 = load 'daily' as (exchanges, stocks);
grpds = order input2 by exchanges;

12). What is the use of 'distinct'?
distinct removes duplicate tuples in a relation.
Syntax: alias = DISTINCT alias;
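
A small sketch, again assuming a hypothetical 'input' file. Because distinct compares whole tuples, the script first projects the field of interest:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
users      = FOREACH A GENERATE user;   -- keep only the user field
uniq_users = DISTINCT users;            -- one record per distinct user
```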

13). Is it possible to display a limited number of results?
Yes. Sometimes you want to see only a limited number of results, and limit allows you to do this:
input2 = load 'daily' as (exchanges, stocks);
first10 = limit input2 10;

14). What is the difference between Pig and SQL?

Pig                          SQL
Procedural language          Declarative language
Schema is optional           Schema is required
Suited to OLAP workloads     Supports OLAP and OLTP workloads

15). What is the difference between MapReduce and Pig?
In MapReduce, you need to write the entire logic for operations such as join, group, filter and sum.
In Pig, built-in functions are available for these operations.
In MapReduce, the number of lines of code required is large even for simple functionality.
In Pig, roughly 10 lines of Pig Latin do the work of about 200 lines of Java.
In MapReduce, the coding effort is high.
In Pig, what took 4 hours to write in Java took about 15 minutes in Pig Latin (approximately).
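
To give a feel for the difference in size, here is a sketch of the classic word count in Pig Latin. The input file name is hypothetical; TOKENIZE and COUNT are Pig built-ins.

```pig
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split each line into words, one word per output record.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Collect identical words together and count each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
```

The equivalent hand-written MapReduce program needs a mapper class, a reducer class and a driver.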

16). When should you use Pig Latin and when should you use Hive?
When the data is purely structured, Hive is the better choice. For semi-structured data, consider Pig.

17). What is a relation in Pig?
A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.

18). Why do we need MapReduce during Pig programming?
Pig is a high-level platform that makes many Hadoop data analysis tasks easier to carry out. The language used on this platform is Pig Latin. A program written in Pig Latin is like a query written in SQL: it needs an execution engine to run it. So when a program is written in Pig Latin, the Pig compiler converts it into MapReduce jobs. Here, MapReduce acts as the execution engine.
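
One way to see this translation is the explain command, which prints the plans Pig builds for a relation, including the MapReduce plan. A sketch with a hypothetical input:

```pig
A = LOAD 'input' AS (user:chararray, id:long, address:chararray);
B = GROUP A BY user;
C = FOREACH B GENERATE group, COUNT(A);
EXPLAIN C;   -- prints the logical, physical and MapReduce plans for C
```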

19). Does Pig give any warning when there is a type mismatch or missing field?
Not in a way you can rely on. Pig does not stop the script when a field is missing or a value does not match its declared type. It substitutes a null value and carries on. Whatever Pig reports about this ends up in the log output, where it is difficult to find.
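
Since mismatches surface as nulls, one way to spot them is to filter for null values after loading. A sketch, assuming a hypothetical comma-separated file with a name and an age per line:

```pig
-- A line such as "bob,abc" cannot be read as (chararray, int),
-- so its age field becomes null instead of raising an error.
A   = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
bad = FILTER A BY age IS NULL;
DUMP bad;   -- shows the records whose age was missing or not a valid int
```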

20). What is the difference between Pig and Hive?
1. Users. Pig: researchers and programmers. Hive: data analysts.
2. Data. Pig: semi-structured. Hive: structured.
3. Language. Pig: Pig Latin. Hive: HiveQL.
4. Main use. Pig: programming. Hive: creating reports.
5. Operations. Hive lets you join, order and sort data dynamically in an aggregated manner. Pig does this too and additionally provides the COGROUP operator, which can be used to perform outer joins.
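
The COGROUP operator mentioned in the comparison can be sketched as follows. The two inputs and their fields are hypothetical.

```pig
owners  = LOAD 'owners'  AS (owner:chararray, pet:chararray);
friends = LOAD 'friends' AS (name:chararray, friend:chararray);

-- One output record per key: (group, {matching owners}, {matching friends}).
-- A key that appears in only one input gets an empty bag for the other,
-- which is what makes outer-join style processing possible.
C = COGROUP owners BY owner, friends BY name;
DUMP C;
```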

Also refer: http://www.larsgeorge.com/2009/10/hive-vs-pig.html
