Big Data Interview Questions and Answers-Oozie

1. What is Oozie?
Oozie is a workflow scheduler for Hadoop. It lets a user define workflows as Directed Acyclic Graphs (DAGs) of actions, which can run sequentially or in parallel on Hadoop. It can also run plain Java classes and Pig jobs, and interact with HDFS.

2. Why use Oozie instead of just cascading jobs one after another?
Greater flexibility: workflows can be started, stopped, suspended and re-run.
Oozie allows a failed workflow to be restarted from the point of failure.

3. How to make a workflow?
First write a Hadoop job and make sure that it works. Package the classes into a jar, then create a workflow.xml file and copy all of the job configuration properties into it:
Input files
Output files
Input readers and writers
Mappers and reducers
Job-specific arguments
job.properties
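
The steps above can be sketched as a minimal workflow.xml with a single map-reduce action. This is an illustration under stated assumptions, not a definitive layout: the class names org.example.WordCountMapper and org.example.WordCountReducer, the node names, and the parameters ${jobTracker}, ${nameNode}, ${inputDir} and ${outputDir} are hypothetical and would be supplied by your own job and job.properties.

```xml
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <!-- Entry point: the first node to run -->
    <start to="mr-node"/>

    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Hypothetical mapper and reducer classes from your jar -->
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.example.WordCountMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.example.WordCountReducer</value>
                </property>
                <!-- Input and output locations, passed in as parameters -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Reached only when the action fails -->
    <kill name="fail">
        <message>Map-reduce failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

The job jar normally goes in a lib/ directory next to workflow.xml in HDFS, so that Oozie adds it to the action's classpath.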

4. Which properties do we have to specify in job.properties?
Name Node
Job Tracker
oozie.wf.application.path
Lib Path
Jar Path

5. What is application pipeline in Oozie?
It is often necessary to connect workflow jobs that run regularly but at different time intervals. The outputs of multiple subsequent runs of one workflow become the input to the next workflow. Chaining these workflows together is referred to as a data application pipeline.
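
One way such a pipeline is commonly expressed is with an Oozie coordinator, where a daily workflow waits for the hourly outputs of an upstream workflow. The sketch below is only an illustration: the dataset name, dates, HDFS path layout, and the parameters ${nameNode} and ${workflowAppPath} are all assumptions.

```xml
<coordinator-app name="daily-rollup" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <!-- Hypothetical dataset produced hourly by an upstream workflow -->
        <dataset name="hourly-logs" frequency="${coord:hours(1)}"
                 initial-instance="2024-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <!-- Wait for the 24 most recent hourly instances -->
        <data-in name="input" dataset="hourly-logs">
            <start-instance>${coord:current(-23)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <!-- Pass the resolved input paths to the downstream workflow -->
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```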

6. What extra files do we need to run a Hive action in Oozie?
hive.hql
hive-site.xml
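
A Hive action might reference these two files as sketched below. The node names, the INPUT parameter and the ${...} variables are assumptions for illustration; both files are assumed to sit in the workflow application directory alongside workflow.xml.

```xml
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Hive configuration for the cluster -->
        <job-xml>hive-site.xml</job-xml>
        <!-- The Hive script to execute -->
        <script>hive.hql</script>
        <!-- Hypothetical parameter, referenced in the script as ${INPUT} -->
        <param>INPUT=${inputDir}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```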

7. How do you run an Oozie job?
$ oozie job -oozie http://172.20.95.107:11000/oozie -config job.properties -run
Here 172.20.95.107 is the Oozie server node. The command prints the job ID.
To check the status:
$ oozie job -oozie http://172.20.95.107:11000/oozie -info <job id>

8. Which actions can be performed in Oozie?

Email Action
Hive Action
Shell Action
Ssh Action
Sqoop Action
Writing a custom Action Executor

9. How do you specify the Oozie start, end and error nodes?

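<start to="[NODE-NAME]"/>

<end name="[NODE-NAME]"/>

<kill name="[NODE-NAME]">
    <message>[A custom message]</message>
</kill>

Errors are handled by a kill node: a failed action transitions to it through its <error to="[NODE-NAME]"/> element, and the workflow ends with the given message.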

10. Why do we use the fork and join nodes in Oozie?

- A fork node splits one path of execution into multiple concurrent paths of execution.
- A join node waits until every concurrent execution path of a previous fork node arrives at it.
- The fork and join nodes must be used in pairs. The join node assumes that the concurrent execution paths are children of the same fork node.
<fork name="[FORK-NODE-NAME]">
    <path start="[NODE-NAME]"/>
    <path start="[NODE-NAME]"/>
</fork>

<join name="[JOIN-NODE-NAME]" to="[NODE-NAME]"/>

11. What is Decision Node in Oozie?

A decision node works like a switch statement: it evaluates a list of predicates in order and transitions to the node of the first one that is true, or to a default node if none is.
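
As a minimal sketch, a decision node that runs a processing action only when the input directory exists could look like this. The node names and the ${inputDir} parameter are hypothetical; fs:exists is one of Oozie's built-in EL functions.

```xml
<decision name="check-input">
    <switch>
        <!-- Taken when the predicate evaluates to true -->
        <case to="process-data">${fs:exists(inputDir)}</case>
        <!-- Taken when no case matches -->
        <default to="end"/>
    </switch>
</decision>
```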
