Apache Oozie is an open-source Java web application used to schedule Apache Hadoop jobs in a distributed environment. It allows multiple complex jobs to be combined and run in a sequential order to achieve a bigger task, and within that sequence two or more jobs can also be programmed to run in parallel with each other. In short, Oozie is a workflow scheduler for Hadoop.

It is integrated with the Hadoop stack, with YARN as its architectural center, and supports jobs for Apache MapReduce, Apache Pig, and Apache Sqoop, as well as system-specific jobs such as Java programs or shell scripts. Oozie combines multiple jobs sequentially into one logical unit of work and is responsible for triggering the workflow actions, which in turn use the Hadoop execution engine to actually execute the tasks. Dependencies between jobs are specified by the user in the form of Directed Acyclic Graphs (DAGs), and Oozie takes care of their execution in the correct order as specified in the workflow. Oozie is also scalable: it can manage the timely execution of thousands of workflows in a Hadoop cluster.
Learn more about Big Data Hadoop in this comprehensive Big Data Hadoop tutorial now.
Oozie consists of two parts:
Workflow Engine: Its responsibility is to store and run workflows composed of Hadoop jobs, e.g., MapReduce, Pig, and Hive.
Coordinator Engine: It runs workflow jobs based on predefined schedules and the availability of data.
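As a minimal sketch of the kind of definition the workflow engine stores and runs, the XML below wires a single Pig action between the start and end of a workflow. The application name, the script cleanup.pig, and the ${jobTracker}/${nameNode} parameters are illustrative placeholders, not values from this article.

<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="pig-node"/>
    <!-- a single Hadoop job (here, a Pig script) run as a workflow action -->
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>cleanup.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>

A definition like this is typically uploaded to HDFS and handed to the Oozie service with the command-line tool, e.g. oozie job -config job.properties -run.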
Oozie is also very flexible: one can easily start, stop, suspend, and rerun jobs. Oozie makes it easy to rerun failed workflows, and it is also possible to skip a specific failed node when rerunning (via the oozie.wf.rerun.skip.nodes property).
Oozie runs as a service in the cluster, and clients submit workflow definitions to it for immediate or later processing.
An Oozie workflow consists of control flow nodes and action nodes.
Control flow nodes define the beginning and the end of a workflow and control its execution path. A control flow node can also direct execution between actions by allowing constructs such as conditional logic, where the workflow takes different branches depending on the result of an earlier action node.
An action node performs the actual work, such as moving files into HDFS, running Pig or Hive jobs, importing data using Sqoop, or running a shell script.
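As a sketch of how the two node types cooperate, the workflow below uses a decision (control flow) node to branch on whether an input directory exists, and a file-system action node to move it into a staging area. The node names and the ${inputDir}/${stagingDir} parameters are hypothetical.

<workflow-app name="branching-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-input"/>
    <!-- control flow node: pick a branch based on a condition -->
    <decision name="check-input">
        <switch>
            <case to="move-files">${fs:exists(inputDir)}</case>
            <default to="end"/>
        </switch>
    </decision>
    <!-- action node: move files into a staging directory in HDFS -->
    <action name="move-files">
        <fs>
            <move source="${inputDir}" target="${stagingDir}"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>File move failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>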
There are three types of jobs that are common in Oozie:
Oozie workflow jobs: These represent the sequence of actions to be executed, expressed as Directed Acyclic Graphs (DAGs).
Oozie coordinator jobs: These consist of workflow jobs triggered by time and data availability.
Oozie bundles: A bundle can be referred to as a package of multiple coordinator and workflow jobs.
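As an illustration of the second type, the coordinator sketch below runs a workflow once a day, but only after that day's input dataset has appeared in HDFS. All names, dates, and paths here are placeholders, and ${workflowAppUri} is assumed to be supplied via job.properties.

<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <!-- the dataset instance the coordinator waits for -->
    <datasets>
        <dataset name="logs" frequency="${coord:days(1)}"
                 initial-instance="2024-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="input" dataset="logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <!-- the workflow to run for each materialized instance -->
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>

A bundle definition, in turn, simply lists several such coordinators so they can be started, stopped, and managed as one unit.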
With its client API and command-line interface, Oozie can be used to launch, control, and monitor jobs from a Java application.
Oozie has provisions to execute jobs that are scheduled to run periodically.
Using its web service APIs, one can control jobs from anywhere.
Oozie also has a provision to send email notifications after the completion of jobs.
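For example, a workflow can end with Oozie's built-in email action. This fragment is a sketch only: it assumes an SMTP server has already been configured for the Oozie server, and the recipient address and wording are placeholders.

<!-- workflow action fragment: send a notification email -->
<action name="notify">
    <email xmlns="uri:oozie:email-action:0.2">
        <to>data-team@example.com</to>
        <subject>Oozie workflow ${wf:id()} finished</subject>
        <body>Workflow ${wf:name()} (id ${wf:id()}) has completed.</body>
    </email>
    <ok to="end"/>
    <error to="end"/>
</action>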
Oozie is well integrated with Hadoop security, which matters most in a Kerberized cluster. Oozie maintains a record of which user submitted each job and launches all the actions with the proper privileges. It also handles all the authentication details for the user.
It is the only workflow manager with built-in Hadoop actions, which makes workflow development, maintenance, and troubleshooting easier. Other systems would require significantly more work to correlate JobTracker jobs with workflow actions.
The Oozie UI makes it easier to drill down to specific errors on the data nodes.
Recommended Audience:
Prerequisites: There are not many prerequisites for learning Big Data Hadoop. It is good to have knowledge of some OOP concepts, but it is not mandatory; our trainers will teach you if you do not have knowledge of those OOP concepts.
Become a master in Apache Oozie with OnlineITGuru experts through the Big Data Hadoop online course.