Azkaban Job Type: Pig
Azkaban2 is a ground-up redesign of the old Azkaban. One of its design goals is to make Azkaban robust and flexible. In the old Azkaban, the job executors that actually run user jobs stood in the way: any change to any job executor meant upgrading the whole package.
So in Azkaban2, the job executors are carved out as plugins. This way, we can add as many job executor plugins as we want (for Hive, for Pig, and so on, and for different versions of each). We can also add job executors that work with different versions of Hadoop without touching the Azkaban2 core.
Here is an existing job type re-introduced in Azkaban2:
In large part, this is the same "pig" type that was in the old Azkaban. The difference is mainly in security. For a description of Hadoop delegation tokens, refer to the "HadoopJava" type page.
In the old Azkaban, the keytab information was handed to the user process: the Pig job wrapper did the keytab-based login and then proxied as the user to call Pig's main. This is obviously dangerous for an enterprise cluster at LinkedIn.
In Azkaban2, Pig jobs are no longer handed the keytab information. Instead, each Pig job is granted Hadoop delegation tokens. Luckily for users, no extra action is required to take a Pig job package that worked in the old Azkaban and run it in Azkaban2. In addition, new settings let Pig jobs take in more parameters.
Set the job type to pig:
type=pig
One must also tell Azkaban where the Pig script is:
pig.script=WHERE_YOUR_PIG_SCRIPT_ON_AZKABAN_MACHINE
Pig runs on a Hadoop cluster, so one also needs to specify the Hadoop user to proxy as:
user.to.proxy=YOUR_HADOOP_USER_NAME
The proxy user must be added to the project's permissions on the permissions page. Azkaban2 checks this so that it does not request delegation tokens on behalf of just anyone.
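Put together, the required settings described so far might look like this in a single job file. This is only a minimal sketch: the file name, script path, and user name are hypothetical placeholders, not values taken from a real deployment.

```properties
# wordcount.job (hypothetical file name)

# Use the Pig job type plugin.
type=pig

# Location of the Pig script (hypothetical path).
pig.script=src/wordcount.pig

# Hadoop user to proxy as (hypothetical name); this user must also be
# listed in the project's permissions.
user.to.proxy=hadoopuser
```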
pig.additional.jars=PRE_REGISTER_YOUR_UDF_JARS
udf.import.list=IMPORT_YOUR_UDF_NAME_SPACE
param.YOUR_PIG_PARAMS_NAME=YOUR_PIG_PARAMS
This is equivalent to the "-param" option on the pig command line.
param_file=YOUR_PIG_PARAM_FILE
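To illustrate the param.* settings, the sketch below passes two parameters to a Pig script. The parameter names and HDFS paths are hypothetical and chosen only for illustration.

```properties
# Hypothetical parameter names and values.
param.inputPath=/user/hadoopuser/input
param.outputPath=/user/hadoopuser/output

# Per the description above, this corresponds to invoking pig with:
#   -param inputPath=/user/hadoopuser/input -param outputPath=/user/hadoopuser/output
# Inside the Pig script, the values are referenced as $inputPath and $outputPath.
```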
Additionally, the pig job type is based on the JavaProcessJob class and supports its settings, such as jvm.args, classpath, etc.
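As a rough illustration of those inherited settings, a job file might tune the JVM and extend the classpath as follows. The values shown are assumptions for the sake of the example; check the JavaProcessJob documentation for the exact value formats it accepts.

```properties
# Hypothetical values, for illustration only.

# Extra arguments for the JVM that runs the job (here, a 1 GB max heap).
jvm.args=-Xmx1g

# Additional classpath entries for the job process (hypothetical path).
classpath=./lib/*
```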
For an example, see plugins/jobtype/examples/pig-wc
Run zip wordcountpig.zip ./* -r to create the zip package, then upload it to Azkaban.