Thursday, March 5, 2015

Oozie Part 1 : Apache Oozie


Apache™ Oozie is a Java web application used to schedule Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. It can also be used to schedule system-specific jobs, such as Java programs or shell scripts.


1) Running Python Scripts from Oozie.


Errors:
-- empty --sorts idle data so that sequential jobs could be run on it
package used is essentially numpy which basically sorts the data by truck and time.s
: command not found
./sort.py: line 8: import: command not found
./sort.py: line 9: import: command not found
./sort.py: line 10: import: command not found
./sort.py: line 14: syntax error near unexpected token `'pipes','
./sort.py: line 14: `csv.register_dialect('pipes', delimiter='|')'
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Sol: Install the numpy RPM on all the nodes.
       clush -g all yum -y install numpy
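
The `import: command not found` lines in the log above are the kind of message a shell prints when it tries to run Python source as shell commands. It is therefore also worth checking that sort.py starts with an interpreter line and has Unix line endings, or that the Oozie shell action calls python explicitly. Below is a minimal sketch of such a script. It assumes a pipe-delimited input whose first two columns are a truck id and a timestamp. The column positions, the sample rows and the function name `sort_rows` are hypothetical, and it uses the built-in `sorted()` instead of numpy so that it stays self-contained.

```python
#!/usr/bin/env python
"""Sort pipe-delimited rows by truck and time (illustrative sketch)."""
import csv

# Same dialect name and delimiter as the line quoted in the error log.
csv.register_dialect('pipes', delimiter='|')


def sort_rows(lines, truck_col=0, time_col=1):
    """Return rows sorted by (truck, time). Column positions are assumptions."""
    rows = [row for row in csv.reader(lines, dialect='pipes') if row]
    return sorted(rows, key=lambda r: (r[truck_col], r[time_col]))


if __name__ == '__main__':
    # Hypothetical sample data; a real job would read its input file instead.
    sample = [
        "T2|2015-03-05 10:00|idle",
        "T1|2015-03-05 09:30|idle",
        "T1|2015-03-05 09:00|idle",
    ]
    for row in sort_rows(sample):
        print('|'.join(row))
```

With the interpreter line in place and the file marked executable, `./sort.py` is handed to Python rather than parsed by the shell.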


IOError: [Errno 2] No such file or directory: '/hdfs/diva/dataout/'
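
Errno 2 means the path did not exist on the machine where the script ran. An Oozie shell action can be launched on any node of the cluster, so a directory that exists on one host is not guaranteed to exist on another. A minimal sketch of a check that reports which host is missing the directory, assuming the script writes to a local or locally mounted path (the helper name `check_output_dir` is hypothetical):

```python
import os
import socket


def check_output_dir(path):
    """Return (ok, message) saying whether path is a directory on this host."""
    host = socket.gethostname()
    if os.path.isdir(path):
        return True, "%s exists on %s" % (path, host)
    return False, "%s is missing on %s" % (path, host)


# Path taken from the error message above.
ok, message = check_output_dir('/hdfs/diva/dataout/')
print(message)
```

Running this at the start of the script puts the host name in the Oozie launcher log, which makes it easier to see which node lacks the directory.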

Reference links:
https://github.com/yahoo/oozie/wiki/Oozie-WF-use-cases
