Tuesday, January 20, 2015

Oozie, Importing data from Teradata using sqoop and insert data into hive using Oozie



Insert data into hive using Oozie:

Give the file names appropriately like below.

1) Script Name : /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/test.hql
2) Files : /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/tez-site.xml
3) Job XML : /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/hive-site.xml

your oozie/workspace directory looks like:
[hdfs@xxxx~]$ hadoop fs -ls /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/
Found 4 items
-rw-r--r--   2 hdfs hue       /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/hive-site.xml
-rw-r--r--   2 hdfs hue       /user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/test.hql
-rw-r--r--   2 hdfs hue       user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/tez-site.xml
-rw-r--r--   3 hdfs hue       user/hue/oozie/workspaces/_hdfs_-oozie-30-1421959133.15/workflow.xml
[hdfs@xxxxxx ~]$

Note :If you are getting an error related to DB please check the my-sql-connector jar and that needs to be place in /user/oozie/share/lib/hive

Error Message :
E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hdfs

Sol:
hadoop.proxyuser.oozie.hosts - this should be set to the FQDN of the machine running your oozie service.
and
hadoop.proxyuser.oozie.groups - this should be set to *
--------------------------------------------------------------------
Error Message:
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion

Sol:
I assume if you are using Tez you are trying to run a Hive query. You should include your tez-site.xml in your Oozie workflow directory and make sure you are mentioning the tez-site.xml in a <file> element in your workflow.xml. See 

http://oozie.apache.org/docs/3.3.1/DG_HiveActionExtension.html 

for further explanation of how to use <file>, but basically you would put the tez-site.xml in the root of your workflow directory and then specify the file as a child element of the <hive> element like this: 

<hive ...> 
<configuration> ... 
</configuration> 
<param>...</param> 
<file>tez-site.xml</file> 
</hive> 


Please note from the XML schema of a hive action that order is important. The <file> element should go after any <configuration> or <param> elements in your XML.
----------------------------------------------------

Sample Workflow Screen from Oozie:















-------------------------------------------------------------------------------------------

Importing data into HDFS/hive from Teradata using Oozie and Sqoop

This is very trick to work it out and we need to take few necessary steps before going to execute the steps.

1) we need to install sqoop in all the Node Manager nodes ( means typically in all the data nodes)
2) Place Teradata drivers in all the nodes where we installed sqoop.
3) we need to create lib directory under oozie/workspace and needs to place all the teradata drivers like below.

[hdfs@xxx ~]$ hadoop fs -ls /user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/
drwxr-xr-x   - hdfs hue  user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib
-rw-r--r--   3 hdfs hdfs   user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/workflow.xml
[hdfs@adcp22nxhwx13 ~]$

Teradata jars needs to beplace in lib under Oozie/workspaces directory.

[hdfs@xxx ~]$ hadoop fs -ls /user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/
/user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/hortonworks-teradata-connector.jar
/user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/opencsv-2.3.jar
/user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/tdgssconfig.jar
/user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/teradata-connector-1.3.2-hadoop210.jar
/user/hue/oozie/workspaces/_hdfs_-oozie-24-1421896024.01/lib/terajdbc4.jar
[hdfs@adcp22nxhwx13 ~]$


Running Ooozie job with jar file with Class:

Main class [org.apache.oozie.action.hadoop.JavaMain], exit code [1]
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [com.adt.Explode], exit code [1]





















Note : Don't give any sqoop command in command field, leave it empty and you can add argument if you wish to add any arguments.

Sample Sqoop Command to import data from teradata.
sqoop import \
--connect jdbc:teradata://<Teradata ip address>/TD Schema \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username XXXXXX \
--password YYYYYY \
--query "SELECT * FROM xyz where date >= '2014-04-01' and date < '2014-05-01' AND \$CONDITIONS" \
--target-dir /abc/ \
--split-by  ID \
--fields-terminated-by '|' \
--m 1;

Note : Sqoop user should have an access in Teradata and with select access.
-------------------------------------------------------------------------------------------------------


E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hdfs

Sol:
In Ambari under the HDFS configs, you will find a section for "Custom core-site.xml". In there can you check if you have the following properties set:

hadoop.proxyuser.oozie.hosts - this should be set to the FQDN of the machine running your oozie service.

and
hadoop.proxyuser.oozie.groups - this should be set to *
After you change these settings you will need to restart your cluster.

----------------------------------------------------------------
E0701: XML schema error, cvc-pattern-valid: Value 'mem.annotation.tmp.remove' is not facet-valid with respect to pattern '([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}' for type 'IDENTIFIER'.

Sol:
I have researched the issue and found the following reason as to why job would fail. The use of a (dot/period) is not a permitted character link to the oozie guide that can reference for a list of permitted characters. 

http://oozie.apache.org/docs/4.0.1/WorkflowFunctionalSpec.html 

More specifically, Appendix A in the above link gives a list of acceptable characters. I will post below as well: 

Appendix A, Oozie XML-Schema 

Oozie Schema Version 0.5 

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:workflow="uri:oozie:workflow:0.5" 
elementFormDefault="qualified" targetNamespace="uri:oozie:workflow:0.5"> <xs:element name="workflow-app" type="workflow:WORKFLOW-APP"/> 
<xs:simpleType name="IDENTIFIER"> 
<xs:restriction base="xs:string"> 
<xs:pattern value="([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}"/> 
</xs:restriction> 

The acceptable characters are listed as: 

<xs:pattern value="([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}"/> 

Oozie has not been coded to allow the use of the period or dot.

------------------------------------------------------------------------------------

Possible Error Messages:
"/DATA/sdj1/hadoop/yarn/local/usercache/hdfs/appcache/application_1421791018931_0011/container_1421791018931_0011_01_000002"): error=2, No such file or directory
E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [1]

No comments:

Post a Comment