Friday, February 27, 2015

Hadoop Issues and Solutions


Issue:
The following statement fails:
CREATE TABLE trucks STORED AS ORC AS SELECT * FROM trucks_stage;

Error Message:
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1448713152798_0002_2_00, diagnostics=[Task failed, taskId=task_1448713152798_0002_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

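Solution:
The task is most likely running out of Java heap space. orc.compress.size sets the ORC compression buffer size in bytes, so lowering it should reduce the memory the ORC writer needs. Try the statement below:

CREATE TABLE trucks STORED AS ORC TBLPROPERTIES ("orc.compress.size"="1024") AS SELECT * FROM trucks_stage;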
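
If shrinking the ORC buffer is not enough, another common approach is to give each Tez task more memory through hive.tez.container.size and hive.tez.java.opts. The sketch below is not part of the original fix. It assumes a heap of roughly 80% of the container size, which is a rule of thumb and not a Hive requirement, and the function name tez_memory_settings and the 4096 MB figure are only illustrative.

```shell
# Print Hive SET statements for a given Tez container size in MB.
# Assumption: the heap (-Xmx) is set to about 80% of the container size.
tez_memory_settings() {
  container_mb=$1
  heap_mb=$((container_mb * 80 / 100))
  echo "SET hive.tez.container.size=${container_mb};"
  echo "SET hive.tez.java.opts=-Xmx${heap_mb}m;"
}

# Example: print the settings for a 4096 MB container.
tez_memory_settings 4096
```

The printed statements can be run in the Hive session before the CREATE TABLE statement. Pick a size that fits what YARN allows on the cluster.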
---------------------------------

Hue does not allow multiple scripts to run concurrently

The second script fails with "Expected state FINISHED, but found ERROR".

Error Message:
ERROR : Failed to execute tez graph.
org.apache.hadoop.hive.ql.metadata.HiveException: Default queue should always be returned.Hence we should not be here.
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:251)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:260)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:199)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:116)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:999)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Solution:

Our findings:

This issue occurs only when scripts are run from Hue.

When Hive queries are run through Hue (Beeswax), a user cannot run multiple queries concurrently. Using separate browser sessions or separate clients makes no difference; the limitation appears to be tied to the user.
From the way Tez works, and from the code of the Hive 0.14 patch that adds general support for concurrent queries with Tez, a single TezSession supports only serial queries, not parallel ones. This is also stated in the Tez documentation. Hive appears to create the session per user. On further digging, we found ticket HIVE-9223, still open, which describes this issue.


-------------------------------
Ambari 1.7 throws an error when restarting any service from Ambari

Error message:
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 80,333,492 milliseconds ago.  The last packet sent successfully to the server was 80,333,492 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
Error Code: 0

Solution:

1. Stop ambari-server:
# ambari-server stop
2. Back up the current ambari-server jar by moving it out of the way:
# mv /usr/lib/ambari-server/ambari-server-1.7.0.169.jar /tmp/
3. Copy the replacement jar ambari-server-1.7.0-9999.jar to /usr/lib/ambari-server/
4. Start ambari-server again:
# ambari-server start
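
Steps 2 and 3 can be wrapped in a small helper so the old jar is never removed unless the replacement is actually present. This is only a sketch: the function name replace_jar is hypothetical, and the location of the replacement jar in the usage comment is an assumption.

```shell
# Replace a jar with a new build, keeping a timestamped backup of the old one.
# Usage: replace_jar <current_jar> <replacement_jar> <backup_dir>
replace_jar() {
  current_jar=$1
  replacement_jar=$2
  backup_dir=$3

  # Stop early if either file is missing, so the server is never left without a jar.
  [ -f "$current_jar" ] || { echo "missing: $current_jar" >&2; return 1; }
  [ -f "$replacement_jar" ] || { echo "missing: $replacement_jar" >&2; return 1; }

  mkdir -p "$backup_dir" || return 1
  backup="$backup_dir/$(basename "$current_jar").$(date +%Y%m%d%H%M%S)"
  mv "$current_jar" "$backup" || return 1
  cp "$replacement_jar" "$(dirname "$current_jar")/" || return 1

  # Print where the old jar went so it can be restored if needed.
  echo "$backup"
}

# Example, run between "ambari-server stop" and "ambari-server start"
# (assumes the replacement jar sits in the current directory):
#   replace_jar /usr/lib/ambari-server/ambari-server-1.7.0.169.jar \
#               ./ambari-server-1.7.0-9999.jar /tmp
```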

--------------------------------------------------
Getting the error below while running a Hive script

Status: Killed 
Job received Kill while in RUNNING state. 
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1424221594778_0609_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed due to user-initiated job kill. failedTasks:0, Vertex vertex_1424221594778_0609_1_02 [Reducer 2] killed/failed due to:null]
DAG killed due to user-initiated kill. failedVertices:0 killedVertices:1 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution:
We observed two problems here:
Problem 1: The result set is very large because the join behaves like a Cartesian join.
Problem 2: There are NULL / blank values in the JOIN keys.

We tuned the script by adding a 'where id is not null' condition, and it then ran successfully:

select *  -- column list omitted in the original script
from 
abc.tmp1 a
left outer join
(select * from app2 
where id is not null) b
on a.id=b.id
limit 10;
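
To see why repeated blank keys make the result explode, the output size of an equi-join can be estimated from the key columns alone. The sketch below uses made-up data and a hypothetical helper name, join_row_estimate. It treats a blank key like any other value and does not model NULL keys, which never match in a join but can still pile up on a single reducer.

```shell
# Estimate the number of rows an equi-join produces from two files that each
# hold one join key per line: sum over keys of (count in A) * (count in B).
join_row_estimate() {
  awk 'NR == FNR { cnt[$0]++; next }
       ($0 in cnt) { total += cnt[$0] }
       END { print total + 0 }' "$1" "$2"
}

# Demo with hypothetical data: 3 blank keys on one side, 4 on the other,
# plus one real key on each side.
left_keys=$(mktemp)
right_keys=$(mktemp)
printf '\n\n\n1\n' > "$left_keys"
printf '\n\n\n\n1\n' > "$right_keys"
join_row_estimate "$left_keys" "$right_keys"   # 3*4 + 1*1 = 13
rm -f "$left_keys" "$right_keys"
```

With millions of blank keys on both sides the product grows accordingly, which is why filtering them out before the join helps.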
----------------------------------------------
The Hive CLI prints warning messages and takes around 12 to 15 seconds to show the prompt.
This is a defect in Ambari 1.7, and the vendor confirmed that it will be fixed in Ambari 2.0.
[root@ive]# hive
15/02/27 16:18:21 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
15/02/27 16:18:21 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
15/02/27 16:18:21 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
15/02/27 16:18:21 WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
15/02/27 16:18:21 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist

Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

A delay of 12 to 15 seconds is acceptable with these warning messages. If the delay is longer, it might be a network issue.

Run the command below and check the log timestamps to see where the time is spent:
hive --hiveconf hive.root.logger=DEBUG,console 
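
To put a number on the startup delay before reading the debug log, the CLI can be timed with a trivial statement. This is a minimal sketch; the helper name elapsed_seconds is hypothetical, and the example statement is just a cheap command chosen for illustration.

```shell
# Run a command and print how many whole seconds it took.
# The command's own output is discarded so only the duration is printed.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s)
  echo $((end - start))
}

# Example (needs a host with the Hive CLI installed):
#   elapsed_seconds hive -e 'set hive.execution.engine;'
```

A result well above 15 would suggest looking beyond the known Ambari 1.7 warnings.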
--------------------------------

Hive has a problem connecting with HDP 2.2
Error Message:
Job Submission failed with exception 'java.io.FileNotFoundException(File file:/usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-*.jar does not exist)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

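Solution: In Ambari, update HIVE_AUX_JARS_PATH in hive-env.sh so that it points to the HDP 2.2 location of the HCatalog core jar. The file then looks like this (the second export overrides the first):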

cat /etc/hive/conf/hive-env.sh
export HIVE_AUX_JARS_PATH=/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar
export HIVE_AUX_JARS_PATH=/usr/hdp/2.2.0.0-2041/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
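
Before saving the new path, it is worth confirming that the jar really exists there, since the original error was a FileNotFoundException. A minimal sketch, with a hypothetical helper name find_hcatalog_core_jar:

```shell
# Print the first hive-hcatalog-core jar found under the given directory.
# Returns non-zero if none exists.
find_hcatalog_core_jar() {
  for candidate in "$1"/hive-hcatalog-core*.jar; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no hive-hcatalog-core jar under $1" >&2
  return 1
}

# Example, using the HDP 2.2 directory from hive-env.sh above:
#   find_hcatalog_core_jar /usr/hdp/2.2.0.0-2041/hive-hcatalog/share/hcatalog
```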
-------------------------------------
Non-DFS used space keeps increasing when a Hive job fails:
Add the following property to core-site.xml: fs.df.interval = 60000 (the value is in milliseconds, i.e. 60 seconds)