Tuesday, June 14, 2016

HBase Dev Part 1: HBase Development


1) Connect to the HBase Shell:
[danna@cloudglee01 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.1.0-129, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017
hbase(main):001:0>

2) Display HBase Shell Help Text:
Type help and press Enter to display basic usage information for HBase Shell, as well as
several example commands. Notice that table names, rows, and columns must all be enclosed
in quote characters.
hbase (main):001:0> help

3) Create a table
Use the create command to create a new table. You must specify the table name and the column family name

hbase(main):008:0> create 'test','cf'
0 row(s) in 2.2870 seconds
=> Hbase::Table - test
hbase(main):009:0>

4) List information about your table
    Use the list command

hbase(main):009:0> list
TABLE
test
1 row(s) in 0.0130 seconds
=> ["test"]
hbase(main):010:0>

5) Put data into your table
    To put data into your table, use the put command

hbase(main):010:0> put 'test','row1','cf:a','Value1'
0 row(s) in 0.0920 seconds
hbase(main):011:0> put 'test','row2','cf:b','Value2'
0 row(s) in 0.0180 seconds
hbase(main):012:0> put 'test','row3','cf:c','Value3'
0 row(s) in 0.0090 seconds
hbase(main):011:0>

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value
of Value1. Columns in HBase are composed of a column family prefix, cf in this example,
followed by a colon and then a column qualifier suffix, a in this case.

6) Scan the table for all the data at once
 One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all the data is fetched.

hbase(main):018:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1508341973711, value=Value1
 row2                 column=cf:b, timestamp=1508342070897, value=Value2
 row3                 column=cf:c, timestamp=1508342080128, value=Value3
3 row(s) in 0.0240 seconds

hbase(main):019:0>

7) Get a single row of data
To get a single row of data at a time, use the get command
hbase(main):020:0> get 'test','row1'
COLUMN                CELL
 cf:a                 timestamp=1508341973711, value=Value1
1 row(s) in 0.0150 seconds

hbase(main):021:0>

8) Disable a table
If you want to delete a table or change its settings, as well as in some other situations, you need
to disable the table first, using the disable command. You can re-enable it using the enable
command.

hbase(main):021:0> disable 'test'
0 row(s) in 2.3110 seconds
hbase(main):022:0>

hbase(main):023:0> enable 'test'
0 row(s) in 2.2830 seconds

9) Drop a table
To delete a table, use the drop command. Note that a table must be disabled before it can be dropped.
hbase(main):028:0> drop 'test'
0 row(s) in 1.2720 seconds

hbase(main):029:0> list
TABLE
0 row(s) in 0.0080 seconds
=> []
hbase(main):030:0>
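The same walkthrough can also be run non-interactively: hbase shell accepts a file of commands as an argument, which is handy for scripting. A minimal sketch, assuming hbase is on the PATH of a cluster node (the file path /tmp/hbase_walkthrough.txt is just an example):

```shell
# Write the walkthrough commands to a file for non-interactive use
cat > /tmp/hbase_walkthrough.txt <<'EOF'
create 'test','cf'
put 'test','row1','cf:a','Value1'
scan 'test'
disable 'test'
drop 'test'
exit
EOF
# On a cluster node you would then run:
#   hbase shell /tmp/hbase_walkthrough.txt
head -n 1 /tmp/hbase_walkthrough.txt
```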

Tuesday, February 2, 2016

HDFS Issues

HDFS client failed to install due to bad symlink:
----------------------------------------------------------------
Error Message:
 File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 87, in action_create
   raise Fail("Applying %s failed, parent directory %s doesn't exist" % (self.resource, dirname))
resource_management.core.exceptions.Fail: Applying File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] failed, parent directory /usr/hdp/current/hadoop-client/conf doesn't exist
Solution: You might have multiple versions of the RPMs in your local yum repo, so Ambari got confused and pulled older RPMs, causing this error. Also check:
1) Your previous Ambari cleanup may not have completed properly.
2) Is the hdfs-client installed on that host?
3) As always, verify that correct permissions exist on the directories.
    EX:
    lrwxrwxrwx 1 root root 30 Oct 13 00:24 hadoop-client -> /usr/hdp/2.4.0.0-2042/hadoop
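If the symlink itself is stale or dangling, ln -sfn replaces it atomically instead of failing. A sandboxed sketch of the repair (paths here live under a temp dir; on a real host they would be under /usr/hdp, with the version directory matching your installed HDP release):

```shell
# Sandboxed demo of repairing the hadoop-client symlink with ln -sfn.
# On a real host: ln -sfn /usr/hdp/<version>/hadoop /usr/hdp/current/hadoop-client
root=$(mktemp -d)
mkdir -p "$root/2.4.0.0-2042/hadoop" "$root/current"
# -s symbolic, -f force-replace an existing link, -n don't follow a link target dir
ln -sfn "$root/2.4.0.0-2042/hadoop" "$root/current/hadoop-client"
readlink "$root/current/hadoop-client"
```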

Saturday, January 16, 2016

YARN TimelineServer


Overview: Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server.

This server has two responsibilities:
1) Generic information about completed applications.
     Ex: Application-level data like queue name, user information, etc.
2) Per-framework information about completed applications.
     Ex: The Hadoop MapReduce framework can include pieces of information like the number of map tasks and reduce tasks, counters, etc.

Configuration:
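As a starting point, the Timeline Server is enabled through properties in yarn-site.xml. A minimal sketch (the hostname value is a placeholder for your Timeline Server host):

```xml
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>timelineserver.example.com</value>
</property>
```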