Tuesday, June 14, 2016

HBase Dev Part 1: HBase Development


1) Connect to the HBase Shell:
[danna@cloudglee01 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.1.0-129, r718c773662346de98a8ce6fd3b5f64e279cb87d4, Wed May 31 03:27:31 UTC 2017
hbase(main):001:0>

2) Display HBase Shell Help Text:
Type help and press Enter to display basic usage information for HBase Shell, as well as
several example commands. Notice that table names, rows, and columns must all be enclosed
in quote characters.
hbase (main):001:0> help

3) Create a table
Use the create command to create a new table. You must specify the table name and the column family name

hbase(main):008:0> create 'test','cf'
0 row(s) in 2.2870 seconds
=> Hbase::Table - test
hbase(main):009:0>

4) List information about your table
    Use the list command

hbase(main):009:0> list
TABLE
test
1 row(s) in 0.0130 seconds
=> ["test"]
hbase(main):010:0>

5) Put data into your table
    To put data into your table, use the put command

hbase(main):010:0> put 'test','row1','cf:a','Value1'
0 row(s) in 0.0920 seconds
hbase(main):011:0> put 'test','row2','cf:b','Value2'
0 row(s) in 0.0180 seconds
hbase(main):012:0> put 'test','row3','cf:c','Value3'
0 row(s) in 0.0090 seconds
hbase(main):011:0>

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value
of Value1. Columns in HBase are composed of a column family prefix, cf in this example,
followed by a colon and then a column qualifier suffix, a in this case.

6) Scan the table for all the data at once
 One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all the data is fetched.

hbase(main):018:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1508341973711, value=Value1
 row2                 column=cf:b, timestamp=1508342070897, value=Value2
 row3                 column=cf:c, timestamp=1508342080128, value=Value3
3 row(s) in 0.0240 seconds

hbase(main):019:0>

7) Get a single row of data
To get a single row of data at a time, use the get command
hbase(main):020:0> get 'test','row1'
COLUMN                CELL
 cf:a                 timestamp=1508341973711, value=Value1
1 row(s) in 0.0150 seconds

hbase(main):021:0>

8) Disable a table
If you want to delete a table or change its settings, as well as in some other situations, you need
to disable the table first, using the disable command. You can re-enable it using the enable
command.

hbase(main):021:0> disable 'test'
0 row(s) in 2.3110 seconds
hbase(main):022:0>

hbase(main):023:0> enable 'test'
0 row(s) in 2.2830 seconds

9) Drop a table
To delete a table, use the drop command. Note that a table must be disabled before it can be dropped.
hbase(main):028:0> drop 'test'
0 row(s) in 1.2720 seconds

hbase(main):029:0> list
TABLE
0 row(s) in 0.0080 seconds
=> []
hbase(main):030:0>
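The same walkthrough can also be run non-interactively: hbase shell accepts a file of commands as an argument, which is handy for scripting. A minimal sketch, assuming hbase is on the PATH of a cluster node (the file path /tmp/hbase_walkthrough.txt is just an example):

```shell
# Write the walkthrough commands to a file for non-interactive use
cat > /tmp/hbase_walkthrough.txt <<'EOF'
create 'test','cf'
put 'test','row1','cf:a','Value1'
scan 'test'
disable 'test'
drop 'test'
exit
EOF
# On a cluster node you would then run:
#   hbase shell /tmp/hbase_walkthrough.txt
head -n 1 /tmp/hbase_walkthrough.txt
```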

Tuesday, February 2, 2016

HDFS Issues

HDFS client failed to install due to bad symlink:
----------------------------------------------------------------
Error Message:
 File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 87, in action_create
   raise Fail("Applying %s failed, parent directory %s doesn't exist" % (self.resource, dirname))
resource_management.core.exceptions.Fail: Applying File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] failed, parent directory /usr/hdp/current/hadoop-client/conf doesn't exist
Solution: You might have multiple versions of the RPMs in your local yum repo, so Ambari got confused and pulled older RPMs, causing this error. Also check:
1) Your previous Ambari cleanup may not have completed properly.
2) Is the hdfs-client installed on that host?
3) As always, verify that correct permissions exist on the directories.
    EX:
    lrwxrwxrwx 1 root root 30 Oct 13 00:24 hadoop-client -> /usr/hdp/2.4.0.0-2042/hadoop
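If the symlink itself is stale or dangling, ln -sfn replaces it atomically instead of failing. A sandboxed sketch of the repair (paths here live under a temp dir; on a real host they would be under /usr/hdp, with the version directory matching your installed HDP release):

```shell
# Sandboxed demo of repairing the hadoop-client symlink with ln -sfn.
# On a real host: ln -sfn /usr/hdp/<version>/hadoop /usr/hdp/current/hadoop-client
root=$(mktemp -d)
mkdir -p "$root/2.4.0.0-2042/hadoop" "$root/current"
# -s symbolic, -f force-replace an existing link, -n don't follow a link target dir
ln -sfn "$root/2.4.0.0-2042/hadoop" "$root/current/hadoop-client"
readlink "$root/current/hadoop-client"
```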

Saturday, January 16, 2016

YARN TimelineServer


Overview: Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server.

This server has two responsibilities:
1) Generic information about completed applications.
     Ex: Application-level data like queue name, user information, etc.
2) Per-framework information about completed applications.
     Ex: The Hadoop MapReduce framework can include pieces of information like the number of map tasks and reduce tasks, counters, etc.

Configuration:
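As a starting point, the Timeline Server is enabled through properties in yarn-site.xml. A minimal sketch (the hostname value is a placeholder for your Timeline Server host):

```xml
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>timelineserver.example.com</value>
</property>
```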