Wednesday, February 26, 2014

Cluster Pre and Post Validation Scripts

Cluster Validation
Before installing MapR Hadoop, it is invaluable to validate the hardware and software that MapR will depend on. Doing so verifies that items like disks and DIMMs (Dual In-line Memory Modules) are performing as expected, against a known benchmark metric, and that many of the basic OS configurations and packages are in the required state, with that state recorded in the output log.
Please use the steps below to test CPU/RAM, disk, and networking performance as well as to verify that your cluster meets MapR installation requirements. Pre-install tests should be run before installing MapR. Post-install tests should be run after installing the MapR software and configuring a cluster. Post-install tests help assure that the cluster is in good working order and ready to hand over to your production team.
Install clush (rpm provided, also available via EPEL) on a machine with passwordless ssh to all other cluster nodes. If using a non-root account, that account should have passwordless sudo rights configured in /etc/sudoers. Update the file /etc/clustershell/groups to include an entry for "all" matching a pattern or patterns of host names, for example: "all: node[0-10]". Verify that clush works correctly by running "clush -a date" and compare the results with "clush -ab date".
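As a minimal sketch of that setup (the node names and group pattern are placeholders, not taken from the package):

yum -y install clustershell                # or: rpm -ivh clustershell-1.6-1.el6.noarch.rpm
echo 'all: node[0-10]' >> /etc/clustershell/groups
clush -a date     # one line of output per node
clush -ab date    # identical output is gathered and printed once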
Download and extract the cluster-validation package with a command like this: curl -LO 'https://github.com/jbenninghoff/cluster-validation/archive/master.zip' Extract with unzip master.zip and move the pre-install and post-install folders directly under /root for simplicity.
Copy the /root/pre-install folder to all nodes in the cluster. The clush command, properly configured, simplifies this:
clush -a --copy /root/pre-install
clush -Ba ls /root/pre-install   # confirm that all nodes have the utilities
Gather Base Audit Information
Use cluster-audit.sh to verify that you have met the MapR installation requirements. Run: /root/pre-install/cluster-audit.sh | tee cluster-audit.log on the node where clush has been installed and configured to access all cluster nodes. Examine the log for inconsistencies among nodes.
Do not proceed until all inconsistencies have been resolved and all requirements, such as missing rpms, the Java version, etc., have been met. Please send the output of the cluster-audit.log back to us.

NOTE: cluster-audit.sh is designed for physical servers. Virtual instances in cloud environments (e.g. Amazon, Google, or OpenStack) may generate confusing responses to some specific commands (e.g. dmidecode). In most cases, these anomalies are irrelevant.

Evaluate Network Interconnect Bandwidth
Use the RPC test to validate network bandwidth. This will take about two minutes or so to run and produce output, so please be patient. Update the half1 and half2 arrays in the network-test.sh script to include the first and second halves of the IP addresses of your cluster nodes, and delete the exit command as well. Run: /root/pre-install/network-test.sh | tee network-test.log on the node where clush has been installed and configured. Expect about 90% of peak bandwidth for either 1GbE or 10GbE networks:
1 GbE  ==> ~115 MB/sec
10 GbE ==> ~1100 MB/sec
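For example, the edited arrays in network-test.sh might look like this (the IP addresses are placeholders for your own node addresses):

half1=(10.10.100.1 10.10.100.2 10.10.100.3)
half2=(10.10.100.4 10.10.100.5 10.10.100.6)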
Evaluate Raw Memory Performance
Use the stream59 utility to test memory performance. This test will take about a minute or so to run. It can be executed in parallel on all the cluster nodes with the command: clush -Ba '/root/pre-install/memory-test.sh | grep ^Triad' | tee memory-test.log Memory bandwidth is determined by the speed of the DIMMs, the number of memory channels and, to a lesser degree, by CPU frequency. Current generation Xeon-based servers with eight or more 1600MHz DIMMs can deliver 70-80GB/sec Triad results. Previous generation Xeon CPUs (Westmere) can deliver ~40GB/sec Triad results.
Evaluate Raw Disk Performance
Use the iozone utility to test disk performance. This process is destructive to the disks that are tested, so make sure that you have not installed MapR and that there is no needed data on those spindles. The script as shipped will ONLY list out the disks to be tested. You MUST edit the script once you have verified that the list of spindles to test is correct.
The test can be run in parallel on all nodes with: clush -ab /root/pre-install/disk-test.sh
Current generation (2012+) 7200 rpm SATA drives can produce 100-145 MB/sec sequential read and write performance. For larger numbers of disks there is a summIOzone.sh script that can help provide a summary of disk-test.sh output.
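To keep a log for later review (mirroring the tee pattern used by the other tests), the run can be captured like this:

clush -ab /root/pre-install/disk-test.sh | tee disk-test.log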
Complete Pre-Installation Checks
When all subsystem tests have passed and met expectations, there is an example install script in the pre-install folder that can be modified and used for a scripted install. Otherwise, follow the instructions from the doc.mapr.com web site for cluster installation.
Post install tests are in the post-install folder. The primary tests are RWSpeedTest and TeraSort. Scripts to run each are provided in the folder. Read the scripts for additional info.
A script to create a benchmarks volume (mkBMvol.sh) is provided. Additionally, runTeraGen.sh is provided to generate the terabyte of data necessary for the TeraSort benchmark. Be sure to create the benchmarks volume before running any of the post-install benchmarks. NOTE: The TeraSort benchmark (executed by runTeraSort.sh) will likely require tuning for each specific cluster. Experiment with the -D options as needed.
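As an illustration only — the jar path, parameter names, values, and directories below are assumptions for an MRv1-era cluster, not the contents of runTeraSort.sh — the -D options are passed ahead of the input and output directories and tuned to the node count and RAM of the cluster:

hadoop jar <hadoop-examples.jar> terasort \
  -Dmapred.reduce.tasks=56 \
  -Dio.sort.mb=480 \
  /benchmarks/tera/in /benchmarks/tera/out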
There is also a mapr-audit.sh script which can be run to provide an audit snapshot of the MapR configuration. The script is a useful set of example maprcli commands. There are also example install, upgrade and uninstall scripts. None of those will run without editing, so read the scripts carefully to understand how to edit them with site specific info.
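Assuming the post-install folder was placed under /root as described above, the audit can be captured the same way as the pre-install logs:

/root/post-install/mapr-audit.sh | tee mapr-audit.log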
Pre-Installation Scripts:

-rw-rw-r-- 1 6631 Jul  9 16:15 cluster-audit.sh
-rw-rw-r-- 1 1934 Jul  9 16:15 disk-test.sh
-rw-rw-r-- 1 4504 Jul  9 16:15 example-mapr-install.sh
-rw-rw-r-- 1250400 Jul  9 16:15 clustershell-1.6-1.el6.noarch.rpm
-rw-rw-r-- 1 365509 Jul  9 16:15 iozone
-rw-rw-r-- 1 83058 Jul  9 16:15 iperf
-rw-rw-r-- 1 647 Jul  9 16:15 java-post-install.sh
-rw-rw-r-- 1 2210 Jul  9 16:15 lsi-config.sh
-rw-rw-r-- 1 1115 Jul  9 16:15 memory-test.sh
-rw-rw-r-- 1 216 Jul  9 16:15 network-test.sh
-rw-rw-r-- 1 4742 Jul  9 16:15 README.txt
-rw-rw-r-- 1 30808 Jul  9 16:15 rpctest
-rw-rw-r-- 1 2566 Jul  9 16:15 summIOzone.sh
-rw-rw-r-- 1 853593 Jul  9 16:15 stream59


Clush Installation/Importance

Clush is a program for executing commands in parallel on a cluster and for gathering their results. Clush executes commands interactively or can be used within shell scripts and other applications.

Links to download

https://github.com/cea-hpc/clustershell/downloads
or
https://github.com/jbenninghoff/cluster-validation

Download master.zip from the link below, unzip it, locate the clustershell rpm inside, and install it.

https://github.com/jbenninghoff/cluster-validation/archive/master.zip

Edit /etc/clustershell/groups and update the groups accordingly

Ex:
[root@ebdp-wc-d01d mapr_dump]# cat /etc/clustershell/groups
all: ebdp-wc-d0[1-5]d
zk: ebdp-wc-d01d ebdp-wc-d02d ebdp-wc-d03d
cldb: ebdp-wc-d01d
jobt: ebdp-wc-d02d
ws: ebdp-wc-d03d
mysql: ebdp-wc-d04d
hive: ebdp-wc-d05d
[root@ebdp-wc-d01d mapr_dump]# 
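With groups defined, clush can target a subset of nodes with -g; for example, using the group names from the file above:

clush -g zk date          # run only on the ZooKeeper nodes
clush -g all -b uptime    # run on every node and gather identical output together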

Important Commands:
     To copy to all the nodes
clush -a -c /home/clduser/mprrepo_bkp/jdk-7u51-linux-x64.rpm

To remove the files from the nodes
clush -g all rm -rf /home/clduser/mprrepo_bkp/jdk-7u51-linux-x64.rpm

To install CLDB on the CLDB nodes
clush -g cldb yum -y install <mapr cldb rpm>
Use the -y option as below to avoid manual intervention when yum prompts for YES
clush -g all yum -y install ganglia-gmond

To copy a local file to a specific destination directory on all nodes
clush -g all -c /etc/ganglia/gmond.conf --dest /etc/ganglia/

To uninstall an rpm from all the nodes
clush -g all yum -y remove mapr-core-3.1.0.23703.GA-1.x86_64

To erase Java
clush -g all yum -y erase java

To delete a file on all nodes
clush -g all rm /test/test.txt

To remove the JDK
clush -g all yum -y erase jdk

To install the JDK rpm on all nodes
clush -g all yum -y install /home/clduser/jdk-7u51-linux-x64.rpm

To check clush

clush -a date or clush -ab date

To append the teradata host names to our /etc/hosts file on every node
clush -g all 'cat /etc/terahosts >> /etc/hosts'   (the single quotes are important here: they keep the >> redirection from being interpreted by the local shell)

To extract an archive into a specific directory on all nodes (-C sets the destination)
clush -g all tar -xf /usr/lib64/python2.6/numpy-1.9.1.tar.gz -C /usr/lib64/python2.6/






Monday, February 24, 2014

Apache Sqoop -Part 1: Basic Concepts


     Apache Sqoop is a tool designed for efficiently transferring bulk data, in a distributed manner, between relational databases (RDBMS) such as MySQL and Oracle and HDFS, and vice versa.

    Sqoop imports data from an RDBMS into the Hadoop Distributed File System (HDFS), lets you transform the data in Hadoop MapReduce, and then exports the data back into an RDBMS.

    Sqoop automates most of this process. It uses MapReduce to import and export the data, which provides parallel operation and fault tolerance.

    Sqoop successfully graduated from the Apache Incubator in March 2012 and is now a Top Level Apache project.
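As a minimal sketch (the host, database, table, and directory names are placeholders), a basic import from MySQL into HDFS looks like this; the corresponding sqoop export command writes an HDFS directory back into an RDBMS table:

sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  -m 4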

The following diagrams are from Apache documentation:

Import/Export Process:





Sqoop 2

Sqoop2 transfers bulk data between Hadoop and various types of structured datastores, 
such as relational databases, enterprise data warehouses, and NoSQL systems. Sqoop2 can be used 
in conjunction with Hue, which is a web-based GUI that facilitates the importing and exporting 
of data from these structured datastores.

Currently, Sqoop2 does not support all the features of Sqoop1. Refer to the following 
table to understand the differences and determine which version is best for your purposes.

Sqoop2 Architecture:



Differences between Sqoop1 and Sqoop2
Feature

1) Specialized connectors for all major RDBMS
    Sqoop1: Available.
    Sqoop2: Not available.
                  
    However, you can use the generic JDBC connector, which has been tested on these databases:
    MySQL
    Microsoft SQL Server
    Oracle
    PostgreSQL

The generic JDBC connector should also work with any other JDBC-compliant database, 
although specialized connectors probably give better performance.

2) Data transfer from RDBMS to Hive or HBase
     Sqoop1: Done automatically.
     Sqoop2: Must be done manually in two stages:
                   Import data from the RDBMS into MapR-FS.
                   Load the data into Hive (using the LOAD DATA command) or HBase.

3) Data transfer from Hive or HBase to RDBMS
    Sqoop1: Must be done manually in two stages:
                   Extract data from Hive or HBase into MapR-FS, as a text file or as an Avro file.
                   Export the output of step 1 to an RDBMS using Sqoop.
     Sqoop2: Must be done manually in two stages:
                   Extract data from Hive or HBase into MapR-FS, as a text file or as an Avro file.
                   Export the output of step 1 to an RDBMS using Sqoop.

4) Integrated Kerberos security
    Sqoop1: Supported.
    Sqoop2: Not supported.

5) Password encryption
    Sqoop1: Not supported.
    Sqoop2: Supported using Derby's data encryption feature
                   (although the configuration has not been verified).

Sqoop1 & Sqoop2 Supported databases:  

Database              Connection String
MySQL                 jdbc:mysql://
Oracle                jdbc:oracle://
Teradata              jdbc:teradata://

Topics covered in this article (this blog is mostly notes for my self-study, drawing 
information from many links and big data enthusiasts' blogs):

a) Versions covered (MySQL, Oracle & Teradata)
b) Sqoop installation
c) Download and save JDBC drivers for MySQL, Oracle & Teradata.
d) Sqoop list commands (see the example below).
e) Import/Export data from RDBMS to HDFS and vice versa.
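For item (d), the list commands look like this (connection details are placeholders):

sqoop list-databases --connect jdbc:mysql://dbhost --username dbuser -P
sqoop list-tables --connect jdbc:mysql://dbhost/sales --username dbuser -P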


     Versions Covered:
      Sqoop      1.4.2
      MySQL   5.0+
      Oracle     10.2.0+
      Teradata 

Sunday, February 23, 2014

Important Hadoop Articles

Apache Spark, an in-memory data-processing framework, is now a top-level Apache project. That’s an important step for Spark’s stability as it increasingly replaces MapReduce in next-generation big data applications.

MapReduce was fun and pretty useful while it lasted, but it looks like Spark is set to take the reins as the primary processing framework for new Hadoop workloads. The technology took a meaningful, if not huge, step toward that end on Thursday when the Apache Software Foundation announced that Spark is now a top-level project.
Spark has already garnered a large and vocal community of users and contributors because it's faster than MapReduce (in memory and on disk) and easier to program. This means it's well suited for next-generation big data applications that might require lower-latency queries, real-time processing or iterative computations on the same data (i.e., machine learning). Spark's creators from the University of California, Berkeley, have created a company called Databricks to commercialize the technology.
Spark is technically a standalone project, but it was always designed to work with the Hadoop Distributed File System. It can run directly on HDFS, inside MapReduce and, thanks to YARN, it can now run alongside MapReduce jobs on the same cluster. In fact, Hadoop pioneer Cloudera is now providing enterprise support for customers that want to use Spark.




However, MapReduce isn’t yesterday’s news quite yet. Although many new workloads and projects (such as Hortonworks' Stinger) use alternative processing frameworks, there’s still a lot of tooling for MapReduce that Spark doesn’t have yet (e.g., Pig and Cascading), and MapReduce is still quite good for certain batch jobs. Plus, as Cloudera co-founder and Chief Strategy Officer Mike Olson explained in a recent Structure Show podcast (embedded below), there are a lot of legacy MapReduce workloads that aren’t going anywhere anytime soon even as Spark takes off.
If you want to hear more about Spark and its role in the future of Hadoop, come to our Structure Data conference March 19-20 in New York. Databricks co-founder and CEO Ion Stoica will be speaking as part of our Structure Data Awards presentation, and we’ll have the CEOs of Cloudera, Hortonworks, and Pivotal talking about the future of big data platforms and how they plan to capitalize on them.

Cloudera launches in-memory analyzer for Hadoop


Hadoop distributor Cloudera has released a commercial edition of the Apache Spark program, which analyzes data in real time from within Cloudera’s Hadoop environments.

The release has the potential to expand Hadoop’s use for stream processing and faster machine learning.

“Data scientists love Spark,” said Matt Brandwein, Cloudera director of product marketing.

Spark does a good job at machine learning, which requires multiple iterations over the same data set, Brandwein said.

“Historically, you’d do that stuff with MapReduce, if you’re using Hadoop. But MapReduce is really slow,” Brandwein said, referring to how the MapReduce framework requires many multiple reads and writes to disk to carry out machine learning duties. Spark can do this task while the data is still in working memory. Maintainers of the software claim that Spark can run programs up to 100 times faster than Hadoop itself, thanks to its in-memory design model.

Spark is also good at stream processing, in which it can monitor a constant flow of data and carry out certain functions if certain conditions are met.

Stream processing, for instance, could be applied to fraud management and security event management. “In those cases, you’re analyzing real-time data off the wire to detect any anomalies and take action,” Brandwein said. The data can also be off-loaded to the Hadoop file system for further interactive and deeper batch-processing analysis.

First developed at University of California at Berkeley, Apache Spark provides a way to load streaming data into the working memory of a cluster of servers, where it can be queried in real-time. It has no upper limit of how many servers, or how much memory, it can use.

It relies on the latest version of the Hadoop data-processing platform, which uses YARN (Yet Another Resource Negotiator). Spark does not require the MapReduce framework, though, which operates in batch mode. It has APIs (Application Programming Interfaces) for Java, Scala and Python. It can natively read data from HDFS (the Hadoop Distributed File System), the HBase Hadoop database and the Cassandra data store.

The Apache Spark project has over 120 contributing developers, and the technology has been used by Yahoo and Intel, as well as a number of other, smaller companies. Databricks, which offers its own commercial version of Spark, provides support for Spark on behalf of Cloudera users.

The idea of applying Hadoop-style analysis to streaming data is not a new one. Twitter maintains Storm, a set of open source software it uses for analyzing messages.

In addition to Spark, Cloudera also announced that it has repackaged its commercial Hadoop offering into three separate packages: the Basic edition, the Flex edition and the Enterprise Hub edition. The Enterprise Hub edition bundles all of the additional tools that Cloudera has integrated with Hadoop, including HBase, Spark, backup capabilities, and the Impala SQL analytic engine. The Flex edition allows the user to pick one additional tool in addition to core Hadoop.

Cloudera has also renamed its Cloudera Standard edition to Cloudera Express.

-----------------------------------------------------------------------------------------------------------


Sunday, February 16, 2014

Hive ,HiveServer2 and thrift service Issues

Hive General Issues:
------------------------
1) How to disable hive shell for all users (Hive CLI)?
We can remove the execute permission from /usr/hdp/current/hive-client/bin/hive, but 
we may run into issues with daily production jobs from Oozie or other 
third-party tools. Instead, try adding the following to the hive-env template via Ambari to disable the hive shell:

if [ "$SERVICE" = "cli" ]; then
echo "Sorry! I have disabled hive-shell"
exit 1 
fi

then restart the Hive service
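A quick way to verify the change (the host name is a placeholder): the CLI should refuse to start, while JDBC clients that go through HiveServer2 keep working.

hive                                               # prints "Sorry! I have disabled hive-shell" and exits
beeline -u jdbc:hive2://<hiveserver2-host>:10000   # still connects through HiveServer2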
---------------------------------------------







Saturday, February 15, 2014

Sample Resume for Hadoop/BigData

Name : Hadoop/BigData
Phone: xxx-xxx-xxxx
                                                                                             


                                                                                                                                                              
Professional Summary :                                                             

·         Over 3+ years of experience in IT and expertise in Linux, SQL & PL/SQL, Hadoop, HDFS, HBase, Hive, Pig and OBIEE, with hands-on project experience in various vertical applications including Telecom, Financial Services, and eSales.
·         Extensive experience with Linux, shell scripting and SQL.
·         Good experience in SQL performance tuning.
·         Expertise in HDFS architecture and cluster concepts.
·         Extremely motivated with good inter-personal skills; able to work to strict deadlines.
·         Expertise in Hadoop security and Hive security.
·         Expertise in Hive Query Language and debugging Hive issues.
·         Expertise in Sqoop and Flume.
·         Extended table columns for custom fields as per business requirements.
·         Expertise in OBIEE and DWH concepts.
·         Experience with Oracle & MS SQL Server RDBMS.
·         Expertise in UNIX and PL/SQL.

Education Qualifications:

·         XXX from XYZ University.US

Technical skills:-

Operating Systems

Windows NT/XP/2003 & UNIX,LINUX

Languages

 C,C++, JavaScript, VBScript, HTML,XMLs

Databases

Oracle , SQL Server 7/2000

GUI

Visual Basic 6.0

Tools & Utilities

QC

Defect Tracking Tool

HP Quality Center 9.2,Lotus Notes, AR Remedy
Microsoft Office Tools:
Power Point, Word, Excel, Visio


Project Experience:
Client: XYZ ,NY                                                                                            Jan’09  to Aug’11
Role: Software Developer
Environments: Linux, SQL, Oracle & MY SQL
Responsibilities:

·         Involved in Development and monitoring Application.
·         Good experience in estimating work effort.
·         Expertise in developing SQL scripts and performance tuning.
·         Expertise in analyzing data quality checks using shell scripts.
·         Expertise in loading data into databases on Linux.
·         Developed MapReduce programs to format data.
·         Expertise in handling large data warehouses for pulling reports.
·         Expertise in preparing HLDs and LLDs and preparing unit test cases based on functionality.
·         Expertise in Linux OS health checks and debugging issues.
·         Expertise in installing rpm packages on Linux.
·         Expertise in the security layer at OS permission level and DB table level.
·         Expertise in the Nagios alert system.

Client: XYZ,CA                                                                                       Feb’08 to Dec’08
Role: Software Developer
Environment: Linux,SQL.
Responsibilities
·         Involved in supporting and monitoring production Linux systems.
·         Installation of SQL and DB backups.
·         Expertise in archive logs and monitoring jobs.
·         Monitoring Linux daily jobs and the log management system.
·         Expertise in troubleshooting and able to work with a team to fix large production issues.
·         Expertise in creating and managing DB tables, indexes and views.
·         User creation and managing user accounts and permissions at Linux level and DB level.

·         Expertise in security at OS level and DB level.

Linux Commands for Hadoop/Big Data Admin


ls Command:
The ls command is one of the most frequently used commands in Linux. I believe ls is the first command you use when you get to the command prompt of a Linux box. We use ls on a daily basis, and frequently, even though we may not be aware of and may never use all the options available. In this article we'll discuss the basic ls command, where we have tried to cover as many parameters as possible.

a) List Files using ls with no option
     [divakar@divakar BD]# ls
     a.txt  b.txt
     [divakar@divakar BD]#

b) List Files With option –l
    [divakar@localhost ~]$ ls -l
    -rw-rw-r--. 1 divakar divakar    0 Feb 11 18:51 divakar.txt
    -rwxr-xr-x. 1 divakar divakar    0 Feb 12 18:22 diva.txt
    [divakar@localhost ~]$ 

c) View Hidden Files
    List all files, including hidden files starting with '.'.
    [divakar@divakar ~]$ ls -al
    drwx------. 27 divakar divakar 4096 Feb 21 14:22 .
    drwxr-xr-x.  3 root    root    4096 Feb 10 13:52 ..
    -rw-rw-r--.  1 divakar divakar    0 Feb 11 18:51 divakar.txt
    -rwxr-xr-x.  1 divakar divakar    0 Feb 12 18:22 diva.txt
    [divakar@localhost ~]$

d) List Files with Human Readable Format with option -lh
    The -lh combination shows sizes in human-readable format.
    [divakar@localhost ~]$ ls -lh
    -rw-rw-r--. 1 divakar divakar    0 Feb 12 18:25 d1.txt
    -rw-rw-r--. 1 divakar divakar    0 Feb 12 18:25 d2.txt
    [divakar@localhost ~]$ 

e) List Files and Directories with ‘/’ Character at the end
    Using the -F option with the ls command will add a '/' character at the end of each directory.
    [divakar@localhost ~]$ ls -F
    d1.txt  Desktop/     diva.txt*   Downloads/  Pictures/  Templates/  xyz/
    d2.txt  divakar.txt  Documents/  Music/      Public/    Videos/
    [divakar@localhost ~]$ 

f) List Files in Reverse Order
   The following command with the ls -r option displays files and directories in reverse order.
   [divakar@localhost ~]$ ls -r
   xyz Templates  Pictures  Downloads  diva.txt Desktop  d1.txt Videos  Public divakar.txt  d2.txt
   [divakar@localhost ~]$ 

g) Recursively list Sub-Directories

    The ls -R option recursively lists directory trees; the listing can be very long. See an example of the output of the command.

    [divakar@localhost ~]$ ls -R
    d1.txt  Desktop  diva.txt   Downloads  Pictures  Templates  xyz
    
h) Reverse Output Order

    The -ltr combination shows files and directories sorted by modification time, with the most recently modified last.

    [divakar@localhost ~]$ ls -ltr
    drwxr-xr-x. 2 divakar divakar 4096 Feb 10 18:55 Templates
    drwxr-xr-x. 2 divakar divakar 4096 Feb 10 18:55 Downloads
    drwxr-xr-x. 2 divakar divakar 4096 Feb 10 18:55 Videos
    drwxr-xr-x. 2 divakar divakar 4096 Feb 10 18:55 Public

i) Sort Files by File Size

   The -lS combination displays files ordered by size, biggest first (the example below uses ls -ls, which instead prefixes each entry with its allocated size in blocks).

   [divakar@localhost ~]$ ls -ls
   0 -rw-rw-r--. 1 divakar divakar    0 Feb 12 18:25 d1.txt
   0 -rw-rw-r--. 1 divakar divakar    0 Feb 12 18:25 d2.txt
   4 drwxr-xr-x. 4 divakar divakar 4096 Feb 11 18:03 Desktop
   0 -rw-rw-r--. 1 divakar divakar    0 Feb 11 18:51 divakar.

j) Display Inode number of File or Directory
   With the -i option, ls lists each file/directory with its inode number printed before the name.
   [divakar@localhost ~]$ ls -i
  272870 d1.txt       272869 diva.txt   397175 Pictures   397282 xyz
  272871 d2.txt       397173 Documents  397172 Public

k) Shows version of ls command
  Check version of ls command.
  [divakar@localhost ~]$ ls --version

  ls (GNU coreutils) 8.4

  Copyright (C) 2010 Free Software Foundation, Inc

l) Show Help Page
  List help page of ls command with their option.
  [divakar@localhost ~]$ ls --help
  Usage: ls [OPTION]... [FILE]...
  List information about the FILEs (the current directory by default).
  Sort entries alphabetically if none of -cftuvSUX nor --sort.

m) List Directory Information
  ls -l <directory> lists the files under that directory; with the -ld parameters, ls displays information about the directory itself rather than its contents.
  [divakar@localhost ~]$ ls -l /
  dr-xr-xr-x.   2 root root  4096 Feb 10 19:34 bin
  dr-xr-xr-x.   5 root root  1024 Feb 10 13:52 boot

n) Display UID and GID of Files
  To display the UID and GID of files and directories, use the -n option with the ls command.
  [divakar@localhost ~]$ ls -n
  -rw-rw-r--. 1 500 500    0 Feb 12 18:25 d1.txt
  -rw-rw-r--. 1 500 500    0 Feb 12 18:25 d2.txt

Copying with the cp Command
a) How do I copy files?
     cp filename1 filename2 -> copies a file.
b) How do I copy recursively?
    cp -r dir1 dir2
c) To see copy progress, pass the -v option to cp.
    cp -v -r dir1 dir2
d) How do I confirm before overwriting a file?
    cp -i dir1 dir2
e) Preserve the file permissions and other attributes.
    cp -p file1 file2

Deleting Files with the 'rm' Command
The rm command deletes files. This command has several options, but should be used cautiously.
a) The rm command will delete one or several files from the command line.
     rm file1
     rm file1 file2 file3
b) One of the safer ways to use rm is through the -i or interactive option, where you'll be asked if you want to delete the file.
    rm -i file1
c) you can also force file deletion by using -f option 
    rm -f file1
d) When we combine -f with -r, the recursive option, you can delete a directory and all the files or directories found inside it.
   rm -rf <directoryname>

Creating Directories with the 'mkdir' Command
The mkdir command can create one or several directories with a single command line.
a) Creating directories
    mkdir <directoryName>
b) Creating multiple directories
   mkdir <directoryName1> <directoryname2>
c) Creating Child under directories
   mkdir temp/child
d) To build a hierarchy of directories with mkdir, you must use the -p, or parent, option, for example
    mkdir -p temp5/parent/child

Removing Directories with the 'rmdir' Command

a) The rmdir command is used to remove directories. To remove a directory, all you have to do is type
    rmdir <DirectoryName>
b) To remove a directory and its subdirectories as well
    rm -rf <directoryName>

Renaming Files with the 'mv' command.
The mv command is formally a rename command, but is known to many as a move command.
a) Move the data file1 to file2 
    mv <file1> <file2>
b) The mv command can work silently, or as with rm, you can use the -i (interactive) option 
    mv -i <file1> <file2>

Creating Hard and Symbolic Links with the 'ln' Command
The ln command creates both types of links. If you use the ln command to create a hard link, you specify a second file name on the command line that you can use to reference the original file, for example
# ln file1 file2
# ln -s file1 file2   (the -s option creates a symbolic link instead of a hard link)

ps command: The ps command shows information about the current system processes.
ps -> the user's currently running processes.
ps -f -> full listing of the user's currently running processes.
ps -ef -> full listing of all processes, except kernel processes.
ps -A -> all processes, including kernel processes.
ps auxw -> wide listing sorted by percentage of CPU usage.

last: The last command shows the history of who has logged in to the system since the wtmp file was created.
who: The who command gives this output for each logged-in user: username, tty, login time and where the user logged in from.
w: The w command is really an extended who.

Checking your installation Files
rpm -qa | grep ^x
You should receive output similar to the following:
xorg-x11-drv-apm-1.2.2-1.1.el6.x86_64
xorg-x11-drv-penmount-1.4.0-5.el6.x86_64
xorg-x11-drv-ast-0.89.9-1.1.el6.x86_64
xorg-x11-drv-aiptek-1.3.0-2.el6.x86_64
Installing the X Files
rpm -ivh <filename>

Moving to different directories with the cd command.
cd /usr/bin
cd ..
cd ../..
cd or cd -

Knowing where you are with the pwd command.
pwd
pwd --help
/bin/pwd   --help

Searching directories for matching files with the find command
Syntax: find where-to-look criteria what-to-do
find /usr -name spell -print

Listing and Combining Files with the cat Command
The cat (concatenate file) command is used to send the contents of files to your screen.

a) cat a.txt
[abc@master]$ cat a.txt
Hello Hello Hello
Hello hello hello
[abc@master]$

b) The cat command also has a number of options. If you'd like to see your file with line numbers, you can use the -n option
# cat -n a.txt
1  Hello Hello Hello
2  Hello hello hello

c) You can also use cat to look at several files at once.
# cat -n  test*
1  Hello Hello Hello
2  Hello hello hello
[root@localhost ~]# cat a.txt
aksjlkdj
[root@localhost ~]#
[root@localhost ~]# cat b.txt
alksjdjkjaskjkaj
;lajslkdjja
lkashdjfajs
[root@localhost ~]#

d) As you can see, cat has also included the second file in its output.
[root@localhost ~]# cat a.txt b.txt
aksjlkdj
alksjdjkjaskjkaj
;lajslkdjja
lkashdjfajs
[root@localhost ~]#
You can also create a new file by typing into cat and redirecting the output: cat > c.txt
e) To see line numbers using cat
   # cat -n divakar.txt
100     divakar 20000
200     diiia   10000
399     ksjjkj  30999[

Reading the files with the 'more' command
[root@localhost ~]# more c.txt
kajslkj
lkasjdkfjjs
klakjshkdjfhkahs
[root@localhost ~]#

Browsing Files with the 'less' command.
less c.txt
Reading the Beginning or End of Files with the head and tail Commands.
head -5 /usr/man/man.txt
head -5 -q /usr/man/man.txt
tail -12 /var/log/message/a.txt
The more command is one of a family of Linux commands called pagers.

Creating Files with the 'touch' Command
The touch command is easy to use, and generally, there are two reasons to use it. The first reason is to create a file, and the second is to update a file's modification date.

a) To create a file with touch, use
# touch newfile
# ls -l newfile
-rw-r--r-- 1 divakar divakar 0 Feb 21 14:12 newfile

b) To change the time stamp
# touch -t 1225110099 newfile2

Trap: when a program is terminated before its normal end, we can catch an exit signal.
0 - normal termination, end of script.
1 - SIGHUP -> hang up, line disconnected.
2 - SIGINT -> terminal interrupt, usually Ctrl+C.
3 - SIGQUIT -> quit key; terminates and dumps core.
9 - SIGKILL -> kill -9; commands can't trap this type of exit status.
15 - SIGTERM -> the kill command's default action.
19 - SIGSTOP -> stop, usually Ctrl+Z.
17 - SIGCHLD -> child process status change.
Ex: kill -9 <pid>

df -> reports how much free disk space is available for each mount you have.
df -a, --all -> include dummy file systems.
df -B SIZE, --block-size=SIZE -> use SIZE-byte blocks.
df -h -> human readable; print sizes in human-readable format.
df -i -> list inode information instead of block usage.
df -k -> like --block-size=1K.
df -T -> print the file system type.

du -> disk usage
du -> tells you how much disk space a file occupies.
du -a -> displays the space that each file is taking up.
du -h -> makes the output easier to read by displaying it in KB/MB/GB.
du -sh -> the -s (for suppress or summarize) option tells du to report only the total disk space occupied by a directory tree and to suppress individual reports for its subdirectories.

top -> displays the top CPU processes.
The top program provides a dynamic real-time view of a running system.
uname -> prints system information (kernel name, release, and so on).

free -> displays information about free and used memory on the system.
-b,-k,-m,-g show output in bytes, KB, MB, or GB
-l show detailed low and high memory statistics
-o use old format (no -/+buffers/cache line)
-t display total for RAM + swap
-s update every [delay] seconds
-c update [count] times
-V display version information and exit

awk -> The awk command is a powerful method for processing or analyzing text files, in particular data files that are organized by lines (rows) and columns.
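A small illustration (using /etc/passwd simply because every system has it):

awk -F: '{print $1, $7}' /etc/passwd    # print the user name (field 1) and login shell (field 7) of each account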

How to add a user:
useradd div    -- to add the user ID
passwd div     -- to set the password

Set Account disable date:
useradd -e {yyyy-mm-dd} {username}
useradd -e 2008-12-31 jerry


Set the number of days after password expiry until the account is disabled:
useradd -f {days} {username}
useradd -e 2009-12-31 -f 30 jerry


How can I see the entire list of users on a Linux server?
cat /etc/passwd
vim /etc/passwd
cat /etc/passwd | grep home | cut -d':' -f1

How to switch to root, create a new user and set its password.
[divakar@localhost Desktop]$ su root
Password: <give your user passwd>
[root@localhost Desktop]# useradd diva1
[root@localhost Desktop]# passwd diva1
Changing password for user diva1.
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@localhost Desktop]#

How to add a user to the sudo list.
Edit /etc/sudoers with vi (preferably via visudo)
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
divakar   ALL=(ALL)   ALL
man -> displays the manual page for a command.

Finding files with the whereis command.
whereis find
You can also use whereis to find only the binary version of the program with
whereis -b find
If whereis can't find your request, you'll get an empty return string, for example
whereis foo
find, by contrast, searches the entire system.
Limiting searches to known directories such as /usr/man, /usr/bin, or /usr/sbin speeds up the task of finding files,
which is why whereis is faster than using find to locate programs or manual pages.
locate is faster than whereis.

Locating files with the locate command.
Finding a file using locate is much faster than using the find command, because locate goes directly to its database file, finds any matching filenames, and prints its results.
locate *.ps
The locate database resides under /var/lib.

Moving to different directories with the cd command.
cd          (go to your home directory)
cd ../..    (go up two levels)
cd -        (return to the previous directory)

Knowing where you are with the pwd command.
Go to /usr/local and type pwd to see your current directory.
Finding and killing a process by name:
ps -ef | grep aneel
pkill -f <pattern>

Creating a link instead of copying the data?
cp -l test1 test2

Getting command Summaries with whatis and apropos
Specifying other directories with ls
# ls /usr/bin

Listing Directories with the dir and vdir commands
# dir -> this command works like the default ls command, listing the files in sorted columns
vdir -> the vdir command works like the ls -l option, and presents a long-format listing by default
Graphic Directory listings with the tree command
# tree /var/lib
# tree -d /usr/local/netscape/

Search Inside Files with the 'grep' Command
The grep command helps to search for any word in a file
Ex: cat hive.log | grep loaded   [here we are searching for the word "loaded" in the log]
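The same search can be done without cat by giving grep the file name directly; -i makes the match case-insensitive:

grep loaded hive.log
grep -i loaded hive.log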

Compressing files with the 'gzip' command
# gzip file.tar

Compressing Files with the 'compress' Command
#compress file

To uncompress a file,use
# uncompress file.Z

Running Programs in the Background
# nohup ./divakar.sh &

Checking the Connection
Using the 'ifconfig' command. This command will help to identify the IP address of your system.
# ifconfig
 [divakar@localhost ~]$ ifconfig
 eth0      Link encap:Ethernet  HWaddr 00:0C:29:7F:24:89
 inet addr:192.168.64.130  Bcast:192.168.64.255  Mask:255.255.255.0
 inet6 addr: fe80::20c:29ff:fe7f:2489/64 Scope:Link

Using the 'netstat' Command
The netstat command is the definitive command for checking your network activity, connections, routing tables, and other network messages and statistics.
# netstat

Using the ping Command
# ping <hostname>.com

Find Hostname of the System
#hostname

How to replace One word with another word in vi:
:%s/old-text/new-text/g

Hadoop File system Commands
The FileSystem (FS) shell is invoked by bin/hadoop fs <args>. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.

Ex: hadoop fs -lsr /   (recursive listing of /)
           [-ls <path>]
           [-lsr <path>]
           [-df [<path>]]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm [-skipTrash] <path>]
           [-rmr [-skipTrash] <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]

           [-help [cmd]]

distcp:
hadoop distcp <Source Directory> <Destination Directory>

lsblk: To display block device information
[root@uuuuuuu]# lsblk
NAME                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                     8:0    0  2.7T  0 disk
├─sda1                                  8:1    0  200M  0 part /boot/efi
├─sda2                                  8:2    0  512M  0 part /boot
├─sda3                                  8:3    0  500G  0 part
│ ├─RootVolGroup00-lv_root (dm-0)     253:0    0   16G  0 lvm  /
│ ├─RootVolGroup00-lv_swap (dm-1)     253:1    0   32G  0 lvm  [SWAP]
│ ├─RootVolGroup00-lv_var (dm-2)      253:2    0   16G  0 lvm  /var
│ ├─RootVolGroup00-lv_opt (dm-3)      253:3    0   15G  0 lvm  /opt
│ ├─RootVolGroup00-lv_home (dm-4)     253:4    0   15G  0 lvm  /home
│ ├─RootVolGroup00-lv_optmapr (dm-5)  253:5    0  200G  0 lvm  /opt/mapr
│ └─RootVolGroup00-lv_optcores (dm-6) 253:6    0  100G  0 lvm  /opt/cores
└─sda4                                  8:4    0  2.2T  0 part


How to Check installed packages:
# rpm -qa | grep hive

To Locate Java:
# locate java | grep bin |less

General Startup on VI
    To use vi: vi filename
    To exit vi and save changes: ZZ   or  :wq
    To exit vi without saving changes: :q!
    To enter vi command mode: [esc]
Counts
     A number preceding any vi command tells vi to repeat  that command that many times.
Cursor Movement
h       move left (backspace)
j       move down
k       move up
l       move right (spacebar)
[return] move to the beginning of the next line
$       last column on the current line
0       move cursor to the first column on the current line
^       move cursor to first nonblank column on the current line
w       move to the beginning of the next word or punctuation mark
W       move past the next space
b       move to the beginning of the previous word or punctuation mark
B       move to the beginning of the previous word,ignores punctuation
e       end of next word or punctuation mark
E       end of next word, ignoring punctuation
H       move cursor to the top of the screen
M       move cursor to the middle of the screen
L       move cursor to the bottom of the screen

Screen Movement
G        move to the last line in the file
xG       move to line x
z+       move current line to top of screen
z        move current line to the middle of screen
z-       move current line to the bottom of screen
^F       move forward one screen
^B       move backward one screen
^D       move forward one half screen
^U       move backward one half screen
^R       redraw screen ( does not work with VT100 type terminals )
^L       redraw screen ( does not work with Televideo terminals )

Inserting
 r        replace character under cursor with next character typed
R        keep replacing character until [esc] is hit
i        insert before cursor
a        append after cursor
A        append at end of line
O        open line above cursor and enter append mode

Deleting
x       delete character under cursor
dd      delete line under cursor
dw      delete word under cursor
db      delete word before cursor

Copying Code
yy      (yank) copies the line, which may then be put by the p (put) command. Precede with a count for
        multiple lines.
The put command brings back the previous deletion or yank of lines, words, or characters
P       bring back before cursor
p       bring back after cursor

Find Commands
?       finds a word going backwards
/       finds a word going forwards
f       finds a character on the line under the cursor going forward
F       finds a character on the line under the cursor going backwards
t       find a character on the current line going forward and stop one character before it
T       find a character on the current line going backward and stop one character before it
;       repeat the last f, F, t, T

Miscellaneous Commands
.               repeat last command
u             undoes last command issued
U             undoes all commands on one line
xp           deletes first character and inserts after second (swap)
J              join current line with the next line
^G          display current line number
%            if at one parenthesis, will jump to its mate
mx          mark current line with character x
'x             find line marked with character x
                NOTE: Marks are internal and not written to the file.

Line Editor Mode
Any command from the line editor ex can be issued upon entering line mode.
To enter: type ':'
To exit: press [return] or [esc]
ex Commands
For a complete list consult the
UNIX Programmer's Manual

READING FILES
copies (reads) filename after cursor in file currently editing
:r filename

WRITE FILE
:w           saves the current file without quitting

MOVING
:#            move to line #
:$            move to last line of file

SHELL ESCAPE
executes 'cmd' as a shell command.
:!'cmd'

$ scp foobar.txt your_username@remotehost.edu:/some/remote/directory

MapR Regular Commands

Use the following command to list MapR services
maprcli service list