Apache Sqoop is a tool designed for efficiently transferring bulk data, in a distributed manner, between relational databases (RDBMSs) such as MySQL and Oracle and HDFS, and vice versa.
Sqoop imports data from an RDBMS into the Hadoop Distributed File System (HDFS), lets you transform the data with Hadoop MapReduce, and can then export the data back into an RDBMS.
Sqoop automates most of this process and uses MapReduce to perform the import and export, which provides parallel operation and fault tolerance.
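For example, a basic Sqoop1 import and export look like the commands below. This is a minimal sketch assuming a MySQL database named salesdb on a host named dbserver; the user, table, and directory names are placeholders, not values from any particular setup.

# Import a table from MySQL into HDFS using 4 parallel map tasks
sqoop import \
  --connect jdbc:mysql://dbserver/salesdb \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/sqoop/customers \
  --num-mappers 4

# Export the (possibly transformed) HDFS data back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbserver/salesdb \
  --username sqoop_user -P \
  --table customers_processed \
  --export-dir /user/sqoop/customers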
Sqoop 2
Sqoop2 transfers bulk data between Hadoop and various types of structured datastores,
such as relational databases, enterprise data warehouses, and NoSQL systems. Sqoop2 can be used
in conjunction with Hue, which is a web-based GUI that facilitates the importing and exporting
of data from these structured datastores.
Currently, Sqoop2 does not support all the features of Sqoop1. Refer to the following
table to understand the differences and determine which version is best for your purposes.
Sqoop2 Architecture:
Differences between Sqoop1 and Sqoop2

1) Specialized connectors for all major RDBMSs
Sqoop1: Available.
Sqoop2: Not available. However, you can use the generic JDBC connector, which has been tested on MySQL, Microsoft SQL Server, Oracle, and PostgreSQL. The generic JDBC connector should also work with any other JDBC-compliant database, although specialized connectors probably give better performance.
2) Data transfer from an RDBMS to Hive or HBase
Sqoop1: Done automatically.
Sqoop2: Must be done manually in two stages: first import the data from the RDBMS into MapR-FS, then load it into Hive (using the LOAD DATA command) or into HBase (see the sketch after this table).
3) Data transfer from Hive or HBase to an RDBMS
Sqoop1: Must be done manually in two stages: extract the data from Hive or HBase into MapR-FS as a text file or as an Avro file, then export the output of step 1 to the RDBMS using Sqoop.
Sqoop2: The same two-stage process applies: extract the data from Hive or HBase into MapR-FS, then export it to the RDBMS using Sqoop (see the sketch after this table).
4) Integrated Kerberos security
Sqoop1: Supported.
Sqoop2: Not supported.
5) Password encryption
Sqoop1: Not supported.
Sqoop2: Supported using Derby's data encryption feature (although the configuration has not been verified).
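To illustrate the two-stage workflows in rows 2) and 3) above, the commands below move a table from MySQL into Hive and then back out to the RDBMS. This is only a rough sketch: the host, database, table, and directory names (dbserver, salesdb, orders, /user/sqoop/orders, /user/sqoop/orders_out) are placeholder assumptions, the paths would sit in MapR-FS on a MapR cluster, and the Sqoop1 command-line syntax is shown (in Sqoop2 the import/export steps are defined as jobs through the Sqoop2 shell or Hue, but the overall flow is the same).

# Row 2, stage 1: import the RDBMS table into the distributed file system
sqoop import \
  --connect jdbc:mysql://dbserver/salesdb \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/sqoop/orders \
  --fields-terminated-by ','

# Row 2, stage 2: load the imported files into an existing Hive table
hive -e "LOAD DATA INPATH '/user/sqoop/orders' INTO TABLE orders;"

# Row 3, stage 1: extract the Hive table back out as delimited text files
hive -e "INSERT OVERWRITE DIRECTORY '/user/sqoop/orders_out' SELECT * FROM orders;"

# Row 3, stage 2: export those files into the target RDBMS table
# (INSERT OVERWRITE DIRECTORY writes Ctrl-A delimited text by default)
sqoop export \
  --connect jdbc:mysql://dbserver/salesdb \
  --username sqoop_user -P \
  --table orders_export \
  --export-dir /user/sqoop/orders_out \
  --input-fields-terminated-by '\001'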
Sqoop1 & Sqoop2 Supported Databases:

Database     Connection string
MySQL        jdbc:mysql://
Oracle       jdbc:oracle://
Teradata     jdbc:teradata://
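To check that a connection string and driver are working, the Sqoop list commands can be pointed at the database before running any import. The host names, ports, and credentials below are placeholders; the Oracle example uses the thin-driver form of the URL.

# List the databases visible to the given user on a MySQL server
sqoop list-databases \
  --connect jdbc:mysql://dbserver:3306/ \
  --username sqoop_user -P

# List the tables in an Oracle schema
sqoop list-tables \
  --connect jdbc:oracle:thin:@dbserver:1521:orcl \
  --username sqoop_user -P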
Topics Covered in this Article (this post is mostly notes from my self-study, pulled together from many links and big data enthusiasts' blogs):
a) Versions covered (MySQL, Oracle & Teradata)
b) Sqoop installation (a quick sketch of b and c follows this list)
c) Download and save the JDBC drivers for MySQL, Oracle & Teradata
d) Sqoop list commands
e) Import/export data from an RDBMS to HDFS and vice versa
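As a quick sketch of steps b) and c), assuming a plain tarball install of Sqoop 1.4.2 on a node that already has Hadoop configured (the file names and paths are illustrative):

# Unpack the Sqoop tarball and put its bin directory on the PATH
tar -xzf sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz -C /opt
export SQOOP_HOME=/opt/sqoop-1.4.2.bin__hadoop-1.0.0
export PATH=$PATH:$SQOOP_HOME/bin

# Copy the JDBC driver jars (downloaded from each vendor) into Sqoop's lib
# directory so Sqoop can talk to MySQL, Oracle, and Teradata
cp mysql-connector-java-*.jar ojdbc6.jar terajdbc4.jar tdgssconfig.jar $SQOOP_HOME/lib/

# Verify the installation
sqoop version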
Versions Covered:
Sqoop: 1.4.2
MySQL: 5.0+
Oracle: 10.2.0+
Teradata