Wednesday 24 July 2013

Apache Sqoop for importing data from a relational DB

Sqoop


Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
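As a concrete sketch of that round trip (the MySQL connection string, table names, credentials, and HDFS path here are hypothetical placeholders, not from any particular deployment):

# Import the "sales" table from MySQL into HDFS
sqoop import --connect jdbc:mysql://dbserver/shop --table sales \
    --username dbuser --password dbpass

# ...transform the data with MapReduce, then export the results back to the RDBMS
sqoop export --connect jdbc:mysql://dbserver/shop --table sales_summary \
    --export-dir /user/hadoop/sales_summary \
    --username dbuser --password dbpass

Note that sqoop export requires the target table (sales_summary above) to already exist in the database.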

This document describes how to get started using Sqoop to move data between databases and Hadoop, and provides reference information for the operation of the Sqoop command-line tool suite.

This document is intended for:
  • System and application programmers
  • System administrators
  • Database administrators
  • Data analysts
  • Data engineers


Using Apache Sqoop for Data Import from Relational DBs

 

Apache Sqoop can be used to import data from any relational DB into HDFS, Hive, or HBase.
To import data into HDFS, use the sqoop import command and specify the relational DB table and connection parameters:
sqoop import --connect <JDBC connection string> --table <tablename> \
    --username <username> --password <password>
This imports the table and stores it as comma-delimited text files (one part file per map task) in an HDFS directory, by default named after the table under your HDFS home directory.
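For instance, assuming a hypothetical MySQL database shop on host dbserver, the following imports the customers table into an explicit HDFS directory; --target-dir and -m (number of map tasks) are standard sqoop import options, while the host, database, and credentials are placeholders:

# Import the "customers" table using 4 parallel map tasks
sqoop import \
    --connect jdbc:mysql://dbserver/shop \
    --table customers \
    --username dbuser --password dbpass \
    --target-dir /user/hadoop/customers \
    -m 4

# Inspect the result: one part file per map task
hadoop fs -ls /user/hadoop/customers
hadoop fs -cat /user/hadoop/customers/part-m-00000 | head

In practice, prefer -P (which prompts for the password) over passing --password in plain text on the command line.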
To import data into Hive, use the sqoop import command with the --hive-import option:
sqoop import --connect <JDBC connection string> --table <tablename> \
    --username <username> --password <password> --hive-import
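With --hive-import, Sqoop first stages the data in HDFS and then generates and runs the corresponding CREATE TABLE and LOAD DATA statements in Hive for you. A minimal sketch, reusing the hypothetical shop database from above (--hive-table and --create-hive-table are standard sqoop import options):

sqoop import \
    --connect jdbc:mysql://dbserver/shop \
    --table customers \
    --username dbuser --password dbpass \
    --hive-import --hive-table customers \
    --create-hive-table

To import into HBase instead, use the --hbase-table, --column-family, and --hbase-create-table options; each row becomes an HBase put, keyed by default on the --split-by column or, failing that, the table's primary key. Again a sketch with placeholder names:

sqoop import \
    --connect jdbc:mysql://dbserver/shop \
    --table customers \
    --username dbuser --password dbpass \
    --hbase-table customers \
    --column-family info \
    --hbase-create-table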

 
