Sqoop has become a popular tool among Big data developers used to fetch relational data from the RDBMS. Since the time when Hive, HBase, Cassandra, Pig, and MapReduce came into existence, developers felt the need of having a tool that can interact with RDBMS server to import and export the data.
Sqoop means “SQL to Hadoop and Hadoop to SQL”. The tool is designed to transfer data between relational database servers and Hadoop. This post is written to provide you an introduction to Sqoop and the way in which it imports and export data.
To import data from relational database servers and Hadoop Sqoop tool is used. To import data from relational databases like Oracle, MySQL, HDFS or Hadoop file system to relational databases developers can use Sqoop tool.
Usually, applications interact with databases by the RDBMS and so RDBMS generate a huge amount of data and is considered as a great data source. In Hadoop Ecosystem, Sqoop plays an important role. As told above that the tool is used to transfer data from Hadoop to RDBMS and vice versa. Even bulk data can also be transferred easily between Hadoop and external data sources like relational databases, data warehouses, etc.
The user can also import data from external data sources like HBase or Hive. Sqoop import and export operations that executed through commands and described in the following sections of this blog post.
The import tool is used by Sqoop to import data from RDBMS to HDFS and in Sqoop each row is treated as a record. Each task that is assigned to Sqoop is divided into subtasks that are handled by the individual Map Task internally. Map task is just a subtask that imports data to the Hadoop Ecosystem and here all Map tasks import all the data. Sqoop exports command also work in a similar manner. Sqoop export tool exports a set of files from HDFS to the RDBMS, the input files of Sqoop contains records that are also called the rows of a table.
In Sqoop, there is a list of commands available for each and every task or subtask. Here, in the cheat sheet, we are going to discuss the commonly used cheat sheet commands in Sqoop. The commands are used for the following purposes:
To transfer entire table from source to destination, we can use the following command:
Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\ username sqoop\password sqoop\table regions
To specify custom output directories basically you can use two parameters:
Directories: -target-dir and –warehouse-dir. -target-dir option imports the real data to the specified directory. Above listed two commands can be used in the following ways: Sqoop import \ connect JDBC:Mysql://mysql.example.com/sqoop\usernae sqoop\password sqoop\ table regions\target-dir/etc/input/cities -warehouse-dir option creates a table name directory and data is imported in that directory, like shown below: Sqoop import\-connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\warehouse-dir/etl/input/
We can specify more than one condition but without using any aggregated query we can perform the operation:
Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\ where “country=’USA’ ”
Following command can read the password from the standard input, even we can also read a command from a file. The commands for these tasks are:
Sqoop import\connect JDBC: Mysql://mysql.example.com/sqoop\username sqoop\table regions\ -P To read a password from a file: Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\ password-file my-sqoop-password
If you want to import a file from any file other than a CSV file then the following command should be used by you:
Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\as-sequence file
To import data into the compressed format we can use following commands. Apart from this, the compression format of data can also be changed for this purpose another set of command is used that is also listed below:
Sqoop import\connect JDBC:Mysql://mysql.example.com/sqoop\username sqoop\password sqoop\table regions\compress To change compression format: Sqoop import –cmpress\--compress-codec org.apache.hadoop.io.compress.BZip2Codec
If you have to import more than one table to your database then the following command can be used:
Sqoop import-all-tables\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\password sqoop\table cities\ OR Sqoop import-all-tables\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table cities\ --exclude-tables cities,countries
In order to transfer data from the database to Hadoop you may have to use more than one set of commands that are listed below:
Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --export-dir regions Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --update-key id Sqoop export\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --update-key id\--update-mode alloy insert
If you want to import data directly to Hive tool then use following import command
Sqoop import\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --hive-import
To import data to HBase rather than Hive you can use the following set of commands:
Sqoop import\--connect JDBC:Mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\--table regions\ --HBase-table regions\--column-family world
In this way, we can use the number of commands with Sqoop and the list is not limited instead it is quite long. Some more Sqoop commands are listed below:
There is an option in Sqoop to use import command in an incremental manner the imported rows are newer than previously imported rows. The two types of incremental support that are available in Sqoop are:
If you want to add the newly imported rows in continuation with the previous ones and the row-id is increased in an incremental manner. In this command, you can also specify the row id with –check-column. Only those values will be imported that have the values greater than the specified value.
An alternate way to import data that is specified by Sqoop is last modified mode. This may be used when the rows of source table can be updated. This command will set the value of last modified column to the current timestamp.
Sqoop import –connect jdbc:mysql://localhost/regions –username sqoop – table regions –target-dir/Latest Regions –incremental append –check-column region id –last-value 544444
You can check and enlist the databases that are present in Sqoop. For this Sqoop list-databases tool uses SHOW DATABASES query to parse and execute the command against database server. The command is listed below:
Sqoop list-databases\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop
You can use following query to check the tables that are stored in the database server. For this purpose, Sqoop uses a specific command that is SHOW TABLES. Following command is used to enlist the tables:
Sqoop list-tables\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop
If you want to import just a single table from the database then you can use the below-listed command for that:
Sqoop list-databases\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop/--password sqoop/table regions –target –dir/user/sqoop/regions
Here, in this command, you will have to specify: -table: Name of the source table -target-dir: location where you will copy the data of table
If you want to encode the null value than you can use the following command:
Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\ -password sqoop\--table cities\--null-string’\\N’ \ --null-non-string’\\N’
Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table regions\incremental append\--check-column id\--last-value20 Sqoop import\--connect jdbc:mysql://mysql.example.com/sqoop\--username sqoop\--password sqoop\table regions\incremental lastmodified\--check-column last_update_date\--last-value “2018-06-11 00:01:01”
Sqoop is being used for data transfer between data source and destination and it offers many advantages to the user. A number of features that are present in Sqoop make it popular. Above listed commands are not limited instead there are a number of commands that can provide many operations that are necessary for data transfer. Due to above-listed commands following listed features have become important and useful among developers. Big data users use Sqoop due to these features that are beneficial for them:
We can say that if we want to execute the data transfer operations smoothly then we should use Sqoop. It is quite popular among Big Data developers just because it can provide a number of beneficial features and just through a single command, you can perform many tasks or sub-tasks. For Hive or HBase you can also do the same operations conveniently. For hands-on expertise on all Sqoop cheat sheet commands, you should join Hadoop certification program at JanBask Training right away.
A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.
MS SQL Server
Receive Latest Materials and Offers on Hadoop Course