Hadoop HDFS Commands Cheat Sheet

Welcome to the new article for learning Hadoop. In this article, we will talk about HDFS commands. While learning Hadoop you must have realized that HDFS is a core component of the Hadoop ecosystem. It is a distributed file system meant to store large files. Just like any other file system, it exposes a command line interface to interact with. Using these commands, we can read, write, and delete files and directories. HDFS commands are very similar to Unix file system commands. Newer versions of Hadoop come preloaded with support for many other file systems, such as HFTP FS and S3 FS.

All HDFS commands take resource paths as arguments. The full path format is “scheme://authority/path”, where “scheme” is the file system identifier. For HDFS the scheme is ‘hdfs’, and for the local Unix file system it is ‘file’. The scheme and authority are optional; when they are not provided, the default scheme specified in core-site.xml is used. A full HDFS path for a file or directory like /user/hadoop/myDir1 can be specified as hdfs://namenodehost/user/hadoop/myDir1 or simply as /user/hadoop/myDir1. To keep things simple, this article mainly focuses on the HDFS file system.

In our journey of Hadoop commands, the very first and most useful command is ‘help’. It displays help for other commands, or the list of commands available in the Hadoop shell along with a usage guide. If you ever get confused about a command’s syntax, the ‘help’ command is the quickest and most authoritative way to go. For example, ‘hdfs dfs -help’ prints the manual for every shell command, and ‘hdfs dfs -help ls’ prints it for ‘ls’ alone.

Now that we have learned about the help command, let’s move on to other commands. This article sorts HDFS commands into two categories on the basis of their usage. First try to master the “mostly used commands” section; this set of commands will help you get most of your work done.
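The path forms described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: ‘DEFAULT_FS’ stands in for the default file system configured in core-site.xml, ‘hdfs://namenodehost’ is the placeholder authority from the example above, and ‘qualify_path’ is a hypothetical helper, not a Hadoop command.

```shell
# Stand-in for the default file system from core-site.xml (assumed value).
DEFAULT_FS="hdfs://namenodehost"

# Hypothetical helper: expand a bare absolute path to scheme://authority/path.
qualify_path() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;                 # already has a scheme: keep as-is
    /*)    printf '%s%s\n' "$DEFAULT_FS" "$1" ;; # absolute path: prepend the default
    *)     printf '%s\n' "$1" ;;                 # relative path: left unchanged in this sketch
  esac
}

qualify_path /user/hadoop/myDir1          # hdfs://namenodehost/user/hadoop/myDir1
qualify_path file:///home/myuser/myFile   # file:///home/myuser/myFile
```

On a machine with a configured Hadoop client, ‘hdfs getconf -confKey fs.defaultFS’ prints the default file system that is actually in effect.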

Mostly used commands

ls: It is used for listing files and directories on HDFS


Usage: hdfs dfs -ls [-R] <args>
Example: hdfs dfs -ls /user/hadoop/myDir
Optional: -R argument is used to list directories recursively.

copyToLocal: as the name suggests, it is used for copying file(s) from the HDFS file system to the local Unix file system. It behaves like the ‘get’ command, except that the destination is restricted to a local file reference.


Usage: hdfs dfs -copyToLocal <hdfsPath> <localUnixPath>
Example: hdfs dfs -copyToLocal /user/hadoop/myDir/myHadoopFile /home/myuser/mylocalFile

copyFromLocal: as the name suggests, it is used for copying files from the local Unix file system to the HDFS file system. It behaves like the ‘put’ command, except that the source is restricted to a local file reference.


Usage: hdfs dfs -copyFromLocal [-f] <localUnixPath> <hdfsPath>
Example: hdfs dfs -copyFromLocal /home/myuser/myFile /user/hadoop/myDir/myFile
Optional: -f argument will overwrite the destination if it already exists.

cat: it is used for displaying the content of an HDFS file on the console.


Usage: hdfs dfs -cat <hdfsPath> [otherhdfsPath …]
Example:  hdfs dfs -cat /user/hadoop/file1
hdfs dfs -cat /user/hadoop/file2 /user/hadoop/file3

cp: it is used for copying files/directories from one HDFS location to another HDFS location


Usage: hdfs dfs -cp [-f] <srcHdfsPath> <destHdfsPath>
Example: hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
Optional: -f argument will overwrite the destination if it already exists.

get: this command is used to download files/directories from HDFS to the local file system. It is the HDFS-to-local counterpart of ‘put’ and is similar to the ‘copyToLocal’ command.

Usage: hdfs dfs -get <srcHdfsPath> <localUnixDst>
Example:  hdfs dfs -get /user/hadoop/file /home/user/localfile

put: this command is the counterpart of the ‘get’ command, i.e. it is used to copy files/directories from the local file system to HDFS. It is similar to ‘copyFromLocal’ and accepts one or more local sources, which may be files or directories.


Usage: hdfs dfs -put <localUnixPath> ... <dstHdfsPath>
Example:  hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/myDir

mkdir: it is used for creating directory/directories in HDFS


Usage: hdfs dfs -mkdir [-p] <hdfsPaths>
Example:  hdfs dfs -mkdir /user/hadoop/myHadoopDir
Optional: -p argument creates parent directories along the path
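The commands covered so far can be chained into a simple round trip: create a directory, upload a file, inspect it, and download it again. The sketch below is a dry run by default, meaning it prints each command instead of executing it; the ‘run’ helper, the ‘DRY_RUN’ switch and all paths are assumptions made for illustration.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a machine with a configured Hadoop client to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical paths, modelled on the examples in this article.
HDFS_DIR=/user/hadoop/myHadoopDir
LOCAL_FILE=/home/myuser/myFile

run hdfs dfs -mkdir -p "$HDFS_DIR"                      # create the target directory
run hdfs dfs -put "$LOCAL_FILE" "$HDFS_DIR/"            # upload the local file
run hdfs dfs -ls "$HDFS_DIR"                            # confirm that it arrived
run hdfs dfs -cat "$HDFS_DIR/myFile"                    # print its content
run hdfs dfs -get "$HDFS_DIR/myFile" /tmp/myFile.copy   # download it again
```

Printing first and executing later is a deliberate choice here, since it lets you review the exact paths before anything touches the cluster.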

mv: it is used for moving a file/directory from one HDFS location to another HDFS location.


Usage: hdfs dfs -mv <sourceHdfspath> <destHadoopPath>
Example: hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2

rm: it is used for deleting files from an HDFS location. When trash is enabled on the cluster, a deleted file is moved from its original location into the HDFS ‘trash’ directory instead of being removed immediately.


Usage: hdfs dfs -rm [-f] [-r|-R] [-skipTrash] <hdfsPath>
Example: hdfs dfs -rm /user/hadoop/myHadoopFile
Optional: -f argument suppresses the error message and the non-zero exit status when the file does not exist.
-R argument deletes the directory and any file/directory under it recursively.
-r option is equivalent to -R.
-skipTrash option will skip the trash directory and delete the file(s) immediately.
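The difference between a normal delete and a -skipTrash delete can be made explicit with a small wrapper. This is a sketch, not a recommended tool: ‘remove_path’, ‘run’ and ‘DRY_RUN’ are hypothetical names, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: recursive delete that bypasses the trash
# only when the second argument is the word "permanent".
remove_path() {
  path=$1
  mode=${2:-trash}
  if [ "$mode" = "permanent" ]; then
    run hdfs dfs -rm -r -skipTrash "$path"
  else
    run hdfs dfs -rm -r "$path"
  fi
}

remove_path /user/hadoop/myDir             # normal recursive delete
remove_path /user/hadoop/myDir permanent   # bypasses the trash directory
```

Requiring an explicit word for the permanent variant makes it harder to skip the trash by accident.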

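rmr: this command is the recursive version of ‘rm’, i.e. it deletes a directory along with its contents. It is deprecated in current Hadoop releases; ‘hdfs dfs -rm -r’ is the recommended replacement.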


Usage: hdfs dfs -rmr [-skipTrash] <hdfsDirPath>
Example: hdfs dfs -rmr /user/hadoop/myDir

tail: this command is used for displaying the last kilobyte of an HDFS file on the console. The optional -f argument keeps printing appended data as the file grows, as in Unix.


Usage: hdfs dfs -tail [-f] <hdfsFilePath>
Example: hdfs dfs -tail /user/hadoop/myHadoopFile

Hadoop useful commands

appendToFile: it is used for appending the content of one or more local files to a single HDFS file.

Usage: hdfs dfs -appendToFile <localSourcePath> ... <hdfsFilePath>
Example:  hdfs dfs -appendToFile mylocalfile /user/hadoop/myhadoopfile

chmod: it is used for changing the permissions of an HDFS file/directory.


Usage: hdfs dfs -chmod [-R] <MODE> <hdfsPath>
Example: hdfs dfs -chmod 777 /user/hadoop/myHadoopFile
Optional: -R argument makes the change recursively.

chown: it is used for changing the owner of an HDFS file/directory.


Usage:  hdfs dfs -chown [-R] [OWNER][:[GROUP]] <hdfsPath>
Example: hdfs dfs -chown devUser:developers /user/hadoop/myHadoopPath
Optional: -R argument makes the change recursively.
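chmod and chown are often used together, for example to hand a directory tree to a team and close it to everyone else. The sketch below reuses the owner, group and path from the chown example; the mode 750, the ‘run’ helper and the ‘DRY_RUN’ switch are assumptions for illustration. Changing the owner normally requires HDFS superuser rights.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

SHARED_DIR=/user/hadoop/myHadoopPath   # path taken from the chown example

run hdfs dfs -chown -R devUser:developers "$SHARED_DIR"   # owner devUser, group developers
run hdfs dfs -chmod -R 750 "$SHARED_DIR"                  # rwx for owner, r-x for group, nothing for others
run hdfs dfs -ls -d "$SHARED_DIR"                         # show the directory entry itself to verify
```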

count: it is used to count the number of directories and files inside a directory.


Usage:  hdfs dfs -count [-q] [-h] <hdfsPath>
The output columns with count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
Example: hdfs dfs -count /user/hadoop/myHadoopFile
Optional: -q argument adds the quota columns to the output.
-h argument shows sizes in human readable format, e.g. 64.0m instead of 67108864.
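The four output columns listed above are easy to pick apart in a script. The sketch below labels them with awk; the sample line is made up for illustration and ‘describe_count’ is a hypothetical helper. It assumes the path contains no spaces.

```shell
# Made-up sample line: directory count, file count, content size, path.
sample='           3           12           67108864 /user/hadoop/myDir'

# Hypothetical helper: label the four columns of one -count output line.
describe_count() {
  awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

printf '%s\n' "$sample" | describe_count
# prints: dirs=3 files=12 bytes=67108864 path=/user/hadoop/myDir
```

On a cluster, the same helper can be fed directly, e.g. with hdfs dfs -count /user/hadoop/myDir | describe_count.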

du: it is used to display the sizes of files and directories present in the specified directory.


Usage:  hdfs dfs -du [-s] [-h] <hdfsPath>
Example: hdfs dfs -du /user/hadoop/myHadoopFile
Optional:  -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files
-h option will format file sizes in a "human-readable" fashion (e.g 64.0m instead of 67108864)

getmerge: it is used to download multiple files from one HDFS directory as a single consolidated file on the local file system.


Usage:  hdfs dfs -getmerge <hdfsDirPath> <localdst>
Example:  hdfs dfs -getmerge /user/hadoop/myHadoopDir/* /home/user/singleLocalFile

moveFromLocal: this command is similar to ‘copyFromLocal’, except that it deletes the local file after copying it to HDFS.


Usage:  hdfs dfs -moveFromLocal <localsrcPath> <hdfsFilePath>
Example:  hdfs dfs -moveFromLocal mylocalFile /user/hadoop/myHadoopFile

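moveToLocal: it is intended to work like the ‘copyToLocal’ command, except that it deletes the original HDFS file after copying it to the local file system. Note that many Hadoop releases only print a “Not implemented yet” message for this command, so check its behaviour on your version before relying on it.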

Usage: hdfs dfs -moveToLocal <hdfsFilePath> <localDestpath>
Example: hdfs dfs -moveToLocal /user/hadoop/myHadoopFile mylocalFile

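setrep: it is used for changing the replication factor of an HDFS file/directory. If the path is a directory, the command changes the replication factor of all files under it.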


Usage:  hdfs dfs -setrep [-w] <numReplicas> <path>
Example: hdfs dfs -setrep -w 3 /user/hadoop/dir1
Optional: -w flag forces the command to wait for the replication to complete.

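stat: it is used to show statistics about an HDFS file/directory. An optional format string placed before the path selects which fields are printed, e.g. %n for the name, %b for the size in bytes and %r for the replication factor.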


Usage:  hdfs dfs -stat <hdfsFilePath>
Example:  hdfs dfs -stat /user/hadoop/myHadoopFile
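setrep and stat pair naturally: change the replication factor, then read it back. The sketch below is a dry run by default; the target file, the ‘run’ helper and the ‘DRY_RUN’ switch are assumptions for illustration. The format specifiers %n, %r and %b stand for the name, the replication factor and the size in bytes.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

TARGET=/user/hadoop/dir1/file1   # hypothetical file

run hdfs dfs -setrep -w 3 "$TARGET"       # request 3 replicas and wait for completion
run hdfs dfs -stat "%n %r %b" "$TARGET"   # print name, replication factor, size in bytes
```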

text: it is a more powerful version of the ‘cat’ command. ‘cat’ displays content properly only if the specified HDFS file is a plain text file, whereas ‘text’ can also display the content of compressed gzip files.


Usage: hdfs dfs -text <hdfsFilePath>
Example: hdfs dfs -text /user/hadoop/myHadoopFile.gzip

touchz: it is used to create an empty (zero-length) file on the HDFS file system.


Usage: hdfs dfs -touchz <hdfsFilePath>
Example: hdfs dfs -touchz /user/hadoop/myHadoopFile

fsck: this command is used for checking the health of HDFS. It is designed for reporting problems with files, for example missing blocks or under-replicated blocks.


Usage: hdfs fsck <hdfsFilePath> [-move | -delete | -openforwrite]
Example: hdfs fsck /user/hadoop/myHadoopFile -delete
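Because -move and -delete act on corrupted files, a read-only report is often a safer first step. The sketch below asks fsck for per-file, per-block and block-location details without changing anything; it is a dry run by default, and the ‘run’ helper and ‘DRY_RUN’ switch are assumptions for illustration.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Report only: list files, their blocks, and the datanodes holding each block.
run hdfs fsck /user/hadoop/myHadoopFile -files -blocks -locations
```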

Conclusion:

We just walked through some HDFS commands that are used on a day-to-day basis while working with Hadoop. We learned how to perform basic file system operations and how to work with files and directories present in HDFS. We also saw how to get files from HDFS to our local file system and the other way around. Besides basic file system operations, we explored a few advanced features of HDFS data management using the command line.

    JanBask Training
