Hadoop Command Cheat Sheet - What Is Important?

It is well known that Hadoop has become one of the most popular and widely used tools for handling big data. The term "big data" does not imply any fixed size; rather, the field evolved to solve the problems associated with very large volumes of data.

Traditional data-handling tools could not cope with such volumes, and Hadoop was built to solve this problem. It has emerged as an effective tool that can not only handle big data but also deliver analytical results quickly.

This article is about Hadoop and the commands used to handle big data. Mastering the framework requires mastering a few commands, so here we look at the most commonly used ones. They are also known as Hadoop Distributed File System (HDFS) shell commands.

Features and Introduction to Hadoop

The Hadoop framework is designed to handle large volumes of both structured and unstructured data. Unlike a traditional data warehouse, it can work with unstructured data as well as structured data. In the past, much important and useful data was ignored because the technology was not efficient enough and suitable tools were not available.

Hadoop is used for data sources that are not structured but whose information is highly valuable for management's decision-making process. Hadoop is also cost-effective, and it can considerably increase organizational efficiency even when unstructured data grows exponentially.

Hadoop has the following features that benefit organizations:

  • Flexibility in data processing
  • Easily scalable
  • Fault-Tolerant
  • Fast data processing
  • Robust Ecosystem
  • Cost Effective

In a typical organization, only about 20% of data is said to be structured, while the rest is unstructured and its value is generally ignored. Hadoop is flexible enough to handle both types of data. Because the platform is scalable, new nodes can easily be added to a cluster to help process huge amounts of data.

Because Hadoop is fault-tolerant, data remains accessible even if a data node fails: data is automatically replicated across nodes, which makes Hadoop a reliable platform. Its batch and parallel processing techniques let it process huge amounts of data quickly.

A robust Hadoop ecosystem can handle the analytical needs of both small and large organizations. Its tools can handle a wide variety of data; they include MapReduce, Hive, HCatalog, ZooKeeper, Apache Pig, and many more. Because Hadoop is an open-source framework, it provides parallel computing at little or no licensing cost.

Shell Commands of the Hadoop Distributed File System

The Hadoop shell has a number of commands that can be run directly from the command prompt of your operating system. These Hadoop shell commands are of the following two types:

  1. File manipulation commands
  2. Hadoop administration commands

Hadoop File Manipulation Commands

The following commands are the ones generally used; the full list of commands is available on the Apache website. Let us discuss the Hadoop file manipulation commands one by one:

  • cat: Copies the contents of the source paths to standard output.

Syntax: hdfs dfs -cat URI [URI ...]

  • chgrp: Changes the group association of files. The user must be the file owner or a superuser to use this command.

Syntax: hdfs dfs -chgrp [-R] GROUP URI [URI ...]

  • chmod: Changes file permissions. The -R option applies the change recursively through the directory structure. Again, the user must be either the file owner or a superuser to use this command.

Syntax: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]

  • chown: Changes the owner of files. The -R option applies the change recursively through the directory structure. Changing the owner of a file requires superuser privileges.

Syntax: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
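The three permission commands above can be combined as in the following minimal sketch. The directory /user/demo/reports, the user etl, and the group analysts are hypothetical names chosen for illustration. The small stand-in function is also an assumption of this sketch: when no hdfs client is installed it only prints each command, so the example can be dry-run without a cluster.

```shell
# Dry-run stand-in (an assumption for illustration): if no hdfs client is
# installed, print each command instead of executing it.
if ! command -v hdfs >/dev/null 2>&1; then
  hdfs() { echo "hdfs $*"; }
fi

# Hand a directory tree to a new owner and group, then restrict access.
# All names passed in are hypothetical.
secure_dir() {
  dir=$1; owner=$2; group=$3
  hdfs dfs -chown -R "$owner" "$dir"   # change the owner recursively
  hdfs dfs -chgrp -R "$group" "$dir"   # change the group recursively
  hdfs dfs -chmod -R 750 "$dir"        # rwx for owner, r-x for group, none for others
}

secure_dir /user/demo/reports etl analysts
```

The octal mode 750 could equally be written symbolically, for example u=rwx,g=rx,o=.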

  • count: Counts the number of directories, files, and bytes under the paths that match the specified pattern. The -q option also reports quota information.

Syntax: hdfs dfs -count [-q] <paths>

  • cp: Copies one or more files from the source path to the destination path. If more than one source path is given, the destination must be a directory.

Syntax: hdfs dfs -cp URI [URI ...] <dest>

  • du: Displays the sizes of the files and directories contained in a given directory. The -s option displays an aggregate size for all the files rather than the size of each individual file. The -h option formats sizes in human-readable form (for example, 64.0m instead of 67108864).

Syntax: hdfs dfs -du [-s] [-h] URI [URI ...]
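As a rough illustration of how count and du complement each other, the sketch below gathers usage figures for one directory. The path /user/demo is hypothetical, and the stand-in function (an assumption of this sketch) only prints the commands when no hdfs client is available.

```shell
# Dry-run stand-in (an assumption for illustration): print the command
# when no hdfs client is installed.
if ! command -v hdfs >/dev/null 2>&1; then
  hdfs() { echo "hdfs $*"; }
fi

# Report how much space a directory uses, from coarse to fine.
usage_report() {
  path=$1
  hdfs dfs -count -q "$path"   # quotas plus directory, file, and byte counts
  hdfs dfs -du -s -h "$path"   # one aggregate, human-readable total
  hdfs dfs -du -h "$path"      # size of each entry directly under the path
}

usage_report /user/demo
```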

  • get: Copies files to the local file system. A CRC check detects data transmission errors. The -ignorecrc option copies even files that fail the CRC check, and the -crc option copies the CRC files along with the data.

Syntax: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>

  • ls: Lists information about the specified file or directory. For a file it shows the file's details; for a directory it lists the directory's direct children.

Syntax: hdfs dfs -ls <args>

  • mkdir: Creates one or more directories at the specified paths. It is quite like the Windows and Unix mkdir command.

Syntax: hdfs dfs -mkdir <paths>

  • mv: Moves one or more files from one location to another. Again, if more than one source is given, the destination must be a directory. Files cannot be moved from one file system to another.

Syntax: hdfs dfs -mv URI [URI ...] <dest>

  • put: Copies one or more sources from the local file system to the destination file system. It can also read input from standard input and write it to the destination file system.

Syntax: hdfs dfs -put <localsrc> ... <dest>

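  • rm: Deletes one or more files. Directories are removed only when the recursive option -r is given; the older -rmr form is deprecated in favor of -rm -r. When Trash is enabled, deleted files are moved there first, and the -skipTrash option bypasses it.

Syntax: hdfs dfs -rm [-r] [-skipTrash] URI [URI ...]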
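Several of the commands described so far can be chained into one round trip, sketched below under stated assumptions. The base directory and file names are hypothetical. The -p flag of mkdir (create parent directories) is available in Hadoop 2 and later, and -rm -r is the current spelling of the older -rmr. The stand-in function, again an assumption of this sketch, prints the commands when no hdfs client is installed.

```shell
# Dry-run stand-in (an assumption for illustration): print the command
# when no hdfs client is installed.
if ! command -v hdfs >/dev/null 2>&1; then
  hdfs() { echo "hdfs $*"; }
fi

# Create a directory, upload data from stdin, inspect, copy, rename,
# download, and finally clean up. All paths are hypothetical.
round_trip() {
  base=$1
  hdfs dfs -mkdir -p "$base/in"
  printf 'hello hdfs\n' | hdfs dfs -put - "$base/in/sample.txt"   # "-" reads stdin
  hdfs dfs -ls "$base/in"
  hdfs dfs -cat "$base/in/sample.txt"
  hdfs dfs -cp "$base/in/sample.txt" "$base/in/copy.txt"
  hdfs dfs -mv "$base/in/copy.txt" "$base/in/renamed.txt"
  hdfs dfs -get "$base/in/renamed.txt" ./renamed.txt
  hdfs dfs -rm -r -skipTrash "$base"   # permanent delete, bypasses Trash
}

round_trip /user/demo/cheatsheet
```

Because -skipTrash deletes permanently, it is safer to omit it while experimenting on a real cluster.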

  • stat: Displays information about the specified path.

Syntax: hdfs dfs -stat URI [URI ...]
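On Hadoop 2 and later, stat also accepts an optional format string before the paths. The sketch below assumes such a version and uses %n (name), %b (size in bytes), and %y (modification time). The file paths are hypothetical, and the stand-in function is an assumption that prints the command when no hdfs client is installed.

```shell
# Dry-run stand-in (an assumption for illustration): print the command
# when no hdfs client is installed.
if ! command -v hdfs >/dev/null 2>&1; then
  hdfs() { echo "hdfs $*"; }
fi

# Print name, size in bytes, and modification time for each given path.
describe() {
  hdfs dfs -stat "%n %b %y" "$@"
}

describe /user/demo/a.txt /user/demo/b.txt
```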

Hadoop Administration Commands

As described above, Hadoop has two types of commands, and every Hadoop administrator should know the administrative ones. Some of the most important and frequently used Hadoop administrative commands are:

  • balancer: Runs the cluster balancing utility
  • daemonlog: Gets or sets the log level of each daemon
  • dfsadmin: Runs HDFS administrative operations
  • datanode: Runs the HDFS DataNode service
  • mradmin: Runs MapReduce administrative operations
  • jobtracker: Runs the MapReduce JobTracker node
  • namenode: Runs the NameNode
  • tasktracker: Runs a MapReduce TaskTracker node
  • secondarynamenode: Runs the secondary NameNode

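Each of the commands listed above has its own specific purpose and is intended for use by Hadoop administrators. Note that mradmin, jobtracker, and tasktracker belong to the original MapReduce (MRv1) runtime; on YARN-based clusters their roles are taken over by the ResourceManager and NodeManager.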
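A few of these administrative commands might be invoked as in the sketch below. The host and port namenode.example.com:9870 and the 10 percent balancer threshold are hypothetical values, not recommendations. The stand-in functions are an assumption of this sketch: they print the commands when no Hadoop client is installed.

```shell
# Dry-run stand-ins (an assumption for illustration): print the commands
# when no Hadoop client is installed.
if ! command -v hdfs >/dev/null 2>&1; then
  hdfs()   { echo "hdfs $*"; }
  hadoop() { echo "hadoop $*"; }
fi

# A short tour of common administrative calls.
# $1 is the host:port of the NameNode web endpoint (hypothetical value below).
admin_tour() {
  hdfs dfsadmin -report          # capacity and per-DataNode status
  hdfs dfsadmin -safemode get    # is the NameNode in safe mode?
  hdfs balancer -threshold 10    # rebalance to within 10% of average utilization
  hadoop daemonlog -getlevel "$1" org.apache.hadoop.hdfs.server.namenode.NameNode
}

admin_tour namenode.example.com:9870
```

On a real cluster these calls generally need administrator privileges, and the balancer can run for a long time.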

Final Words:

Summarizing the points above, a user can handle Hadoop easily from the command line alone, without needing any special interface. The HDFS commands are powerful and cover a wide range of tasks. Hadoop is regarded as a useful platform worldwide, and its popularity has also increased job opportunities for learners. If you want to give your career a new boost, join Janbask’s Hadoop training program right away.




    Janbask Training

    A dynamic, highly professional, global online training course provider committed to propelling the next generation of technology learners with a whole new training experience.

