An Introduction and Differences Between YARN and MapReduce

Difference Between YARN and MapReduce

Hadoop developers are familiar with two terms: YARN and MapReduce. Newcomers may find them similar, but there is a big difference between the two concepts. YARN is an architecture for managing and allocating the resources of a cluster, whereas MapReduce is a programming model.

This article explains both concepts in detail and then briefly compares them. In short, YARN is a general-purpose resource scheduler that does not care what an application computes, while MapReduce defines what should be done with the resources an application is given.

An Introduction to YARN

YARN was introduced in Hadoop 2.0 to separate resource management from the processing components. It replaces the resource management side of the distributed processing layer used in earlier versions of Hadoop and gives processing frameworks a common platform to run on. Compared with those earlier versions, YARN has:

  • A Resource Manager rather than a cluster manager,
  • An ApplicationMaster, which acts as a dedicated, short-lived job tracker for a single application, rather than one shared Job Tracker,
  • A Node Manager rather than a Task Tracker,
  • A distributed application rather than a MapReduce job.

YARN has the following architecture as shown below:

In the YARN architecture, a global Resource Manager runs as a master daemon. It tracks the live nodes and the resources available on the cluster and manages the allocation of those resources. It works in a multi-tenant, secure, and shared manner.

When an application is submitted, a lightweight process called the ApplicationMaster coordinates its execution. The ApplicationMaster monitors the tasks of its application, restarts failed tasks, and speculatively re-runs slow ones. The tasks themselves run in containers that the Node Manager on each node launches and controls.

The Node Manager is a more generic and efficient version of the Task Tracker. Instead of a fixed number of slots, it has dynamically created resource containers. The size of a container varies from one application to another and depends on the resources it requests, such as memory, CPU, and network I/O. Today MapReduce itself (MRv2) runs on top of YARN.

Application Running Process in YARN

An application runs in YARN in the following order:

  • The client asks the Resource Manager to run an ApplicationMaster.
  • On receiving the request, the Resource Manager finds a Node Manager that can launch the ApplicationMaster in a container. In the simplest case, the ApplicationMaster runs the computation in that container and returns the result to the client.
  • If the application needs more capacity, the ApplicationMaster requests more containers from the Resource Manager.
  • Finally, the distributed computation, such as a MapReduce job, runs in those containers.
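The order above can be sketched as a toy model in plain Python. This is only an illustration of the flow, not the YARN API: the classes `ResourceManager`, `NodeManager`, and `Container`, the first-fit placement, and the memory and core figures are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, memory_mb, vcores):
        """Carve a container out of this node's free resources, or return None."""
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            return None
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


class ResourceManager:
    """Toy global scheduler: tracks the nodes and hands out containers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb, vcores):
        # First-fit placement (an assumption; real schedulers are more elaborate).
        for node in self.nodes:
            container = node.launch(memory_mb, vcores)
            if container is not None:
                return container
        raise RuntimeError("no node has enough free resources")


def run_application(rm, task_requests):
    # Steps 1-2: the client asks the RM, which places the ApplicationMaster.
    am_container = rm.allocate(memory_mb=1024, vcores=1)
    # Step 3: the ApplicationMaster requests more containers as needed.
    task_containers = [rm.allocate(mem, cores) for mem, cores in task_requests]
    # Step 4: the distributed computation would run inside task_containers.
    return am_container, task_containers


nodes = [NodeManager("node-1", 4096, 4), NodeManager("node-2", 8192, 8)]
rm = ResourceManager(nodes)
am, tasks = run_application(rm, [(2048, 2), (2048, 2), (4096, 4)])
print(am.node, [t.node for t in tasks])
```

Note that each request carries its own size, so containers of different shapes can share a node until its memory or cores run out.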

Life Span of a YARN application

  • The lifespan of a YARN application can range from a few seconds to a few months.
  • It can be one application per job (MapReduce).
  • It can be one application per workflow, in which case:
    • Containers can be reused between jobs
    • Intermediate data can be cached between jobs
    • Tez and Spark are examples
  • It can be a long-running application shared among many users:
    • It may act as a coordinator
    • It may be a long-running master that launches other applications
    • Apache Impala runs a proxy application of this kind, which can reduce the overhead of starting a new ApplicationMaster

Introduction to MapReduce

The MapReduce framework is used to write applications that process large amounts of structured and unstructured data stored in HDFS. MapReduce is mainly used for batch processing of terabytes or petabytes of data in Apache Hadoop. It offers the following benefits:

  • Simple to use: Developers can write MapReduce applications in a language of their choice, such as Java, C, C++, or Python, which makes jobs easy to write and run.
  • Scalable: MapReduce can process petabytes of data stored on an HDFS cluster.
  • Fast: Because the work runs in parallel, problems that would otherwise take days can be solved in hours or minutes.
  • Easy to recover: If the copy of the data on one machine becomes unavailable because of a failure, another machine holding a replica with the same key/value pairs can be used to complete the sub-task. The JobTracker keeps track of such failures.
  • Minimal data movement: The computation is moved to the physical nodes where the data resides in HDFS instead of moving the data to the computation. This reduces network I/O and increases Hadoop processing speed significantly.

MapReduce is a core building block of the Hadoop framework. It allows huge amounts of data to be processed in a parallel, distributed fashion. It consists of the following tasks and components:

  • MapReduce has two tasks: Map and Reduce.
  • The reduce phase is executed after the map phase completes.
  • In the map phase, data blocks are read and processed to produce key-value pairs as intermediate output.
  • The output of the map phase becomes the input of the reducer.
  • A reducer can receive input from more than one mapper.
  • The reducer then aggregates the intermediate tuples and generates key-value pairs as the final output.
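The phases listed above can be illustrated with a word count written in plain Python. This is a minimal single-process sketch, not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample blocks are assumptions made for the example, and the explicit grouping step stands in for the work Hadoop does between the two phases.

```python
from collections import defaultdict


def map_phase(block):
    """Mapper: emit a (word, 1) pair for every word in one data block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Group intermediate pairs by key, merging the output of all mappers."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: aggregate the values of each key into the final output."""
    return {key: sum(values) for key, values in grouped.items()}


blocks = [
    "yarn schedules resources",
    "mapreduce processes data",
    "yarn runs mapreduce",
]
mapped = [map_phase(block) for block in blocks]  # one mapper per block
counts = reduce_phase(shuffle(mapped))
print(counts)
```

The reducer for "yarn" receives values from the first and third mappers, which is the "input from more than one mapper" case.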

Advantages of MapReduce

MapReduce has the following advantages that you should know:

1) Parallel Processing

In MapReduce, a job is divided among multiple nodes, and each node processes its part at the same time as the others. In other words, it works in a divide-and-conquer manner. Because the data is processed on many machines in parallel, the processing time is reduced drastically.
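The divide-and-conquer structure can be sketched with Python's standard `concurrent.futures` module. In this illustration worker threads stand in for cluster nodes, and the helper names `split`, `process_chunk`, and `parallel_sum_of_squares` are hypothetical. It shows the split, process, and combine shape of the work rather than a real speedup, since Python threads do not run CPU-bound code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide the input into roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """Work done independently on one chunk (here: a sum of squares)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)


data = list(range(1, 101))
print(parallel_sum_of_squares(data))
```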

2) Locality of Data

Instead of moving the data to the processing, MapReduce moves the processing to each node that holds the data. Because data volumes are now so large, moving data from one place to another is difficult and expensive, so moving the computation is the more practical approach.

It offers the following advantages:

  • Moving the processing unit to a node is much cheaper than moving the data.
  • Processing time is reduced drastically as more than one node takes part in processing.
  • No node gets overburdened as many nodes take part in processing data.
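A locality-aware assignment can be sketched as follows. This is a simplified illustration under stated assumptions, not Hadoop's scheduler: the function `assign_tasks`, the block-to-node map, and the "least-loaded replica holder" rule are all hypothetical choices made to show both ideas from the list above, running the task where the data is and spreading the load across nodes.

```python
def assign_tasks(block_locations, nodes):
    """Assign each block's task to the least-loaded node that stores a replica."""
    load = {node: 0 for node in nodes}
    assignment = {}
    for block, replicas in block_locations.items():
        local = [n for n in replicas if n in load]
        # Fall back to any node only if no replica holder is available.
        candidates = local or nodes
        chosen = min(candidates, key=lambda n: load[n])
        load[chosen] += 1
        assignment[block] = chosen
    return assignment


block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a", "node-c"],
    "block-3": ["node-b", "node-c"],
}
assignment = assign_tasks(block_locations, ["node-a", "node-b", "node-c"])
print(assignment)
```

Every task in this example lands on a node that already stores its block, so no block has to cross the network, and no node receives more than one task.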

Difference Between YARN and MapReduce

Having discussed YARN and MapReduce, let's look at the differences between them.

Classic MapReduce (MRv1) has the following components to process a task:

  1. Job Tracker
  2. Task Tracker
  3. Slot

YARN has the following components to process a task:

  1. Resource Manager
  2. Timeline Server
  3. Application Master
  4. Node Manager
  5. Container

These are the components that YARN and MapReduce each use to process a task or job. Although the two are completely separate concepts, the lists show what each one contributes to data processing.

Scalability, availability, utilization, and multitenancy are other factors on which these systems can be compared. YARN is only a resource manager, whereas MapReduce is the model that splits a data processing job into tasks and carries the job through to completion. A MapReduce job needs a set of resources to run. In classic MapReduce, allocating those resources was part of the framework itself, while under YARN it is delegated to the Resource Manager.
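The utilization point can be made concrete with a little arithmetic. The sketch below assumes a single node with 8 GB of memory, a classic MapReduce setup with 2 map slots and 2 reduce slots, and four pending map tasks of 2 GB each. The figures and the function names are hypothetical and only illustrate the difference between fixed slots and dynamically sized containers.

```python
def runnable_with_slots(pending_map, pending_reduce, map_slots, reduce_slots):
    """Fixed slots: map tasks cannot use idle reduce slots, and vice versa."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)


def runnable_with_containers(pending_tasks, task_memory_mb, node_memory_mb):
    """Containers: any task fits as long as the node has memory left."""
    return min(pending_tasks, node_memory_mb // task_memory_mb)


# Four map tasks are waiting and no reduce tasks have been scheduled yet.
slot_tasks = runnable_with_slots(4, 0, map_slots=2, reduce_slots=2)
container_tasks = runnable_with_containers(4, task_memory_mb=2048, node_memory_mb=8192)
print(slot_tasks, container_tasks)
```

With fixed slots, two of the four map tasks wait while the reduce slots sit idle. With containers, all four run at once.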

Final Words:

Today, Hadoop is a major platform that many organizations use to process huge amounts of data. MapReduce and YARN are just two of the concepts involved in that processing.

Hadoop developers get many advantages from this platform. Its approach to processing and its ability to handle huge amounts of data keep the overall architecture simple.

Hadoop data processing involves many steps, and YARN and MapReduce make it faster and more efficient, because parallel and distributed processing spreads the work across the cluster.

    Janbask Training

    JanBask Training is a leading Global Online Training Provider through Live Sessions. The Live classes provide a blended approach of hands on experience along with theoretical knowledge which is driven by certified professionals.

