Cyber Monday Deal : Flat 30% OFF! + free self-paced courses  - SCHEDULE CALL

- Hadoop Blogs -

How to Install Apache Pig on Linux?

You might be here because you wanted to install Apache Pig on Linux. Thankfully, you have been reached to the right place today where step by step guide is given to complete the installation process precisely. Before we jump to the installation part directly, let us start with a brief introduction to Apache Pig first.

Apache Pig is a platform which is used to create and execute the programs of MapReduce that are utilized in Hadoop. Large data sets are analyzed through this tool. Generally speaking, Apache Pig can be said an abstraction over MapReduce.

The language used in Apache Pig is Pig Latin and the novice programmers use this language for executing and writing MapReduce programs, especially those programmers who are not good in Java language.

Pig Latin is basically a high-level language which is used for Apache Pig platform. The programs written in Apache Latin can be run on any platform, even over distributed database environment of the Hadoop File System or HDFS.

The scripts of Apache Pig are written in Pig Latin language and they are later converted to MapReduce job. There are various operators present in Apache Pig and they can be used to read, write and to process the data. These operators are called Apache Pig Relational Operators.

Apache Pig Installation on Linux Platform

Apache Pig is one of the most popular tools that is frequently used by the Hadoop developers. If you want to install Apache Pig on Linux platform, then you need to follow proper steps for successful installation of Apache Pig on Linux.

Are there any Pre-Requisites?

To install Apache Pig on Linux platform, there are many pre-requisites. First of all, Hadoop and Java must be pre-installed on your machine. There is a complete procedure and set of steps to install Hadoop and Java on any Linux-based machine, you can follow these steps or Google the proper steps or installation guide for these tools.

After downloading it from the site, install Java and Hadoop and prepare the environment so that you can install Apache Pig properly on your machine. The link from where you can access the complete installation guide for Java and Hadoop is available on Google.

How to download Apache Pig online?

To download Apache Pig, you must select the version which you want to use for your requirement or you can download the latest available version of this tool. There is complete information available online about the various available versions of this platform and their compatibility with other platforms. Download the appropriate package for your need and install them later on. The MapReduce Accelerator support information is listed below:

Read: A Complete List of Sqoop Commands Cheat Sheet with Example

Apache Pig versions which are supported by MapReduce Accelerators are:

  • Apache Pig 0.8.1 (A compatible release of Apache Hadoop versions 0.20.x)
  • Apache Pig 0.9.2 (A compatible version for Apache Hadoop 1.0.0, 1.0.1 and 1.1)

You can take a note of your compatible version of Apache Hadoop, which you are using for Pig scripts. Apache Pig releases do not support Apache Hadoop default version that is 0.21.0, so one must download the supported version of MapReduce Accelerator. The Apache Pig version value is to be inserted as HADOOP_VERSION value while installing it on Linux platform.

Steps to be followed for installation of Apache Pig on Linux

Below is listed the procedure of installing Apache Pig on Ubuntu 16.04 version:

Step 1: From the link http://www-us.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz, you can download the latest version of Apache Pig. There will be an archived file with the .tar extension, you just need to download that particular file through the following command: Command: wget (site link) or http://www-us.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz

Step 2: Using tar command, you can extract the zipped file. Below is the complete syntax to extract and list the file -

Command: tar –xzf pig-0.16.0.tar.gz

Command: ls

Step 3: You can update the environment variables of Apache Pig through “.bashrc” file. Here in this example, the value of this variable is set in the way so that it can be updated from any directory and to execute Pig command the user would not have to access pig directory each time.

Moreover, one can easily know the path of Apache Pig file, if some application wanted to know that. Following is the command to set the environment variable:

Read: A Comprehensive Hadoop Big Data Tutorial For Beginners

Command: studio gedit .bashrc

You can add following commands at the end of the file: How to Install Apache Pig on Linux?

Before updating the environment variable for Pig make sure that Hadoop path is also set.

Step 4: Now, this is the time to check the version of Pig just to ensure that Apache Pig has been installed properly. In case, if the version is not displayed properly then you may have to repeat the above-listed steps again.

Command: pig –version

Step 5: Check Pig help option to list all the commands available under the tab-

Pig -help

Step 6: To start the grunt shell, you should run Pig. Pig Latin script is run through Grunt Shell. Command: pig. Congratulations! The Apache Pig is installed correctly on your Linux OS. Now you must be able to see two execution modes where it can run –

MapReduce Mode and the Local Mode.

Read: ELK vs. Splunk vs. Sumo Logic – Demystifying the Data Management Tools

Let us discuss on each of the modes briefly.

Apache Pig Execution Modes

Following two modes are available to execute Apache Pig on Linux environment:

  1. MapReduce Mode: MapReduce is the default available execution mode of Apache Pig, it required HDFS installation and Hadoop cluster as its component. As it is the default mode, so the user does not need to specify –x flag. On local file system, input and output modes are also present.
  2. Local Mode: All files are run and installed using the file system of the local host on a single machine. The local mode is usually specified using by –x flag. Input and output both are present on the local machine and the following command is used for this purpose:Command: pig –x local

Apache Pig Components

There are various components of Apache Pig framework. The major components of Pig are listed below:

  • Parser: Pig scripts are usually handled by the Parser wheremany types and various other checks are performed. The output of the parser will be DAG or direct acyclic graph, through which logical operators and Pig Latin statements are represented. Here the operators are represented as nodes and data flows are represented through the edges.
  • Compiler: The optimized logical plan is compiled in various steps of MapReduce jobs.
  • Optimizer: The logical optimizer usually carries and execute the logical plan like projection and pushdown.
  • Execute Engine: The MapReduce jobs are submitted to Hadoop in a sorted order and the desired result is produced by the Hadoop engine.

Final Words:

Hadoop is a popular framework used by database experts and there are a number of tools and technologies present in this framework. Due to the popularity of this platform, there are a number of jobs options available in the market and most of the developers are using this platform for Linux or any other operating system.

Here the Apache Pig framework is the best tool for the Linux OS. Installation of this framework is a step by step procedure, which is not that much difficult but in order to use it in the proper way the user will have to install it properly.

The above list of steps or installation guide has a detailed procedure and the user can easily follow this step by step process if he wants to use Apache Pig or Pig Latin script language of this platform to leverage the Hadoop platform. Apache website has the link to download the framework and the user can easily download it right from there. Even there is a help guide also available for his platform.

To learn more about Hadoop and Apache, you should start with the Training and Certification program at JanBask right away. The right learning not only boosts your knowledge and gives a new direction to your career that is actually required to become successful and established in your near future.

Read: Chief Elements Of A Professional Hadoop Resume In 2023


fbicons FaceBook twitterTwitter lingedinLinkedIn pinterest Pinterest emailEmail

     Logo

    JanBask Training

    A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.


  • fb-15
  • twitter-15
  • linkedin-15

Comments

Trending Courses

Cyber Security Course

Cyber Security

  • Introduction to cybersecurity
  • Cryptography and Secure Communication 
  • Cloud Computing Architectural Framework
  • Security Architectures and Models
Cyber Security Course

Upcoming Class

3 days 14 Dec 2024

QA Course

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing
QA Course

Upcoming Class

9 days 20 Dec 2024

Salesforce Course

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL
Salesforce Course

Upcoming Class

3 days 14 Dec 2024

Business Analyst Course

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum
Business Analyst Course

Upcoming Class

3 days 14 Dec 2024

MS SQL Server Course

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design
MS SQL Server Course

Upcoming Class

2 days 13 Dec 2024

Data Science Course

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning
Data Science Course

Upcoming Class

3 days 14 Dec 2024

DevOps Course

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing
DevOps Course

Upcoming Class

6 days 17 Dec 2024

Hadoop Course

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation
Hadoop Course

Upcoming Class

9 days 20 Dec 2024

Python Course

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation
Python Course

Upcoming Class

10 days 21 Dec 2024

Artificial Intelligence Course

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks
Artificial Intelligence Course

Upcoming Class

3 days 14 Dec 2024

Machine Learning Course

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning
Machine Learning Course

Upcoming Class

16 days 27 Dec 2024

 Tableau Course

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop
 Tableau Course

Upcoming Class

9 days 20 Dec 2024

Search Posts

Reset

Receive Latest Materials and Offers on Hadoop Course

Interviews