

Apache Pig Interview Questions & Answers

Pig is a high-level scripting language used with Apache Hadoop. Pig helps developers write complex data transformations without needing to know Java. Developers familiar with scripting languages and SQL find Pig's simple SQL-like scripting language, known as Pig Latin, very appealing. PigStorage is the default load function in Pig: whenever data has to be loaded from a file system into Pig, PigStorage comes into the picture. One can also specify how the fields in a record are separated, along with the schema and the type of the data.
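As a minimal sketch of the default load function mentioned above, the following Pig Latin loads a tab-separated file with PigStorage; the file path and field names are hypothetical:

```pig
-- Load a tab-separated file using the default PigStorage load
-- function, declaring a schema with field names and types.
users = LOAD 'hdfs://namenode/data/users.txt'
        USING PigStorage('\t')
        AS (id:int, name:chararray, age:int);
DUMP users;   -- print the loaded relation
```

Omitting `USING PigStorage(...)` entirely gives the same behavior with a tab delimiter, since PigStorage is the default.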

Apache Pig Interview Questions for Freshers

  1. What is Apache Pig?
  2. What are relational operations in Pig Latin?
  3. What is Pig Latin?
  4. What is Pig Engine?
  5. Define the modes of Pig Execution?
  6. Define the Pig Latin Features?
  7. What are the advantages of using Pig over MapReduce?
  8. Differentiate between Pig Latin and HiveQL?
  9. What are the common features in Pig and Hive?
  10. Differentiate between logical and physical plans?

Apache Pig Interview Questions for Experienced

  1. What is the role of MapReduce in Pig programming?
  2. How many ways can we run Pig programs?
  3. Explain Grunt in Pig and explain its features?
  4. Explain bag?
  5. What are the categories of Pig use and which one is the most common? What are the scalar data types in Pig?
  6. Why should we use ‘group' keyword in pig scripts?
  7. Why should we use ‘orderby' and ‘distinct' keyword in pig scripts?
  8. Is it possible to join multiple fields in pig scripts?
  9. Is it possible to display a limited number of results?

Apache Pig Interview Questions and Answers

Before we answer the questions, there are some key points to remember about Apache Pig:

  • Apache Pig is a platform used to analyze large datasets that are represented as data flows.
  • Apache Pig is designed to reduce the complexity of writing a MapReduce task in Java.
  • Performing data manipulation operations using Apache Pig becomes very easy in Hadoop.
  • The Pig Latin language and the Pig run-time environment are the main components of Apache Pig, using which Pig Latin programs are executed.
  • Apache Pig follows the ETL (Extract, Transform, Load) process.
  • In the case of unstructured data, Apache Pig can handle an inconsistent schema.
  • Apache Pig handles all kinds of data and performs automatic optimization, i.e., it automatically optimizes tasks before execution.
  • User Defined Functions (UDFs) can be written in different languages like Java, Ruby, Python, etc., and embedded in Pig scripts.
  • Pig allows programmers to write custom functions.
  • Various built-in operators like join, sort, filter, etc. are provided in Pig Latin to read, write, and process large data sets.

Apache Pig Interview Questions and Answers: Freshers

Q1). What is Apache Pig?

Apache Pig is an Apache Software Foundation project that provides a high-level language for expressing data analysis programs over large data sets. Pig serves as an engine for executing data flows in parallel on Hadoop.

Q2). Name the relational operations in Pig Latin?

The relational operations are:

  • foreach
  • order by
  • filter
  • group
  • distinct
  • join
  • limit
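The operations above can be sketched in a short Pig Latin script; the input file and field names are hypothetical:

```pig
-- Load, filter, project, and sort a hypothetical (name, age) relation.
people  = LOAD 'people.txt' USING PigStorage(',')
          AS (name:chararray, age:int);
adults  = FILTER people BY age >= 18;      -- keep only matching rows
names   = FOREACH adults GENERATE name;    -- project a single field
sorted  = ORDER names BY name;             -- total order on the output
DUMP sorted;
```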

Q3). What is Pig Latin?

Pig Latin is the data-flow scripting language used to process large data sets. A Pig Latin program consists of a series of operations, which are applied to the input data in order to get the required output.

Q4). What is Pig Engine?

Pig Engine is the platform that executes Pig Latin programs. The Pig engine converts Pig Latin operators into a series of MapReduce jobs.


Q5). Define the modes of Pig Execution?

Pig execution can be done in two modes.

  • Local Mode: Local execution in a single JVM; all files are accessed and run using the local host and local file system.
  • MapReduce Mode: Distributed execution on a Hadoop cluster; this is the default mode.

Q6). Define Pig Latin Features?

  • Pig Latin script is made up of a series of operations, or transformations, that are applied to the input data in order to fetch output.
  • Programs can be executed either in Interactive mode through Grunt shell or in Batch mode via Pig Latin Scripts.
  • Includes operators for many of the traditional data operations.
  • User Defined Functions (UDF)
  • Debugging Environment

Q7). What are the advantages of using Pig over MapReduce?

In MapReduce, the development cycle is very long. Writing mappers and reducers, compiling and packaging the code, submitting jobs, and retrieving the results is a time-consuming process. Performing dataset joins is very difficult. MapReduce is low-level and rigid, which leads to a great deal of custom user code that is hard to maintain and reuse.

In Pig, no compiling or packaging of code is needed; internally, the Pig operators are converted into map or reduce tasks. All of the standard data-processing operations are provided by Pig Latin, so a high-level abstraction for processing large data sets is possible.

Q8). Differentiate between Pig Latin and HiveQL?

Pig Latin:

  • Pig Latin is a Procedural language
  • Nested relational data model
  • Schema is optional

HiveQL:

  • HiveQL is a declarative language
  • Flat relational data model
  • Schema is required

Q9). What are the common features in Pig and Hive?

  • Provides a high-level abstraction on top of MapReduce.
  • Converts commands internally into MapReduce jobs.
  • Does not support low-latency queries.
  • Does not support OLAP or OLTP.

Q10). Differentiate between logical and physical plans?

When a Pig Latin script is converted into MapReduce jobs, Pig passes through several steps. After performing basic parsing and semantic checking, Pig produces a logical plan, which describes the logical operators that Pig executes. After this, Pig produces a physical plan, which describes the physical operators needed to execute the script.
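The stages above can be inspected with Pig's EXPLAIN operator, which prints the plans for a relation; the input file and fields in this sketch are hypothetical:

```pig
-- EXPLAIN prints the logical, physical, and MapReduce plans that
-- Pig builds for the named relation.
logs   = LOAD 'logs.txt' USING PigStorage(',')
         AS (level:chararray, msg:chararray);
errors = FILTER logs BY level == 'ERROR';
EXPLAIN errors;
```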


Apache Pig Interview Questions and Answers: Experienced

Q1). What is the role of MapReduce in Pig programming?

Pig is a high-level platform that makes many Hadoop data analysis tasks simpler to execute. A program written in Pig Latin resembles a query written in SQL, and an execution engine is needed to execute the query. The Pig engine converts the program into MapReduce jobs, so MapReduce acts as the execution engine.

Q2). How many ways can we run Pig programs? Name them.

There are three ways in which Pig programs or commands can be executed:

  1. Script – Batch Method
  2. Grunt Shell – Interactive Method
  3. Embedded mode

Q3). Explain Grunt in Pig and explain its features?

Grunt is the interactive shell for Pig. The major features of Grunt are:

  • The Ctrl-E key combination can be used to move the cursor to the end of the line.
  • Using the up or down cursor keys, the lines in the history buffer can be recalled, as Grunt remembers command history.
  • Grunt supports auto-completion: pressing the Tab key will try to complete Pig Latin keywords and functions.

Q4). Explain bag?

A bag is one of the data models present in Pig. A bag is an unordered collection of tuples, with possible duplicates, used to store collections while grouping. The size of a bag is limited only by the size of the local disk: when a bag no longer fits in memory, Pig spills it to the local disk and keeps only part of the bag in memory, so there is no requirement that the complete bag fit into memory. Bags are represented with "{}".
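A bag typically arises from grouping, as in the following sketch; the input file and field names are hypothetical:

```pig
-- GROUP produces one record per key, with a bag of the grouped
-- tuples as the second field.
sales    = LOAD 'sales.txt' USING PigStorage(',')
           AS (store:chararray, amount:int);
by_store = GROUP sales BY store;
-- Each record of by_store has the shape (group, {bag of sales tuples}),
-- e.g. (nyc, {(nyc,10),(nyc,25)})
DESCRIBE by_store;
```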

Q5). What are the categories of Pig use and which one is the most common? What are the scalar data types in Pig?

The categories of Pig use are:

  • ETL data pipelines
  • Research on raw data
  • Iterative processing

The most common use case for Pig is the data pipeline.


The scalar data types are:

  • int: 4 bytes
  • float: 4 bytes
  • double: 8 bytes
  • long: 8 bytes
  • chararray
  • bytearray
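All of these scalar types can be declared in a LOAD schema, as in this sketch; the file name and fields are hypothetical:

```pig
-- Declare each scalar data type in a schema.
m = LOAD 'measurements.txt' USING PigStorage(',')
    AS (id:int, ts:long, ratio:float, total:double,
        label:chararray, raw:bytearray);
DESCRIBE m;   -- prints the declared schema
```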

Q6). Why should we use the ‘group' keyword in Pig scripts?

The group statement collects together records with the same key. In SQL, the group by clause creates a group that must feed directly into one or more aggregate functions. In Pig Latin, there is no direct connection between group and aggregate functions.
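A group can stand on its own or feed an aggregate in a later step, as in this sketch with hypothetical input and field names:

```pig
-- Unlike SQL, GROUP need not feed an aggregate immediately; here
-- we choose to count the tuples per key in a separate FOREACH.
visits  = LOAD 'visits.txt' USING PigStorage(',')
          AS (user:chararray, url:chararray);
by_user = GROUP visits BY user;
counts  = FOREACH by_user GENERATE group AS user, COUNT(visits) AS n;
DUMP counts;
```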

Q7). Why should we use the ‘orderby' and ‘distinct' keywords in Pig scripts?

The order statement sorts the data, producing a total order of the output data. The syntax of order is similar to that of group, i.e., by using a key or set of keys. The distinct statement removes duplicate records. It works only on entire records, not on individual fields.
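Both statements can be sketched as follows; the input path and field names are hypothetical:

```pig
-- DISTINCT removes fully duplicate records; ORDER produces a
-- total order of the output.
raw     = LOAD 'events.txt' USING PigStorage(',')
          AS (user:chararray, event:chararray);
uniq    = DISTINCT raw;               -- drops duplicate records
ordered = ORDER uniq BY user, event;  -- sort by a set of keys
DUMP ordered;
```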

Q8). Is it possible to join multiple fields in pig scripts?

Yes, it is possible to join on multiple fields in Pig scripts. A join selects records from one input and matches them against another input by indicating keys for each input. When the keys are equal, the two rows are joined.
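A multi-field join can be written by listing the keys in parentheses, as in this sketch with hypothetical relations and fields:

```pig
-- Join two relations on a composite key of two fields.
a = LOAD 'a.txt' USING PigStorage(',') AS (city:chararray, day:int, x:int);
b = LOAD 'b.txt' USING PigStorage(',') AS (city:chararray, day:int, y:int);
-- Rows match only when both city and day are equal in a and b.
j = JOIN a BY (city, day), b BY (city, day);
DUMP j;
```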

Q9). Is it possible to display a limited number of results?

Yes, it is possible to display a limited number of results. The limit statement allows seeing only a limited number of results when needed.
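The limit statement can be sketched as follows; the input path is hypothetical:

```pig
-- LIMIT keeps at most the given number of records.
big   = LOAD 'big.txt' AS (line:chararray);
first = LIMIT big 10;   -- keep at most 10 records
DUMP first;
```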

Related Interview Questions and Answers

  1. Big Data Hadoop Interview Questions 
  2. Storm Interview Questions 
  3. Kafka Interview Questions 
  4. Mapreduce Interview Questions 
  5. Splunk Interview Questions 
  6. Spark Interview Questions




    Janbask Training

    A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.

