MONTH START OFFER: Flat 15% Off with Free Self Learning Course | Use Coupon MONTH15

- QA Testing Blogs -

What is ETL Testing? ETL Testing Tutorial for Beginners

Introduction

Extract-Transform-Loadtesting is what ETL testing stands for, and it is a process for how data is loaded from the source system to the data warehouse. Data is extracted from the OLTP database, transformed into a meaningful schema, and later loaded to the data warehouse. In a few cases, data warehouses may incorporate data from non-OLTP systems too like text files, spreadsheets, or legacy systems, etc.

Take the example of a retail store with different departments like sales, logistics, or marketing, etc. Each department stores customer data independently and the format they are using for data storage is quite different. The sales department usually stores information with the customer name while the marketing department focuses on the customer ID more.

Now if they want to check the customer history and product list owing to different marketing campaigns would be very tedious. Here the solution is using a data warehouse to store the information from different departments together in a uniform structure using ETL. ETL will transform the dissimilar data into a unified pattern. Later, you can use BI to derive meaningful information from processed data. This is what  ETL testing is.

Continue reading to know what is ETL testing in detail and the entire ETL testing process.

What is BI? What is a Data Warehouse?

Before we move ahead, let us discuss two common terms in brief frequently used with the ETL testing process; these are data warehouse and the Business Intelligence (BI). Business Intelligence is the process for collecting raw business data and transforming it into meaningful insights that is more useful for businesses.

The raw data is the record of the daily transactions of an organization like customer interactions, administration of finance, employee management, and so on. This data is further required for reporting, data mining, data analysis, data interpretation, Data quality checks, predictive analysis, and more.

Read: What is STLC? Learn Software Testing Life Cycle Phases

 A data warehouse is designed to query and analyze the data instead of transaction processing. A data warehouse is constructed by integrating data from multiple heterogeneous sources. It helps Companies to consolidate data from multiple sources and perform analysis separately. In the end, data are turned to high-quality information to meet varied enterprise requirements for different levels of users.

What is ETL Testing?

ETL Testing is done to ensure that data is loaded from different sources to the destination after business transformation is accurate. It involves data verification at multiple stages that are being used between the source and the destination. The following diagram gives you a depth idea of an ETL Testing process. What is ETL Testing? Extract:

  • It is used to extract data from multiple sources.

Transform:

  • It is used to transform data into a data warehouse format.
  • Define one or more keys that uniquely identify an entity. The different types of keys in SQL are the primary key, foreign key, alternate key, composite key, or surrogate key, etc. A Data Warehouse owns these keys and never allows other entities to alter them.
  • Once the data is extracted, it will move to the next phase of cleaning and conforming of data. Cleaning omits the unwanted data and identifies and fix the errors as well. Conforming means solving conflicts related to the incompatible data. Additionally, the system creates metadata that is used to diagnose the source system problems and improves the data quality too.

Load:

  • It is used to load data to the data warehouse.
  • It is used to build aggregates that summarize and store data available in fact to improve the overall performance of end-user queries.

Learn QA Software Testing in the Easiest Way

  • Learn from the videos
  • Learn anytime anywhere
  • Pocket-friendly mode of learning
  • Complimentary eBook available

The entire ETL testing process

ETL testing is performed in phases and different phases of ETL testing are given below.

Read: What Is The Average QA Engineer Salary?

ETL testing is performed in five different stages as mentioned:

  • Identify the data sources and requirements.
  • Data acquisition
  • Implementing business logic and dimensional modeling
  • Populating and building data
  • Building reports

ETL Testing Tutorial- ETL Testing types you need to know

  • Production Validation Testing: This type of ETL Testing process is performed on the data being transferred to production systems. Data in production systems should be accurate to support the decision-making process for your business. This testing provides automation and management capabilities to make sure that production systems are not compromised by the data.
  • Validation Testing: This type of testing is performed to make sure that data values are transformed to expected values. It is also named as the source of target testing.
  • Metadata Testing: This type of testing checks data types, data constraints, data length, indexes, etc.
  • Application Upgrades: These types of test cases are generated automatically and saves substantial test development time. Here, data is extracted from an older application to the repository, and it is transferred to a new application in the same order.
  • Data accuracy Testing: This type of testing is performed to make sure that data is loaded and transformed accurately as needed.
  • Data Completeness Testing: This type of testing ensures that data is transferred to the destination in the same format as required. Some of the runs may compare or validate counts and aggregate data between source and destination with simple transformations or no transformations at all.
  • Data Transformation Testing: This type of testing is done on the data to check either it has been transformed into the expected format or not. Here, you should run multiple ETL testing sql queries together for each row and verify the transformation rules.
  • Incremental ETL Testing: This type of testing is performed to check the data integrity when new data is added to the existing data. It makes sure that updates and inserts are done as expected during the incremental ETL process.
  • GUI/Navigation Testing: This type of testing is performed to check the navigation or GUI aspects of the front-end reports.
  • Data Quality Testing: It includes syntax and reference testing. To avoid errors due to date or ordering, data quality testing is performed. For the syntax testing, it highlights the dirty data based on invalid characters, patterns, upper or lower cases, etc. For reference testing, it checks the data based on the data model. For example, data quality testing for Customer ID includes number check, date check, data check, null check, etc.

ETL Test Cases – How to create them?

ETL testing can be applied to different tools and databases in the management industry. The objective of ETL testing is to make sure that data is loaded from the source to the destination if business transformations are accurate.

It involves data verification at multiple middle stages that are being used between the source and the destination. Here are two documents that are always used when preparing ETL test cases. These are ETL Mapping sheets and database (DB) schema.

  • ETL mapping sheets: It contains all the information about the source and destination tables including columns and their reference lookup tables. An ETL tester should have experience in SQL queries because it may include complex ETL testing SQL queries with multiple joins that need to be validated at different stages. This step provides significant help when you want to write queries for data verification.
  • DB Schema: it should be kept handy to verify details in ETL mapping sheets.

ETL Testing process– Test Cases and Test Scenarios

Test Scenario Test Cases
Validation # It validates the structure of the source and target table against corresponding mapping sheets. # It validates either source data type, and destination data type is the same or different. # It verifies the length of the data type for the source and the destination. # It verifies either data fields formats and types are specified or not. # It validates the name of a column against the mapping doc.
Mapping Doc Validation # It verifies the mapping doc either related information is given or not. # It also checks for change-logs maintenance in every mapping doc.
Data Consistency Issues # Even if the semantic definition is the same, data type and length may vary in tables or fields. # It will check for integrity constraints either they are used well or not.
Data Completeness Issues # It makes sure that data is transferred from the source to the destination as expected. # It compares the record count between the source and the destination. # It ensures that data should not be truncated in columns of target tables. # It will check for rejected records. # It will check for boundary value analysis. # It will check for unique key attributes of the loaded data.
Constraint Issues # It validates that constraints are defined for the specific table as expected.
Data Correctness Issues # It checks either data is recorded or spelled well or not. # It checks Null, non-unique out of range data.
Data Transformation Issues # It checks data for transformation either it has converted to the right format or not.
Data quality issues # It will validate the data on different parameters like number check, precision check, date check, data check, or null check, etc.
Null validate # It validates the data for Null and non-null values.
Duplicate Checks # It will check the data for duplicate values and values of columns should be unique as per the business requirement once they are defined as the primary key or unique key.
Data Validation # It is performed to know the row creation date. # It verifies the list of active records on the ETL development perspective. # It verifies active records based on business requirements.
Data cleaning issues # It makes sure that unwanted data is deleted before it is transferred to the destination database.
Complete Data Validation # It is used to validate the complete data in the source and the destination. # It is used to match rows between the source and the destination. # It ensures that count returned by intersection matches with individual counts of source and the destination. # If count is plus, it means duplicate rows exist.

QA Software Testing Training

  • No cost for a Demo Class
  • Industry Expert as your Trainer
  • Available as per your schedule
  • Customer Support Available

ETL Testing guide – Type of Bugs

Type of ETL Bugs Description
User Interface Bugs These bugs are related to the GUI of an application, font styles, colors, size, alignment, navigation, spelling check, etc.
Boundary Value Analysis (BVA) Bugs These bugs check on the minimum and maximum values.
Equivalence Class Partitioning (ECP) Bugs It results in valid and invalid types.
I/O bugs In this case, it starts accepting invalid values, and valid values are rejected.
Calculation bugs It shows mathematical errors and the final output is wrong most of the time.
Load Condition Bugs It does not allow multiple users. It does not allow the loading of user expected data.
Race Condition Bugs The system will not run perfectly; it starts crashing or hanging.
Version Control Bugs It usually occurs in the Regression testing and does not give any information on versions.
H/W Bugs Here, the device will not respond to the application as expected.
Help Source bugs It results in mistakes in help documents.

How to compare ETL Testing and Database Testing?

ETL Testing:

  • It verifies either data has been moved as expected.
  • It verifies that counts in source and the target are the same
  • It verifies that foreign key parameters are reserved during the ETL process.
  • It verifies the data for duplicate values.

Database Testing:

Read: Usability Testing Interview Questions & Answers for Fresher, Experienced
  • It checks either data is following standard rules defined in the data model
  • It verifies that there are no orphan records and foreign-primary key relations are maintained well.
  • It verifies that there are no redundant tables and the database is optimally normalized.
  • It verifies if data are missing in columns as required.

What are the responsibilities of an ETL Tester?

The key responsibilities of an ETL Tester are divided into three major categories:

  • Stage Tables
  • Business Logic Transformation
  • Target table loading from the stage table, once you apply a transformation.

Some more responsibilities of an ETL Tester are given below.

  • He tests ETL software thoroughly.
  • He checks test components of the ETL data warehouse.
  • He executes data-driven tests in the backend.
  • He creates, designs, and executes test cases, test harness, or test plans, etc.
  • He identifies the problem and suggests the best solution too.
  • He approves design specifications and requirements.
  • He transfers data from flat files.
  • He writes ETL testing SQL queries for different test scenarios.

ETL Performance Testing

ETL performance testing is performed to make sure that the ETL system can handle loads of multiple data and transactions. The goal of ETL performance testing is optimizing session performance and eliminating bottlenecks. For tuning the performance of sessions, you should identify performance bottlenecks and eliminate them. To identify performance bottlenecks, you should check the system, mapping docs, source database, target database, and the session, etc. One of the best tools for ETL performance testing is Informatica.

Generally, ETL testing is performed by using SQL scripting which is quite a time consuming and boring too. It is error-prone and seldom provides complete test coverage. To improve coverage, reduce defects, and optimize costs, an ETL testing process should be performed in development and production environments. Automation is the need of the hour today, and it can be achieved through ETL testing tools like Informatica.

QA Software Testing Training

  • Personalized Free Consultation
  • Access to Our Learning Management System
  • Access to Our Course Curriculum
  • Be a Part of Our Free Demo Class

ETL Testing Tutorial - Best Practices for ETL Testing

  • A tester has to make sure that data is transformed correctly.
  • Data should be loaded into the warehouse without any data loss or data truncation.
  • He ensures that the ETL application appropriately rejects the invalid data and accepts the valid data.
  • He ensures that data is loaded to the warehouse in expected timeframes to confirm the scalability and the performance.
  • He makes sure that all methods have appropriate unit tests regardless of their visibility.
  • He measures the effectiveness of unit tests with the help of proper test coverage techniques.
  • He strives for one assertion per test case.
  • He generates unit tests that target exceptions.

Final Words:

ETL testing process is significant for legalizing the production data and know if it's correct, dependable, and trustworthy. With this blog cum ETL testing guide, you can get a complete idea of what is ETL testing  and its significance for an organization. To know more about the ETL testing process, and what ETL testing is,  you may join the QA certification program at JanBask Training.

Read: Sanity Testing Tutorial Guide for Beginner



    Janbask Training

    A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.


Comments

Trending Courses

AWS

  • AWS & Fundamentals of Linux
  • Amazon Simple Storage Service
  • Elastic Compute Cloud
  • Databases Overview & Amazon Route 53

Upcoming Class

-1 day 04 Dec 2020

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing

Upcoming Class

7 days 12 Dec 2020

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning

Upcoming Class

4 days 09 Dec 2020

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation

Upcoming Class

6 days 11 Dec 2020

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL

Upcoming Class

-1 day 04 Dec 2020

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing

Upcoming Class

0 day 05 Dec 2020

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum

Upcoming Class

-1 day 04 Dec 2020

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design

Upcoming Class

-1 day 04 Dec 2020

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation

Upcoming Class

-1 day 04 Dec 2020

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks

Upcoming Class

0 day 05 Dec 2020

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning

Upcoming Class

15 days 20 Dec 2020

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop

Upcoming Class

14 days 19 Dec 2020

Search Posts

Reset

Receive Latest Materials and Offers on QA Testing Course

Interviews