Upto 20% Scholarship on Live Online Classes

- Technology Blogs -

ETL Testing Tutorial Guide for Beginners

ETL Testing Tutorial Guide

ETL is abbreviated for Extract-Transform-Load, and it is a process for how data is loaded from the source system to the data warehouse. Data is extracted from the OLTP database, transformed into a meaningful schema, and later loaded to the data warehouse. In a few cases, data warehouses may incorporate data from non-OLTP systems too like text files, spreadsheets, or legacy systems, etc.

Take the example of a retail store with different departments like sales, logistics, or marketing, etc. Each department stores customer data independently and the format they are using for data storage is quite different. The sales department usually stores information with the customer name while marketing department focuses on the customer ID more.

Now if they want to check the customer history and product list owing to different marketing campaigns would be very tedious. Here the solution is using a data warehouse to store the information from different departments together in a uniform structure using ETL. ETL will transform the dissimilar data into a unified pattern. Later, you can use BI to derive meaningful information from processed data.

What is BI? What is a Data Warehouse?

Before we move ahead, let us discuss two common terms in brief frequently used with the ETL process; these are data warehouse and the Business Intelligence (BI). Business Intelligence is the process for collecting raw business data and transforming it into meaningful insights that is more useful for businesses.

The raw data is the record of daily transactions of an organization like customer interactions, administration of finance, employee management, and so on. This data is further required for reporting, data mining, data analysis, data interpretation, Data quality checks, predictive analysis, and more.

What is BI? What is a Data Warehouse?

A data warehouse is designed to query and analyze the data instead of transaction processing. A data warehouse is constructed by integrating data from multiple heterogeneous sources. It helps Companies to consolidate data from multiple sources and perform analysis separately. In the end, data is turned to high-quality information to meet varied enterprise requirements for different levels of users.

What is ETL Testing?

ETL Testing is done to ensure that data is loaded from different sources to the destination after business transformation is accurate. It involves data verification at multiple stages that are being used between the source and the destination. The following diagram gives you a depth idea of an ETL process.

What is ETL Testing?

Extract:

  • It is used to extract data from multiple sources.

Transform:

  • It is used to transform data into a data warehouse format.
  • Define one or more keys that uniquely identify an entity. The different types of keys in SQL are the primary key, foreign key, alternate key, composite key, or surrogate key, etc. A Data Warehouse owns these keys and never allows other entities to alter them.
  • Once the data is extracted, it will move to the next phase of cleaning and conforming of data. Cleaning omits the unwanted data and identifies and fix the errors as well. Conforming means solving conflicts related to the incompatible data. Additionally, the system creates metadata that is used to diagnose the source system problems and improves the data quality too.

Load:

  • It is used to load data to the data warehouse.
  • It is used to build aggregates that summarize and store data available in fact table to improve the overall performance of end-user queries.

Different Phases of ETL Testing

ETL testing is performed in phase and different phase of ETL testing are given below.

Read:   Real-time and Job-Focused Manual Testing Exclusively from JanBask Training

ETL testing is performed in five different stages as mentioned:

  • Identify the data sources and requirements.
  • Data acquisition
  • Implementing business logic and dimensional modeling
  • Populating and building data
  • Building reports

ETL Testing types – What you should know?

  1. Production Validation Testing: This type of ETL Testing is performed on the data being transferred to production systems. Data in production systems should be accurate to support the decision-making process for your business. This testing provides automation and management capabilities to make sure that production systems are not compromised by the data.
  2. Validation Testing: This type of testing is performed to make sure that data values are transformed to expected values. It is also named as the source to target testing.
  3. Metadata Testing: This type of testing checks data types, data constraints, data length, indexes, etc.
  4. Application Upgrades: These types of test cases are generated automatically and saves substantial test development time. Here, data is extracted from an older application to the repository, and it is transferred to a new application in the same order.
  5. Data accuracy Testing: This type of testing is performed to make sure that data is loaded and transformed accurately as needed.
  6. Data Completeness Testing: This type of testing ensures that data is transferred to the destination in the same format as required. Some of the runs may compare or validate counts and aggregates data between source and destination with simple transformations or no transformations at all.
  7. Data Transformation Testing: This type of testing is done on the data to check either it has been transformed to expected format or not. Here, you should run multiple queries together for each row and verify the transformation rules.
  8. Incremental ETL Testing: This type of testing is performed to check the data integrity when new data is added to the existing data. It makes sure that updates and inserts are done as expected during the incremental ETL process.
  9. GUI/Navigation Testing: This type of testing is performed to check the navigation or GUI aspects of the front-end reports.
  10. Data Quality Testing: It includes syntax and reference testing. To avoid errors due to date or ordering, data quality testing is performed. For the syntax testing, it highlights the dirty data based on invalid characters, patterns, upper or lower cases, etc. For reference testing, it checks the data based on the data model. For example, data quality testing for Customer ID includes number check, date check, data check, null check, etc.

ETL Test Cases – How to create them?

ETL testing can be applied to different tools and databases in the management industry. The objective of ETL testing is to make sure that data is loaded from the source to the destination if business transformations are accurate.

It involves data verification at multiple middle stages that are being used between the source and the destination. Here are two documents that are always used when preparing ETL test cases. These are ETL Mapping sheets and database (DB) schema.

  1. ETL mapping sheets: It contains all the information of the source and destination tables including columns and their reference lookup tables. An ETL tester should have experience in SQL queries because it may include complex queries with multiple joins that need to be validated at different stages. This step provides significant help when you want to write queries for data verification.
  2. DB Schema: it should be kept handy to verify details in ETL mapping sheets.
Read:   Top 15 Penetration Testing Tools To Know In 2019

ETL Testing – Test Cases and Test Scenarios

Test Scenario Test Cases
Validation # It validates the structure of the source and target table against corresponding mapping sheets.

# It validates either source data type, and destination data type is the same or different.

# It verifies the length of data type for source and the destination.

# It verifies either data fields formats and types are specified or not.

# It validates the name of a column against mapping doc.

Mapping Doc Validation # It verifies the mapping doc either related information is given or not.

# It also checks for change logs maintenance in every mapping doc.

Data Consistency Issues # Even if the semantic definition is the same, data type and length may vary in tables or fields.

# It will check for integrity constraints either they are used well or not.

Data Completeness Issues # It makes sure that data is transferred from the source to the destination as expected.

# It compares the records count between the source and the destination.

# It ensures that data should not be truncated in columns of target tables.

# It will check for rejected records.

# It will check for boundary value analysis.

# It will check for unique key attributes for the loaded data.

Constraint Issues # It validates that constraints are defined for the specific table as expected.
Data Correctness Issues # It checks either data is recorded or spelled well or not.

# It checks Null, non-unique out of range data.

Data Transformation Issues # It checks data for transformation either it has converted to the right format or not.
Data quality issues # It will validate the data on different parameters like number check, precision check, date check, data check, or null check, etc.
Null validate # It validates the data for Null and non-null values.
Duplicate Checks # It will check the data for duplicate values and values of columns should be unique as per the business requirement once they are defined as the primary key or unique key.
Data Validation # It is performed to know the row creation date.

# It verifies the list of active records on ETL development perspective.

# It verifies active records based on business requirements.

Data cleaning issues # It makes sure that unwanted data is deleted before it is transferred to the destination database.
Complete Data Validation # It is used to validate the complete data in the source and the destination.

# It is used to match rows between the source and the destination.

# It ensures that count returned by intersection matches with individual counts of source and the destination.

# If count is plus, it means duplicate rows exist.

ETL Testing – Type of Bugs

Type of ETL Bugs Description
User Interface Bugs These bugs are related to the GUI of an application, font styles, colors, size, alignment, navigation, spelling check, etc.
Boundary Value Analysis (BVA) Bugs These bugs check on the minimum and maximum values.
Equivalence Class Partitioning (ECP) Bugs It results in valid and invalid types.
I/O bugs In this case, it starts accepting invalid values, and valid values are rejected.
Calculation bugs It shows mathematical errors and the final output is wrong most of the times.
Load Condition Bugs It does not allow multiple users. It does not allow the loading of user expected data.
Race Condition Bugs The system will not run perfectly; it starts crashing or hanging.
Version Control Bugs It usually occurs in the Regression testing and does not give any information on versions.
H/W Bugs Here, the device will not respond to the application as expected.
Help Source bugs It results in mistakes in help documents.

How to compare ETL Testing and Database Testing?

ETL Testing:

  • It verifies either data has been moved as expected.
  • It verifies that counts in source and the target are the same
  • It verifies that foreign key parameters are reserved during the ETL process.
  • It verifies the data for duplicate values.

Database Testing:

  • It checks either data is following standard rules defined in the data model
  • It verifies that there are no orphan records and foreign-primary key relations are maintained well.
  • It verifies that there are no redundant tables and database is normalized optimally.
  • It verifies if data is missing in columns as required.

What are the responsibilities of an ETL Tester?

The key responsibilities of an ETL Tester are divided into three major categories:

  1. Stage Tables
  2. Business Logic Transformation
  3. Target table loading from the stage table, once you apply a transformation.

Some more responsibilities of an ETL Tester are given below.

  • He tests ETL software thoroughly.
  • He checks test components of ETL data warehouse.
  • He executes data-driven tests in the backend.
  • He creates, designs, and executes test cases, test harness, or test plans, etc.
  • He identifies the problem and suggests the best solution too.
  • He approves design specifications and requirements.
  • He transfers data from flat files.
  • He writes SQL queries for different test scenarios.

ETL Performance Testing

ETL performance testing is performed to make sure that the ETL system can handle loads of multiple data and transactions. The goal of ETL performance testing is optimizing session performance and eliminating bottlenecks. For tuning the performance of sessions, you should identify performance bottlenecks and eliminate it. To identify performance bottlenecks, you should check the system, mapping docs, source database, target database, and the session, etc. one of the best tools for ETL performance testing is Informatica.

Generally, ETL testing is performed by using SQL scripting which is quite a time consuming and boring too. It is error-prone and seldom provides complete test coverage. To improve coverage, reduce defects, and optimize costs, ETL testing should be performed in development and production environments. Automation is need of the hour today, and it can be achieved through ETL testing tools like Informatica.

Best Practices for ETL Testing

  • A tester has to make sure that data is transformed correctly.
  • Data should be loaded to the warehouse without any data loss or data truncation.
  • He ensures that the ETL application appropriately rejects the invalid data and accepts the valid data.
  • He ensures that data is loaded to the warehouse in expected timeframes to confirm the scalability and the performance.
  • He makes sure that all methods have appropriate unit tests regardless of their visibility.
  • He measures the effectiveness of unit tests with the help of proper test coverage techniques.
  • He strives for one assertion per test case.
  • He generates unit tests that target exceptions.

Final Words:

ETL testing is significant for legalizing the production data and know if it’s correct, dependable, and trustworthy. With this blog, you can get a complete idea of ETL testing and its significance for an organization. To know more about ETL testing, you may join the QA certification program at JanBask Training.

Read:   How To Become A Quality Assurance Tester?
JanBask Training

JanBask Training

JanBask Training is a leading Global Online Training Provider through Live Sessions. The Live classes provide a blended approach of hands on experience along with theoretical knowledge which is driven by certified professionals.



Write a Comment

avatar
  Subscribe  
Notify of

Trending

Top 30 Core Java Interview Questions and Answers for Fresher, Experienced Developer
What Is Trigger In Salesforce?
Top 30 Manual Testing Interview Questions & Answers for Fresher
Different Types of SQL Server & SQL Database Functions
Top 20 AWS Interview Question and Answers For Fresher, Experienced Developer

Related Posts

What is Unit Testing? Unit Testing Tutorial Guide for Beginners
Automation Testing Interview Questions & Answers
Difference Between Software Validation and Verification
Performance Testing Interview Questions & Answers
What is JIRA? JIRA Tutorial Guide for Beginner