ETL stands for Extract-Transform-Load, a process that describes how data is moved from source systems into a data warehouse. Data is extracted from an OLTP database, transformed to fit a meaningful schema, and then loaded into the data warehouse. In some cases, data warehouses also incorporate data from non-OLTP systems, such as text files, spreadsheets, or legacy systems.
Take the example of a retail store with departments such as sales, logistics, and marketing. Each department stores customer data independently, and the formats they use differ considerably. The sales department typically stores information under the customer name, while the marketing department relies more on the customer ID.
Now, if they want to review a customer's history and product list across different marketing campaigns, doing so would be very tedious. The solution is to use a data warehouse that stores information from the different departments together in a uniform structure, built using ETL. ETL transforms the dissimilar data into a unified format; later, you can use BI to derive meaningful information from the processed data.
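To make the idea concrete, here is a minimal sketch of the "transform" step for the retail example above. The field names (`name`, `cust_id`, and so on) and the target schema are hypothetical, invented purely for illustration:

```python
# Sketch: unify customer records from two departments into one schema.
# All field names and record layouts here are hypothetical.

def unify_customer(record: dict, department: str) -> dict:
    """Map a department-specific record onto a common warehouse schema."""
    if department == "sales":
        # Sales keys its records by customer name.
        return {"customer_id": record.get("id"),
                "customer_name": record["name"],
                "source": "sales"}
    if department == "marketing":
        # Marketing keys its records by customer ID.
        return {"customer_id": record["cust_id"],
                "customer_name": record.get("full_name"),
                "source": "marketing"}
    raise ValueError(f"unknown department: {department}")

sales_row = {"name": "Asha Rao", "id": 101}
marketing_row = {"cust_id": 101, "full_name": "Asha Rao"}

unified = [unify_customer(sales_row, "sales"),
           unify_customer(marketing_row, "marketing")]
print(unified[0]["customer_id"], unified[1]["customer_id"])  # 101 101
```

Once both departments' rows share one schema, they can be loaded into a single warehouse table and queried together.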
What is BI? What is a Data Warehouse?
Before we move ahead, let us briefly discuss two terms frequently used alongside the ETL process: the data warehouse and Business Intelligence (BI). Business Intelligence is the process of collecting raw business data and transforming it into meaningful insights that are useful to the business.
The raw data consists of the records of an organization's daily transactions, such as customer interactions, financial administration, and employee management. This data is subsequently used for reporting, data mining, data analysis, data interpretation, data quality checks, predictive analysis, and more.
A data warehouse is designed for querying and analyzing data rather than for transaction processing. It is constructed by integrating data from multiple heterogeneous sources, which helps companies consolidate data from many systems and analyze it in one place. In the end, the data is turned into high-quality information that meets varied enterprise requirements for different levels of users.
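The query-oriented nature of a warehouse can be sketched with an in-memory SQLite database: rows from two notional source systems land in one consolidated table, and the workload is an analytical aggregate rather than a transaction. The table name, columns, and data are all invented for illustration:

```python
import sqlite3

# Minimal sketch: consolidate rows from two "source systems" into one
# warehouse table, then run an analytical query over the combined data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dw_sales (region TEXT, amount REAL)")

# Pretend these batches were extracted from two heterogeneous sources.
source_a = [("east", 120.0), ("west", 80.0)]
source_b = [("east", 40.0)]
cur.executemany("INSERT INTO dw_sales VALUES (?, ?)", source_a + source_b)

# Analysis, not transaction processing: total sales per region.
cur.execute("SELECT region, SUM(amount) FROM dw_sales "
            "GROUP BY region ORDER BY region")
print(cur.fetchall())  # [('east', 160.0), ('west', 80.0)]
```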
ETL testing is done to ensure that the data loaded from different sources to the destination is accurate after the business transformations have been applied. It involves data verification at the multiple stages used between the source and the destination. The following diagram gives you an in-depth view of an ETL process.
ETL testing is performed in five different stages, as given below.
ETL testing can be applied across different tools and databases in the information management industry. Its objective is to make sure that the data loaded from the source to the destination is accurate once the business transformations have been applied.
It involves data verification at multiple intermediate stages between the source and the destination. Two documents are always used when preparing ETL test cases: the ETL mapping sheets and the database (DB) schema.
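A mapping-sheet check can be automated by comparing the expected columns and data types from the mapping document against a table's actual schema. The mapping-sheet layout and column names below are hypothetical:

```python
# Sketch: compare the expected column names and data types from an ETL
# mapping sheet against the target table's actual schema. The sheet
# layout and column names are invented for illustration.

mapping_sheet = {            # expected target schema, per the mapping doc
    "customer_id": "INTEGER",
    "customer_name": "TEXT",
    "created_at": "TEXT",
}

actual_schema = {            # schema as read from the target database
    "customer_id": "INTEGER",
    "customer_name": "TEXT",
    "created_at": "DATE",    # mismatch: the mapping sheet says TEXT
}

def schema_mismatches(expected: dict, actual: dict) -> list:
    """Return (column, expected_type, actual_type) for every difference."""
    issues = []
    for col, exp_type in expected.items():
        act_type = actual.get(col)
        if act_type != exp_type:
            issues.append((col, exp_type, act_type))
    # Columns present in the target but missing from the mapping sheet.
    for col in actual.keys() - expected.keys():
        issues.append((col, None, actual[col]))
    return issues

print(schema_mismatches(mapping_sheet, actual_schema))
# [('created_at', 'TEXT', 'DATE')]
```

Every non-empty result is a structure-validation defect to raise against either the mapping document or the target table.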
| Test Scenario | Test Cases |
| --- | --- |
| Validation | Validates the structure of the source and target tables against the corresponding mapping sheets. Checks whether the source and destination data types match. Verifies the data type lengths for source and destination. Verifies whether data field formats and types are specified. Validates column names against the mapping document. |
| Mapping Doc Validation | Verifies that the mapping document contains all the related information. Checks that a change log is maintained in every mapping document. |
| Data Consistency Issues | Checks for fields whose data type or length varies across tables even though their semantic definition is the same. Checks whether integrity constraints are applied correctly. |
| Data Completeness Issues | Makes sure that all data is transferred from the source to the destination as expected. Compares record counts between the source and the destination. Ensures that data is not truncated in the columns of target tables. Checks for rejected records. Applies boundary value analysis. Checks the unique key attributes of the loaded data. |
| Constraint Issues | Validates that constraints are defined on the specific tables as expected. |
| Data Correctness Issues | Checks whether data is recorded accurately and spelled correctly. Checks for null, non-unique, and out-of-range data. |
| Data Transformation Issues | Checks whether transformed data has been converted to the correct format. |
| Data Quality Issues | Validates the data against different parameters, such as number checks, precision checks, date checks, and null checks. |
| Null Validation | Validates the data for null and non-null values. |
| Duplicate Checks | Checks the data for duplicate values; once columns are defined as a primary key or unique key, their values must be unique per the business requirement. |
| Data Validation | Determines the row creation date. Verifies the list of active records from the ETL development perspective. Verifies active records based on business requirements. |
| Data Cleaning Issues | Makes sure that unwanted data is deleted before it is transferred to the destination database. |
| Complete Data Validation | Validates the complete data set in both the source and the destination. Matches rows between the source and the destination. Ensures that the count returned by the intersection matches the individual counts of source and destination; a surplus count indicates that duplicate rows exist. |
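The completeness and duplicate checks in the table above reduce to a few simple comparisons. Here is a sketch over made-up sample rows, with an assumed column width for the truncation check:

```python
# Sketch of basic completeness checks: record counts, duplicates on the
# key column, and truncation risk. The rows are made-up sample data.

source_rows = [(1, "Asha"), (2, "Ben"), (3, "Chen")]
target_rows = [(1, "Asha"), (2, "Ben"), (2, "Ben"), (3, "Chen")]

# 1. Count comparison between the source and the destination.
count_match = len(source_rows) == len(target_rows)

# 2. Duplicate check on the key column (here, the first field).
keys = [row[0] for row in target_rows]
duplicates = sorted({k for k in keys if keys.count(k) > 1})

# 3. Truncation check: no target value longer than the declared width.
MAX_NAME_LEN = 50  # assumed column width, for illustration
truncation_risk = [r for r in target_rows if len(r[1]) > MAX_NAME_LEN]

print(count_match, duplicates, truncation_risk)  # False [2] []
```

In this sample, the mismatched counts and the duplicate key `2` would both be raised as data completeness defects.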
| Type of ETL Bug | Description |
| --- | --- |
| User Interface Bugs | Related to the GUI of an application: font style, color, size, alignment, navigation, spelling, etc. |
| Boundary Value Analysis (BVA) Bugs | Failures at the minimum and maximum allowed values. |
| Equivalence Class Partitioning (ECP) Bugs | Incorrect handling of valid and invalid input partitions. |
| I/O Bugs | Invalid values are accepted while valid values are rejected. |
| Calculation Bugs | Mathematical errors that usually make the final output wrong. |
| Load Condition Bugs | The system does not allow multiple users, or does not accept the volume of data users expect to load. |
| Race Condition Bugs | The system crashes or hangs instead of running reliably. |
| Version Control Bugs | Usually surface during regression testing, when no version information is available. |
| Hardware Bugs | The device does not respond to the application as expected. |
| Help Source Bugs | Mistakes in the help documentation. |
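A BVA bug from the table above is caught by probing a field's declared range at and just beyond its limits. The range for the `age` field below is assumed, not taken from any real specification:

```python
# Sketch of a boundary value analysis (BVA) probe: test a field's
# declared range at min-1, min, min+1, max-1, max, and max+1.

MIN_AGE, MAX_AGE = 0, 120  # hypothetical valid range for an "age" column

def is_valid_age(value: int) -> bool:
    return MIN_AGE <= value <= MAX_AGE

probes = [MIN_AGE - 1, MIN_AGE, MIN_AGE + 1,
          MAX_AGE - 1, MAX_AGE, MAX_AGE + 1]
results = {p: is_valid_age(p) for p in probes}
print(results)
# {-1: False, 0: True, 1: True, 119: True, 120: True, 121: False}
```

If the loaded data accepts `-1` or `121`, that is a BVA bug; if it rejects `0` or `120`, that is an I/O bug (valid values rejected).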
The key responsibilities of an ETL tester are divided into three major categories, along with several additional duties beyond these.
ETL performance testing is performed to make sure that the ETL system can handle large volumes of data and transactions. Its goal is to optimize session performance and eliminate bottlenecks. To tune session performance, you should identify the performance bottlenecks and eliminate them; to find them, check the system, the mapping documents, the source database, the target database, and the session itself. One of the best-known tools for ETL performance testing is Informatica.
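A crude performance probe simply times a batch load and derives a throughput figure. The `load_batch` function below is a stand-in for a real load session, used only to show the measurement pattern; in practice, a tool such as Informatica reports these metrics for you:

```python
import time

# Sketch of a simple ETL performance probe: time a batch load and
# compute rows-per-second throughput. load_batch() is a stand-in for
# the real load session, invented for illustration.

def load_batch(rows):
    target = []
    for row in rows:
        target.append(row)  # stand-in for the real insert step
    return target

rows = list(range(100_000))
start = time.perf_counter()
loaded = load_batch(rows)
elapsed = time.perf_counter() - start

throughput = len(loaded) / elapsed if elapsed > 0 else float("inf")
print(f"loaded {len(loaded)} rows, ~{throughput:,.0f} rows/sec")
```

Comparing this throughput before and after a tuning change is one simple way to confirm that a bottleneck has actually been eliminated.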
Generally, ETL testing is performed using SQL scripting, which is quite time-consuming and tedious. It is error-prone and seldom provides complete test coverage. To improve coverage, reduce defects, and optimize costs, ETL testing should be performed in both development and production environments. Automation is the need of the hour, and it can be achieved through ETL testing tools such as Informatica.
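The SQL-scripted checks mentioned above look something like the following, shown here against an in-memory SQLite database. The table names and sample rows are invented for illustration:

```python
import sqlite3

# Sketch of SQL-scripted ETL checks: a source/target count comparison
# and a null check, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE target (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO source VALUES (?, ?)",
                [(1, "Asha"), (2, "Ben"), (3, None)])
cur.executemany("INSERT INTO target VALUES (?, ?)",
                [(1, "Asha"), (2, "Ben"), (3, None)])

# Count comparison: a nonzero result means records were lost or added.
cur.execute("SELECT (SELECT COUNT(*) FROM source)"
            " - (SELECT COUNT(*) FROM target)")
count_diff = cur.fetchone()[0]

# Null check on a column expected to be populated.
cur.execute("SELECT COUNT(*) FROM target WHERE name IS NULL")
null_count = cur.fetchone()[0]

print(count_diff, null_count)  # 0 1
```

An automation tool essentially runs a suite of such queries on a schedule and flags any result that violates an expected value.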
ETL testing is significant for validating production data and determining whether it is correct, dependable, and trustworthy. This blog should give you a complete picture of ETL testing and its significance for an organization. To know more about ETL testing, you may join the QA certification program at JanBask Training.