Cyber Monday Deal : Flat 30% OFF! + free self-paced courses - SCHEDULE CALL
This blog's readers are mostly IT professionals. They all developed computer programs and compiled data into a database. Data must be retrieved at some point. If you followed the standard normalization procedure of a relational database management system, you wouldn't save all data in a single table. Thus, data must be extracted from multiple tables to generate the register's final output.
As an illustration, take the Sales Register. Any application typically stores sales data in the Sales Header or Sales Detail files. Data from the Customer master and the Item master must also be extracted. Even though each table has a primary key and a foreign key to help you connect them, extracting data from them requires a specific method or tool. This procedure or tool goes by the name Joins.
In the corresponding paragraphs, we will learn about different types of SQL joins- self and merge join- and their purpose and performance.
As the name signifies, in SELF JOIN, a table is joined. Each table row is joined with itself and all other rows depending on some conditions. In other words, it is a joint between two copies of the same table. A self-join is a regular join, but the table is joined with itself.
The Syntax for SQL Server Self Join is as below
SELECT column_name(s) FROM table1 T1, table1 T2
WHERE condition;
T1 and T2 are different table aliases for the same table.
SQL Self Join is helpful when you want to correlate pairs of rows from the same table, for example, in a parent-child relationship. The following query returns the names of all immediate subcategories of the category 'Kitchen.'
SELECT T2.name FROM category T1 JOIN category T2 ON T2.parent = T1.id WHERE T1.name = 'Kitchen'
The result of using a self-join is that it potentially squares the number of output rows. This increase in output data can cause poor performance. Instead of using a self-join, use a window (analytic) function to reduce the number of additional bytes generated by the query.
In this example, we will use the AdventureWorks sample database.
We will consider the Product table under the Production schema for this example.
The sql statement to view the table is below
Select * from Production.Product
The output is as below
The query for self-join looks like this below
SELECT Color=p1.Color, ProductA=p1.Name, ProductB=p2.Name FROM Production.Product p1 INNER JOIN Production.Product p2 ON p1.Color = p2.Color AND p1.Color is not null AND p1.ProductNumber < p2.ProductNumber ORDER BY Color, ProductA, ProductB
The output looks like the one below.
You can use a self-join to simplify nested SQL queries where the inner and outer queries reference the same table. These joins allow you to retrieve related records from the same table. In this table, the manager attribute simply references the employee ID of another employee in the same table. Let's dive into SQL joins and learn more about their importance in SQL and key takeaways. You should check out the SQL career path guide to help you explore all your career options.
The Merge Join transformation provides an output generated by joining two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes product information with a table that lists the country/region in which a product was manufactured.
The syntax for merge join is as follows
MERGE Table1 t1 USING Table2 t2 ON t1.field1 = t2.field2 WHEN MATCHED THEN UPDATE SET t1.field2 = t1.field2 + t2.field2;
t1 and t2 are different table aliases for the two tables.
Merge join is used when projections of the joined tables are sorted on the join columns. Merge joins are faster and uses less memory than hash joins. Hash join is used when projections of the joined tables are not already sorted on the join columns.
The Merge join is the most efficient in SQL Server. In simple terms, if you see a Merge Join, your query plan is efficient, and you do not need to make many changes to improve query performance. Since the Merge Join operator uses sorted data inputs, it can use two large datasets.
Let us create two tables called BookOrder and Bookinventory and fill them with Data. The sql query is as below.
CREATE TABLE dbo.BookInventory -- target ( TitleID INT NOT NULL PRIMARY KEY, Title NVARCHAR(100) NOT NULL, Quantity INT NOT NULL CONSTRAINT Quantity_Default_1 DEFAULT 0 ); CREATE TABLE dbo.BookOrder -- source ( TitleID INT NOT NULL PRIMARY KEY, Title NVARCHAR(100) NOT NULL, Quantity INT NOT NULL CONSTRAINT Quantity_Default_2 DEFAULT 0 ); INSERT BookInventory VALUES (1, 'The Catcher in the Rye, 6), (2, 'Pride and Prejudice,' 3), (3, 'The Great Gatsby', 0), (5, 'Jane Eyre', 0), (6, 'Catch 22', 0), (8, 'Slaughterhouse Five', 4); INSERT BookOrder VALUES (1, 'The Catcher in the Rye', 3), (3, 'The Great Gatsby', 0), (4, 'Gone with the Wind', 4), (5, 'Jane Eyre', 5), (7, 'Age of Innocence,’ 8);
The final output of the two tables is as below.
Select * from BookInventory.
And
Select * from BookOrder
Now let us merge the two tables. The syntax is as below.
MERGE BookInventory bi USING BookOrder bo ON bi.TitleID = bo.TitleID WHEN MATCHED THEN UPDATE SET bi.Quantity = bi.Quantity + bo.Quantity;
The output looks like below
The merge join is the most efficient of all three operators. It combines the advantage of the hash match, where the actual data needs to be accessed only once, with the advantages of nested loops - low CPU consumption and enabling fast output of matched rows for further query processing.
In the above writeup, we have discussed SQL Server Self Join and Merge Join. We have learned about its advantage, disadvantages, and performance impact. This will be fascinating reading for those who want to learn about merge join and self-join in a more detailed manner. Or else you can also enroll in an online SQL server training course and learn more about SQL server and it’s various functions.
SQL Testing Training
What is Schema in SQL With Example: All You Need to Know
Data Definition Language (DDL) Commands in SQL
How To Create Database Table-All You Need To know
What does a Database Administrator do? A Detailed Study
Cyber Security
QA
Salesforce
Business Analyst
MS SQL Server
Data Science
DevOps
Hadoop
Python
Artificial Intelligence
Machine Learning
Tableau
Download Syllabus
Get Complete Course Syllabus
Enroll For Demo Class
It will take less than a minute
Tutorials
Interviews
You must be logged in to post a comment