SQL Server SELF JOIN

Learning Objective

The objective of this SQL Server tutorial is to teach you how to use SELF JOIN to join a table to itself.

What is SELF JOIN in SQL Server?

A SELF JOIN is a join which joins a table to itself. The purpose of such a JOIN is to compare the records in a table which is otherwise not possible. Basically, it is like creating a copy of a table so that the table and its copy can interact like two different tables and a record in the table can be compared to its duplicate and all other records in the table copy. A SELF JOIN is not a special type of JOIN but rather an INNER JOIN or a LEFT OUTER JOIN (also called LEFT JOIN) implemented on the same table. It functions in exactly the same manner with the only difference being that the logic and operation is applied to the same table and its replica.

A SELF JOIN can be created on a common column or on different columns in the same table. Since a SELF JOIN uses the same table therefore it is necessary to use at least one alias name for the table so that SQL understands and treats them as two different tables. A SELF JOIN is used to process hierarchical data, identify duplicate data and sequence data.

SQL Server SELF JOIN Syntax

The basic syntax of SELF JOIN can be any one of the following depending upon whether we use an INNER JOIN or a LEFT JOIN for the purpose.

SELECT column_list
FROM table1 alias1
INNER JOIN
table1 alias2
ON alias1.columnX=alias2.columnX;
SELECT column_list
FROM table1 alias1
LEFT JOIN
table1 alias2
ON alias1.columnX=alias2.columnX;

In this syntax,

  • column_list – the list of columns specified from the aliased tables in the SELECT statement.
  • table1 – the table on which the statement operates.
  • alias1/alias2 – alias names for the table to distinguish them as separate entities.
  • INNER JOIN/JOIN – SQL keyword combination to implement an INNER JOIN (where the resultset is the intersecting rows of the participating tables).
  • columnX – common column on which the JOIIN is made.

SQL Server SELF JOIN with Examples

Let us understand the practical application of SELF JOIN with the help of some examples. We will use a sample table called emp represented below. The table contains the name of employees mapped to the id of their manager and the employee’s salary.

IdNameMgrIdSalary
1Kirsten Rose1152000
2Julia WellsNULL75000
13Korbin Miles1151000
4Robert Fox1150000
5Steven Pitt1255000
8Jane Frost1255000
9Simon Ray260000
10Sandra Bull266000
11Albert SpencerNULL76000
12Raymond CageNULL71000
Table: emp

1) SQL Server SELF JOIN – hierarchical data processing with INNER JOIN

In the above table there is a hierarchical relationship between employee and manager. The MgrId column of the table is a subset of the Id column and contains data values which are common to the Id column data values. We can use a SELF JOIN to identify and report this hierarchical relationship. The following query does so by creating a SELF JOIN on the table using INNER JOIN on two different table columns (i.e. MgrId and Id columns as mentioned above). The table is aliased first as e (i.e. the left table) and then as m (i.e. the right table) and the output is sorted by the column alias ‘Manager Name’ to group the output for easy readability.

SELECT
e.name 'Employee Name',
m.name 'Manager Name'
FROM emp e
INNER join emp m
ON e.mgrid = m.id
ORDER BY 'Manager Name';

It will produce the following output where every employee name is listed with his/her manager name. However, if we examine closely, we will see that the 3 employees (i.e. those who are managers and self-managed are missing from the list). That is because an INNER JOIN only reports intersections i.e. where the JOIN condition is met or where the joined column values match in both tables).

Employee NameManager Name
Kirsten RoseAlbert Spencer
Korbin MilesAlbert Spencer
Robert FoxAlbert Spencer
Simon RayJulia Wells
Sandra BullJulia Wells
Steven PittRaymond Cage
Jane FrostRaymond Cage

2) SQL Server SELF JOIN – hierarchical data processing with LEFT JOIN

The above shortcoming can be overcome and an exhaustive list generated by using a LEFT JOIN which will list all records from the left column including those which do not match the join condition in the right table. The following query does do using a LEFT JOIN and the same JOIN condition and alias names and sorting as above.

SELECT
e.name 'Employee Name',
m.name 'Manager Name'
FROM emp e
LEFT join emp m
ON e.mgrid = m.id
ORDER BY 'Manager Name';

It will produce the following output where every employee name is listed with his/her manager name including those who are self-managed and not reporting to anybody. The Manager Name value for them is NULL as is the normal behavior of LEFT JOIN.

Employee NameManager Name
Julia WellsNULL
Albert SpencerNULL
Raymond CageNULL
Korbin MilesAlbert Spencer
Robert FoxAlbert Spencer
Kirsten RoseAlbert Spencer
Simon RayJulia Wells
Sandra BullJulia Wells
Steven PittRaymond Cage
Jane FrostRaymond Cage

3) SQL Server SELF JOIN – identifying and reporting duplicate data

As noted above while describing SELF JOIN it is also possible to identify and report duplicate data (i.e. multiple records having the same column value for one or more column) using SELF JOIN. The following query does the same and identifies and lists those employees who have the same salary. The query creates a SELF JOIN on the table using INNER JOIN on the same table column (i.e. Id column) but on a not equal to condition. The table is aliased first as e (i.e. the left table) and then as ee (i.e. the right table). For records where the JOIN condition is satisfied (i.e. where the Id of the left table e is different from the id of the right table ee) it checks the WHERE condition and if satisfied includes the record in the resultset.

SELECT
e.name Employee Name,
e.salary Salary
FROM emp e
INNER join emp ee
ON ee.id != e.id
WHERE ee.salary=e.salary;

It will produce the following output which lists two employees who have the same salary.

Employee NameSalary
Steven Pitt55000
Jane Frost55000

Advertisement