In SQL databases, relations between tables are established through keys, which help maintain data integrity and ensure that the data is logically connected.
Joins are fundamental to SQL queries, allowing you to combine data from two or more tables based on related columns. This guide explores different types of joins, providing explanations and examples to help you master their usage.
Why Do We Need Joins?
Joins are essential because they allow you to:
- Combine Data: fetch data from multiple tables based on related columns.
- Avoid Data Duplication: maintain data normalization by using multiple tables, reducing redundancy.
Joins are performed on columns that have a logical relationship. These columns typically have the same data type and meaning.
In SQL you can perform the following joins:
- Inner Join
- Left Join
- Right Join
- Full Join
- Cross Join
- Self Join
- Union
Most of the joins in SQL have the common syntax:
sqlSELECT table1.column1, table2.column2 FROM table1 INNER JOIN table2 ON table1.common_column = table2.common_column;
In place of INNER JOIN
you can use any other join.
All SQL queries from this blog post were tested in the Postgres database. While the similar SQL syntax can be used in other databases.
Inner Join
The INNER JOIN returns only the rows that match values in both tables.
For better understanding, have a look at the following image with Table A and Table B:
Here, the rows that are returned by inner join have a colored background. As you can see, the rows that match values in both tables are in the middle of Table A and Table B.
We will use this circle's example for all other joins.
In this post I'll showcase all types of joins for employees
and departments
tables.
Where the employee
has a foreign key to departments
table:
sqlCREATE TABLE departments ( id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, location VARCHAR(100) ); CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, department_id INT FOREIGN KEY REFERENCES departments(id), job_title VARCHAR(100) );
Let's insert some data into these tables:
You can select employees
and their departments
by using INNER JOIN SQL statement:
sqlSELECT employees.name, departments.name FROM employees INNER JOIN departments ON employees.department_id = departments.id;
INNER JOIN returns only those employees
that have matching departments
.
When making joins, you can use table aliases for brevity:
sqlSELECT employees.name, departments.name FROM employees e INNER JOIN departments d ON e.department_id = d.id;
Left Join
The LEFT JOIN (also called LEFT OUTER JOIN) returns all rows from the left table and the matched rows from the right table. If no match is found, NULL values are returned for columns from the right table:
sqlSELECT employees.name, departments.name FROM employees LEFT JOIN departments ON employees.department_id = departments.id;
LEFT JOIN returns all rows from the employees
table.
For those rows that have matching departments
, the department_id
will have a corresponding value from the departments
table.
If there is no matching department, the result will show NULL
for the department_name
.
If you need to remove the employees
rows that don't have a matching department
, you can add the WHERE
clause to filter the NULL rows:
sqlSELECT employees.name, departments.name FROM employees LEFT JOIN departments ON employees.department_id = departments.id WHERE departments.id IS NOT NULL;
You can see a visualization of this case on the picture:
Right Join
The RIGHT JOIN (also called RIGHT OUTER JOIN) returns all rows from the right table and the matched rows from the left table. If no match is found, NULL values are returned for columns from the left table:
sqlSELECT employees.name, departments.name FROM employees RIGHT JOIN departments ON employees.department_id = departments.id;
RIGHT JOIN returns all rows from the departments
table.
For those rows that have matching employees
, the department_id
in the employees
table will have a corresponding value.
If there is no matching employee, the result will show NULL
for the corresponding column.
If you need to remove the department
rows that don't have a matching employee
, you can add the WHERE
clause to filter the NULL rows:
sqlSELECT employees.name, departments.name FROM employees RIGHT JOIN departments ON employees.department_id = departments.id WHERE employees.id IS NOT NULL;
You can see a visualization of this case on the picture:
Full Join
The FULL JOIN (also called FULL OUTER JOIN) returns all rows when there is a match in either left or right table. Rows without a match in one of the tables will have NULL values for columns of that table:
sqlSELECT employees.name, departments.name FROM employees FULL JOIN departments ON employees.department_id = departments.id;
FULL JOIN returns all rows when there is a match in either the employees
or the departments
table.
For those rows that have matching departments, the department_id
will have a corresponding value from the departments
table.
If there is no matching department, the result will show NULL
for the department.name
.
Similarly, if there is no matching employee
, the result will show NULL for the employees.name
column.
If you need to get a list of all employees
and departments
but exclude rows where there are NULLs on both sides, you can add the WHERE
clause to filter the NULL rows:
sqlSELECT employees.name, departments.name FROM employees FULL JOIN departments ON employees.department_id = departments.id WHERE employees.id IS NOT NULL AND departments.id IS NOT NULL;
You can see a visualization of this case on the picture:
Cross Join
The CROSS JOIN returns the cartesian product of the two tables, combining each row of the first table with all rows of the second table.
CROSS JOIN has a different syntax, comparing to other joins:
sqlSELECT table1.column1, table2.column2 FROM table1 CROSS JOIN table2;
Let's have a look at example:
sqlSELECT employees.name, departments.name FROM employees CROSS JOIN departments;
CROSS JOIN returns the cartesian product of the employees
and departments
tables.
This means that each row from the employees
table is combined with each row from the departments
table, resulting in all possible combinations of rows between the two tables.
Self Join
A SELF JOIN is a regular join, but the table is joined with itself.
SELF JOIN has a different syntax, comparing to other joins:
sqlSELECT a.column1, b.column2 FROM table a, table b WHERE condition;
To showcase this type of join, let's add the manager_id
column to the employees
:
sqlALTER TABLE employees ADD COLUMN manager_id INT;
This column has a reference to employees
table, without a foreign key.
Now we can make a SELF JOIN:
sqlSELECT e1.name AS Employee, e2.name AS Manager FROM employees e1, employees e2 WHERE e1.manager_id = e2.id;
In this example, we are joining the employees
table to itself.
This can be used to find pairs of employees
where one is the manager
of the other.
Each row in the employees
table is compared with every other row to find matching rows.
Union
The UNION operator stays aside from all joins.
UNION is used to combine the results of multiple tables or SELECT
statements into a single result set.
First, let's explore the syntax of UNION operator:
sqlSELECT column1, column2, ... FROM table1 UNION SELECT column1, column2, ... FROM table2;
Now, let's have a look at example:
sqlSELECT 'Department' AS Type, name AS Name FROM Departments UNION SELECT 'Employee' AS Type, name as Name FROM employees;
The first SELECT
statement retrieves data from the departments
table, the other - from the employees
table.
The UNION operator combines these results into a single result set, where the Type
column indicates whether the row represents a department or an employee.
By default, UNION eliminates duplicate rows from the result set. If you want to include duplicates, you can use UNION ALL.
sqlSELECT 'Department' AS Type, name AS Name FROM Departments UNION ALL SELECT 'Employee' AS Type, name as Name FROM employees;
There are few limitations when using a UNION statement:
- Column Count and Data Types: each
SELECT
statement within theUNION
must have the same number of columns and the corresponding columns must have compatible data types. - Order of Columns: the order of columns must be the same in all
SELECT
statements.
You can sort the final result set using the ORDER BY
clause after the last SELECT
statement:
sqlSELECT 'Department' AS Type, DepartmentName AS Name, Location AS Details FROM Departments UNION ALL SELECT 'Employee', EmployeeName, JobTitle FROM Employees;
When To Use Each Type of Join
Inner Join
Inner Join: Use this join when you need to retrieve only the rows that have matching values in both tables, which is ideal when you need intersection of the datasets.
Left Join
Left Join: Use this join when you need all rows from the left table and the matching rows from the right table, including cases where there might not be a match, which is useful for keeping all data from the left table.
Right Join
Right Join: Use this join when you need all rows from the right table and the matching rows from the left table, including cases where there might not be a match, which is useful for keeping all data from the right table.
Full Join
Full Join: Use this join when you need all rows when there is a match in either left or right table, which is helpful for a complete view that includes all records from both tables regardless of matching.
Cross Join
Cross Join: Use this join when you need to create a Cartesian product of the tables, which is typically used for generating combinations of rows or for testing purposes.
Self Join
Self Join: Use this join when you need to compare rows within the same table, which is useful for hierarchical data or for finding relationships among rows in a single dataset.
Union
Union: Use this operation when you need to combine the results of two or more SELECT queries into a single result set, eliminating duplicates, which is ideal for merging similar datasets from different sources.
Hope you find this blog post useful. Happy coding!