|
A join combines records from two or more tables in a relational database. In the Structured Query Language (SQL), there are several types of joins, including "inner", "left", "right", and "outer". They all work by linking records on a key field.
The simplest type of join doesn't even use the JOIN keyword, but instead simply a comma with no ON clause. Other forms include this clause, which is how the relationship between the two joined tables is determined.
For example, given employee and department tables as follows:
| Table "employee"
|
| LastName
| DepartmentID
|
| Smith | 34
|
| Jones | 33
|
| Robinson | 34
|
| Jasper | 36
|
| Steinberg | 33
|
| Rafferty | 31
|
|
| Table "department"
|
| DepartmentName
| DepartmentID
|
| Sales | 31
|
| Engineering | 33
|
| Clerical | 34
|
| Marketing | 35
|
|
A statement such as the following could be used to select data from both tables using a join:
SELECT LastName, DepartmentID, DepartmentName
FROM employee
JOIN department ON employee.DepartmentID = department.DepartmentID
Inner and outer join
Inner join
An inner join essentially takes all the records from table A (in this case, employee) and for each of them joins them with each record from table B (department). This means that the length of the result would be, in theory, the product of the number of records in each table—in this case, 6 * 4.
With indexing and an ON clause, this number can be reduced significantly without the need to scan all of those records.
In this case, an inner join (such as the query above) on both tables' DepartmentID columns will return:
| LastName
| DepartmentID
| DepartmentName
|
| Smith | 34 | Clerical
|
| Jones | 33 | Engineering
|
| Robinson | 34 | Clerical
|
| Steinberg | 33 | Engineering
|
| Rafferty | 31 | Sales
|
Left outer join
A left outer join is very different from a inner join. Instead of limiting results to those in both tables, it limits results to those in the "left" table (A). This means that if the ON clause matches 0 records, a row in the result will still be returned—but with NULL values for each column.
For example, this allows us to find the employee's departments, but still show the employee even when their department has not been set yet. The above example would have ignored employees in non-existent departments.
The result might look like this:
| LastName
| DepartmentID
| DepartmentName
|
| Smith | 34 | Clerical
|
| Jones | 33 | Engineering
|
| Robinson | 34 | Clerical
|
| Jasper | 36 | NULL
|
| Steinberg | 33 | Engineering
|
| Rafferty | 31 | Sales
|
Right outer join
A right outer join is much like a left outer join, except that the tables are reversed. Every record from the right side, B or department, will be returned, and NULL values will be returned for those that have no matching left record. The results of a right outer join on these tables would look like:
| LastName
| DepartmentID
| DepartmentName
|
| Smith | 34 | Clerical
|
| Jones | 33 | Engineering
|
| Robinson | 34 | Clerical
|
| Steinberg | 33 | Engineering
|
| Rafferty | 31 | Sales
|
| NULL | 35 | Marketing
|
Full outer join
Full outer joins are the combination of left and right outer joins. These joins will show records from both tables, and fill in NULLs for missing matches on either side.
Some database systems do not offer this functionality, but it can be emulated through the use of left outer joins and unions.
| LastName
| DepartmentID
| DepartmentName
|
| Smith | 34 | Clerical
|
| Jones | 33 | Engineering
|
| Robinson | 34 | Clerical
|
| Jasper | 36 | NULL
|
| Steinberg | 33 | Engineering
|
| Rafferty | 31 | Sales
|
| NULL | 35 | Marketing
|
Implementation
The efficient implementations of joins has been goal of much work in database systems. This is because joins are both commutative and associative. In practice, this means that the user merely supplies the list of relations to be joined and the join conditions to be used, and the database system has the task of determining the most efficient way to perform the operation. Determining how to execute a query containing joins is done by the query optimizer. It has two basic freedoms:
- join order: because joins are commutative, the order in which relations are joined does not change the final result set of the query. However, join order does have an enormous impact on the cost of the join operation, so choosing the right join order is very important.
- join method: given two relations and a join condition, there are multiple algorithms to produce the result set of the join. Which algorithm is most efficient depends on the sizes of the input relations, the number of tuples from each relation that match the join condition, and the operations required by the rest of the query.
Query optimization
Most query optimizers determine join order via a dynamic programming algorithm derived from the System R database system, originally designed by IBM.
Join algorithms
There are three fundamental algorithms to perform a join operation.
Nested loops
Merge join
If both tables are sorted by the common join attribute, the join can be execute efficiently.
Hash join
Semi join
A semi join is an efficient join method where first the join attributed of one table are collected and reported to the second one. In was reported in 1981 first. It can be improved with a Bloom-Filter (hashing).
See also
|