2 Common SQL Join Traps (with Test Queries)

This article shows 2 very common traps you can fall into while joining tables in reports and queries. It is quite easy to fall them when there are no primary and foreign keys defined in the database and the consequences of errors are serious. Let's have a look.

Trap 1: Incorrect Idea of Primary Key

This case occurs when you assume that a certain column is a primary key but it is not in fact unique.

Result

As a result of a join with such column you get values for element that exists in the primary table more than once doubled (or tripled, etc.). This creates a serious error to your results and it is not always obvious to spot.

Solution

Test uniqueness of such column.

Check primary and unique keys
Test uniqueness (see query below)
Check documentation (if exists)

Query below tests uniqueness of projects_no column in projects table. If query returns any row column isn't unique.

select project_no, count(*)
from projects
group by project_no
having count(*) > 1

Please note that even if values are unique at the time of execution of this query this does not mean that it will always be so. This might change with new data.

When you understand the real relationship between the tables just add extra condition to your join statement.

238 likes

Dataedo Cartoon

Trap 2: Missing Parent Values

This case occurs when you assume that for each value in your child table (foreign key) there is a corresponding value in parent table (primary key). If database doesn't take care of referential integrity (as it often happens) your join will exclude some rows.

Result

Your report will be missing all the elements that do not appear in the parent (lookup) table. Just as the case above, this has a serious impact on the results.

Solution

Test referential integrity (see query below)
Check documentation (if exists)

Query below tests referential integrity between tasks->projects by project_no column. If query returns any row data is not consistent and you shouldn't rely on a projects table.

select t.project_no
from tasks t left outer join projects p
on t.project_no = p.project_no
where p.project_no is null

To remedy this situation simply use left outer join statement and isnull(p.project_no, t.project_no) (SQL Server) or nvl(p.project_no, t.project_no) (Oracle) for a label column.

223 likes

Dataedo Cartoon

Avoiding All This

To avoid situations like this where you don't understand the relationship between the tables...

...fix database schema and create referential integrity constraints in the database or document data model in an accessible repository.

There is a good tool for that - Dataedo. It enables you to read schema of your existing databases and provide meaningful metadata to avoid such traps. You can describe table relationships in a separate repository (when you cannot modify your existing databases).

This documentation should be created and updated each time you learn something about the data or data model.

You can then export it to an interactive documentation for future reference and all other people that need to query the database.

Take a Dataedo product tour

By Use Case

By Industry

Learn About Dataedo

Learn About Data

After Hours