Don't Most Databases Have Foreign Key Constraints? [Research data + Poll]

This article is a part of a longer discussion about the lack of foreign key constraints in databases. Article about reasons why many databases don't have FK constraints sparkled a deep and passionate discussion on LinkedIn groups about how those keys are important to make databases work. I mostly agree with such statement, but my experience told me that practice is far from theory. So I started a research and found out that most packaged ERP/CRM applications don't have foreign keys. Now, I collected data from 40 databases I have worked with for the past 10+ years (that I still have access to or a recollection of). Here are results of my research. I hope this is just an introduction to more vast research that you will contribute to as well.

Data

This is the data from all the databases I still have access to or its schema, or I remember their design. Those are databases designed by many entities - small teams and large global organizations.

The last column shows whether the database has foreign key constraints.

Analysis

Let's have a look at the results - it turns out that 75%! of databases I worked with don't have foreign key constraints. For something that most consider a must have this is surprising.

Application Type

Let's first break this down by application type:

Custom and packaged applications, ERP/CRM

In that category I put all the databases of transactional applications such as CRM, ERP, HRMS, billing, practice management that help companies in their daily operations, both web and desktop. Bespoke and off-the-shelf. I believe most DBAs and architects would say this is the category that needs FK constraints the most - they have many users and data collected there is critical for any organization. And yet a majority of them seem to handle referential integrity lightly. What is most surprising is that packaged off-the-shelf applications even more so.

Data warehouses

Data warehouses are databases designed specifically for reporting and analytical purposes of vast amounts of data and loaded from external sources. Most people agreed that data warehouses are among exceptions as their data consistency is managed in the source databases. The results of 0% of FK constraints is what I expected.

Tools

In this category fall all kind of tools - forum and blog engines, modeling and reporting tool repositories. They are either open source or proprietary and don't bother much about consistency.

Other

I put into reporting applications category databases that are somewhere between transactional applications and data warehouses - they are used for reporting but populated by users. Interfaces are databases that are used to feed another database. Those two categories were a bit surprising for myself as I would bet those databases would be the worst. Eventually, they turned out to be all implementing referential integrity checks.

Custom vs. Packaged

Split between custom/bespoke databases that are designed for particular client and packaged/off-the-shelf databases looks like this:

What is surprising, databases designed for a multi-use purpose, tend to care less about such design details than custom ones. It's odd since designers of such applications should have more time and put more stress to design database schema properly. In the latter case, where teams need to act more quickly, deadlines are short, and budgets always seem too small (I know because this is what I was doing over 15+ years of my career) results were better.

There are some clues why packaged databases don't use foreign keys as much.

Transactional vs. Reporting

Split by transactional databases and those whose purpose is reporting is a little unexpected. Reporting databases in my data set have more FKs but it might be caused by its small size (and those two reporting applications).

DBMS

When we look at particular DBMS, you can see big differences. Oracle databases seem to be much better than SQL Server ones, and MySQL appears not to support FKs at all. This might be caused by specific data set. About a half of those example SQL Server databases are in one environment, and this just might be the cause of specific design "style". If I exclude this environment, then the results are very close to the Oracle.

I have a very small data set regarding MySQL. This category is mostly open source tools. It might be the case that open source teams don't care about FKs.

Not surprisingly, cross-platform databases (N/A) don't have foreign keys, and this might be explained by the fact that designers wanted to be as little dependent on the database platform as possible.

Share Your Data!

As you noticed, my research was based on a small data set - 40 databases - and that it was not too representative. I am very interested in your experience and the databases you work with. Please take some time to compile this information and share the results with me. Send me your data at dataedo@dataedo.com or:

Take a poll

Once I collect enough data, I will compile it into similar report and share on our blog.

Thank you!

Comments (0)