Table of Contents:
I recently wrote a few articles on FKs in database (Why there are no FKs, SQL traps) that sparked discussion on a few LinkedIn groups. The feedback was that databases should almost always have foreign keys in place. This contradicted my experience and I did a research on databases I worked with and databases of packaged ERP and CRM applications. It turns out that most databases don't have referential integrity checks in place. I decided to create a metric and a tool to test and measure it. Here is what I came up with.
Having foreign key constraints in is not a binary attribute - database may have thousands of tables and just a handful of foreign keys or at least one foreign key for each table. Let's start with a simple concept I called Loner table. Loner table is a table that is not linked to any other table with a FK constraint (regardless of the direction). This includes only physical constraints (logical table relationships are not considered).
To measure how much or how little foreign key constraints database has I proposed a simple metric - Loner Ratio. It just means a percentage of tables that are loners - not linked to any other table. This metric may have a value from 0 to 1.
And from the relational model point of view it looks like this:
- 0 means that each and single one table is linked to some other table with foreign key,
- Close to 0 means that most tables are linked with foreign keys, and just a few of them are loners,
- 0.5 means that exactly half of the tables are loners, and half are linked to some other table,
- Close to 1 means that there is just a handful of foreign keys,
- 1 means that no single table has a foreign key and that all tables are loners.
The Right Value
There is no right value a database should have. Basically, the lower the value, the better. However, having value more than 0 may be perfectly fine. Not every table needs to be linked with some other table. Configuration tables, backup tables, simple fact or aggregate tables (not linked to a dimension table) or interfacing tables might be examples of such tables.
On the other hand, having the value equal to 0 doesn't necessarily mean that all the required foreign keys are defined in the database. It only means each particular table is related to at least one other table.
Test Your Databases
We prepared a set of test queries you can run on your database and test what is its loner ratio and find out if your database has defined foreign keys. You will find them in the section below.
Below you will find a set of SQL queries for major platforms that you can run on your databases to calculate loner ratio and other basic statistics.
Those queries will just extract basic statistics from the metadata and will not write anything back to the database.
Each query returns one row with following columns:
- table_count tells you how many tables are there in your database,
- loner_tables tells you how many tables are loners,
- loner_ratio is the Loner Ratio of your database.
select count(*) [table_count], sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end) as [loner_tables], 1.0 * sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end) / count(*) as [loner_ratio] from (select schema_name(tab.schema_id) + '.' + tab.name as tab, count(fk.name) cnt from sys.tables as tab left join sys.foreign_keys as fk on tab.object_id = fk.parent_object_id group by schema_name(tab.schema_id), tab.name) fks inner join (select schema_name(tab.schema_id) + '.' + tab.name as tab, count(fk.name) cnt from sys.tables as tab left join sys.foreign_keys as fk on tab.object_id = fk.referenced_object_id group by schema_name(tab.schema_id), tab.name) refs on fks.tab = refs.tab
select count(*) as table_count, sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1 else 0 end) as loner_tables, sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1 else 0 end) / count(*) as loner_ratio from all_tables tab left outer join (select t.owner, t.table_name, count(c.constraint_name) cnt from all_constraints c inner join all_cons_columns t on t.owner = c.owner and t.constraint_name = c.constraint_name and c.constraint_type = 'R' group by t.owner, t.table_name) fks on tab.owner = fks.owner and tab.table_name = fks.table_name left outer join (select t.owner, t.table_name, count(c.constraint_name) cnt from all_constraints c inner join all_cons_columns t on t.owner = c.r_owner and t.constraint_name = c.r_constraint_name and c.constraint_type = 'R' group by t.owner, t.table_name) refs on tab.owner = refs.owner and tab.table_name = refs.table_name where tab.owner not in ('ANONYMOUS','CTXSYS','DBSNMP','EXFSYS', 'LBACSYS', 'MDSYS','MGMT_VIEW','OLAPSYS','OWBSYS','ORDPLUGINS', 'ORDSYS','OUTLN', 'SI_INFORMTN_SCHEMA','SYS','SYSMAN','SYSTEM', 'TSMSYS','WK_TEST','WKSYS', 'WKPROXY','WMSYS','XDB','APEX_040000', 'APEX_PUBLIC_USER','DIP', 'FLOWS_30000','FLOWS_FILES','MDDATA', 'ORACLE_OCM','SPATIAL_CSW_ADMIN_USR', 'SPATIAL_WFS_ADMIN_USR', 'XS$NULL','PUBLIC') and tab.owner like '%'
select count(*) table_count, sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0 then 1 else 0 end) as loner_tables, sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0 then 1 else 0 end) / count(*) as loner_ratio from information_schema.tables tab left outer join (select constraint_schema, table_name, count(*) cnt from information_schema.referential_constraints group by constraint_schema, table_name) fks on tab.table_name = fks.table_name left outer join (select constraint_schema, referenced_table_name as table_name, count(*) cnt from information_schema.referential_constraints group by constraint_schema, referenced_table_name) refs on tab.table_name = refs.table_name where tab.table_schema <> 'information_schema'
Share Your Results!
Please take some time to share your results. We will compile them and publish on our blog (make sure you subscribe it).
We would appreciate if you could provide following information for each database.
- Label - any database label (name for instance) or leave blank,
- DBMS - what is the DBMS of the database - SQL Server, Oracle, MySQL, etc.,
- Type (Packaged/Custom) - is it a database of packaged application delivered by a vendor "of the shelf" (like MS Dynamics or SAP) or custom, designed specifically for a particular organization or project,
- (Transactional/Reporting) - is it a transactional database (e-commerce, order management, CRM, ERP, etc.) or a reporting/analytical database (data warehouse, data mart, etc.),
- Tables - number of tables in the database,
- Loner Ratio - paste Loner Ratio that was calculated with our queries,
- Has FKs - this field is calculated in the spreadsheet, we assumed the following: Less than 0.2 - Yes, 0.2-0.8 - Some, above 0.8 - No.
Download a spreadsheet.
Please send it to us at firstname.lastname@example.org. Thank you!