I recently wrote a few articles on FKs in database (Why there are no FKs, SQL traps) that sparked discussion on a few LinkedIn groups. The feedback was that databases should almost always have foreign keys in place. This contradicted my experience and I did a research on databases I worked with and databases of packaged ERP and CRM applications. It turns out that most databases don't have referential integrity checks in place. I decided to create a metric and a tool to test and measure it. Here is what I came up with.
Loner Tables
Having foreign key constraints in is not a binary attribute - database may have thousands of tables and just a handful of foreign keys or at least one foreign key for each table. Let's start with a simple concept I called Loner table. Loner table is a table that is not linked to any other table with a FK constraint (regardless of the direction). This includes only physical constraints (logical table relationships are not considered).
Loner Ratio
To measure how much or how little foreign key constraints database has I proposed a simple metric - Loner Ratio. It just means a percentage of tables that are loners - not linked to any other table. This metric may have a value from 0 to 1.
And from the relational model point of view it looks like this:
- 0 means that each and single one table is linked to some other table with foreign key,
- Close to 0 means that most tables are linked with foreign keys, and just a few of them are loners,
- 0.5 means that exactly half of the tables are loners, and half are linked to some other table,
- Close to 1 means that there is just a handful of foreign keys,
- 1 means that no single table has a foreign key and that all tables are loners.
The Right Value
There is no right value a database should have. Basically, the lower the value, the better. However, having value more than 0 may be perfectly fine. Not every table needs to be linked with some other table. Configuration tables, backup tables, simple fact or aggregate tables (not linked to a dimension table) or interfacing tables might be examples of such tables.
On the other hand, having the value equal to 0 doesn't necessarily mean that all the required foreign keys are defined in the database. It only means each particular table is related to at least one other table.
Test Your Databases
We prepared a set of test queries you can run on your database and test what is its loner ratio and find out if your database has defined foreign keys. You will find them in the section below.
Test Queries
Below you will find a set of SQL queries for major platforms that you can run on your databases to calculate loner ratio and other basic statistics.
Those queries will just extract basic statistics from the metadata and will not write anything back to the database.
Each query returns one row with following columns:
- table_count tells you how many tables are there in your database,
- loner_tables tells you how many tables are loners,
- loner_ratio is the Loner Ratio of your database.
SQL Server
select count(*) [table_count],
sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end)
as [loner_tables],
1.0 * sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end)
/ count(*) as [loner_ratio]
from (select schema_name(tab.schema_id) + '.' + tab.name as tab,
count(fk.name) cnt
from sys.tables as tab
left join sys.foreign_keys as fk
on tab.object_id = fk.parent_object_id
group by schema_name(tab.schema_id), tab.name) fks
inner join
(select schema_name(tab.schema_id) + '.' + tab.name as tab,
count(fk.name) cnt
from sys.tables as tab
left join sys.foreign_keys as fk
on tab.object_id = fk.referenced_object_id
group by schema_name(tab.schema_id), tab.name) refs
on fks.tab = refs.tab
Oracle
select count(*) as table_count,
sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1
else 0 end) as loner_tables,
sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1
else 0 end) / count(*) as loner_ratio
from all_tables tab
left outer join
(select t.owner,
t.table_name,
count(c.constraint_name) cnt
from all_constraints c
inner join all_cons_columns t
on t.owner = c.owner
and t.constraint_name = c.constraint_name
and c.constraint_type = 'R'
group by t.owner, t.table_name) fks
on tab.owner = fks.owner
and tab.table_name = fks.table_name
left outer join
(select t.owner,
t.table_name,
count(c.constraint_name) cnt
from all_constraints c
inner join all_cons_columns t
on t.owner = c.r_owner
and t.constraint_name = c.r_constraint_name
and c.constraint_type = 'R'
group by t.owner, t.table_name) refs
on tab.owner = refs.owner
and tab.table_name = refs.table_name
where tab.owner not in ('ANONYMOUS','CTXSYS','DBSNMP','EXFSYS',
'LBACSYS', 'MDSYS','MGMT_VIEW','OLAPSYS','OWBSYS','ORDPLUGINS',
'ORDSYS','OUTLN', 'SI_INFORMTN_SCHEMA','SYS','SYSMAN','SYSTEM',
'TSMSYS','WK_TEST','WKSYS', 'WKPROXY','WMSYS','XDB','APEX_040000',
'APEX_PUBLIC_USER','DIP', 'FLOWS_30000','FLOWS_FILES','MDDATA',
'ORACLE_OCM','SPATIAL_CSW_ADMIN_USR', 'SPATIAL_WFS_ADMIN_USR',
'XS$NULL','PUBLIC')
and tab.owner like '%'
MySQL
select count(*) table_count,
sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0
then 1 else 0 end) as loner_tables,
sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0
then 1 else 0 end) / count(*) as loner_ratio
from information_schema.tables tab
left outer join
(select constraint_schema, table_name, count(*) cnt
from information_schema.referential_constraints
group by constraint_schema, table_name) fks
on tab.table_name = fks.table_name
left outer join
(select constraint_schema, referenced_table_name
as table_name, count(*) cnt
from information_schema.referential_constraints
group by constraint_schema, referenced_table_name) refs
on tab.table_name = refs.table_name
where tab.table_schema <> 'information_schema'
Share Your Results!
Please take some time to share your results. We will compile them and publish on our blog (make sure you subscribe it).
We would appreciate if you could provide following information for each database.
- Label - any database label (name for instance) or leave blank,
- DBMS - what is the DBMS of the database - SQL Server, Oracle, MySQL, etc.,
- Type (Packaged/Custom) - is it a database of packaged application delivered by a vendor "of the shelf" (like MS Dynamics or SAP) or custom, designed specifically for a particular organization or project,
- (Transactional/Reporting) - is it a transactional database (e-commerce, order management, CRM, ERP, etc.) or a reporting/analytical database (data warehouse, data mart, etc.),
- Tables - number of tables in the database,
- Loner Ratio - paste Loner Ratio that was calculated with our queries,
- Has FKs - this field is calculated in the spreadsheet, we assumed the following: Less than 0.2 - Yes, 0.2-0.8 - Some, above 0.8 - No.
Download a spreadsheet.
Please send it to us at dataedo@dataedo.com. Thank you!