All articles · Metadata Management · Database Design & Metadata · Application Metadata · Metadata Tools · Products and News

Test If Your Database Has Foreign Keys Or Is It Full of Loners (Metric + Test Queries)

I recently wrote a few articles on FKs in database (Why there are no FKs, SQL traps) that sparked discussion on a few LinkedIn groups. The feedback was that databases should almost always have foreign keys in place. This contradicted my experience and I did a research on databases I worked with and databases of packaged ERP and CRM applications. It turns out that most databases don't have referential integrity checks in place. I decided to create a metric and a tool to test and measure it. Here is what I came up with.

Loner Tables

Having foreign key constraints in is not a binary attribute - database may have thousands of tables and just a handful of foreign keys or at least one foreign key for each table. Let's start with a simple concept I called Loner table. Loner table is a table that is not linked to any other table with a FK constraint (regardless of the direction). This includes only physical constraints (logical table relationships are not considered).

Loner Ratio

To measure how much or how little foreign key constraints database has I proposed a simple metric - Loner Ratio. It just means a percentage of tables that are loners - not linked to any other table. This metric may have a value from 0 to 1.

And from the relational model point of view it looks like this:

  • 0 means that each and single one table is linked to some other table with foreign key,
  • Close to 0 means that most tables are linked with foreign keys, and just a few of them are loners,
  • 0.5 means that exactly half of the tables are loners, and half are linked to some other table,
  • Close to 1 means that there is just a handful of foreign keys,
  • 1 means that no single table has a foreign key and that all tables are loners.

The Right Value

There is no right value a database should have. Basically, the lower the value, the better. However, having value more than 0 may be perfectly fine. Not every table needs to be linked with some other table. Configuration tables, backup tables, simple fact or aggregate tables (not linked to a dimension table) or interfacing tables might be examples of such tables.

On the other hand, having the value equal to 0 doesn't necessarily mean that all the required foreign keys are defined in the database. It only means each particular table is related to at least one other table.

Test Your Databases

We prepared a set of test queries you can run on your database and test what is its loner ratio and find out if your database has defined foreign keys. You will find them in the section below.

Test Queries

Below you will find a set of SQL queries for major platforms that you can run on your databases to calculate loner ratio and other basic statistics.

Those queries will just extract basic statistics from the metadata and will not write anything back to the database.

Each query returns one row with following columns:

  • table_count tells you how many tables are there in your database,
  • loner_tables tells you how many tables are loners,
  • loner_ratio is the Loner Ratio of your database.

SQL Server

select count(*) [table_count],
       sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end) 
       as [loner_tables],
       1.0 * sum(case when fks.cnt + refs.cnt = 0 then 1 else 0 end) 
       / count(*) as [loner_ratio]
 from (select schema_name(tab.schema_id) + '.' + tab.name as tab,
              count(fk.name) cnt
         from sys.tables as tab
              left join sys.foreign_keys as fk
              on tab.object_id = fk.parent_object_id
        group by schema_name(tab.schema_id), tab.name) fks
      inner join 
      (select schema_name(tab.schema_id) + '.' + tab.name as tab,
              count(fk.name) cnt
         from sys.tables as tab
              left join sys.foreign_keys as fk
              on tab.object_id = fk.referenced_object_id
        group by schema_name(tab.schema_id), tab.name) refs
      on fks.tab = refs.tab

Oracle

select count(*) as table_count,
       sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1 
       else 0 end) as loner_tables,
       sum (case when nvl(fks.cnt, 0) + nvl(refs.cnt, 0) = 0 then 1 
       else 0 end) / count(*) as loner_ratio
  from all_tables tab 
       left outer join 
        (select t.owner,
               t.table_name,
               count(c.constraint_name) cnt
          from all_constraints c
               inner join all_cons_columns t
                   on t.owner = c.owner
                  and t.constraint_name = c.constraint_name
                  and c.constraint_type = 'R'
         group by t.owner, t.table_name) fks
       on tab.owner = fks.owner
         and tab.table_name = fks.table_name 
       left outer join
       (select t.owner,
             t.table_name,
             count(c.constraint_name) cnt
        from all_constraints c
             inner join all_cons_columns t
                 on t.owner = c.r_owner
                and t.constraint_name = c.r_constraint_name
                and c.constraint_type = 'R'
       group by t.owner, t.table_name) refs
       on tab.owner = refs.owner
         and tab.table_name = refs.table_name 
where tab.owner not in ('ANONYMOUS','CTXSYS','DBSNMP','EXFSYS',
 'LBACSYS', 'MDSYS','MGMT_VIEW','OLAPSYS','OWBSYS','ORDPLUGINS',
 'ORDSYS','OUTLN', 'SI_INFORMTN_SCHEMA','SYS','SYSMAN','SYSTEM',
 'TSMSYS','WK_TEST','WKSYS', 'WKPROXY','WMSYS','XDB','APEX_040000',
 'APEX_PUBLIC_USER','DIP', 'FLOWS_30000','FLOWS_FILES','MDDATA',
 'ORACLE_OCM','SPATIAL_CSW_ADMIN_USR', 'SPATIAL_WFS_ADMIN_USR',
 'XS$NULL','PUBLIC')   
and tab.owner like '%'

MySQL

select count(*) table_count,
       sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0 
            then 1 else 0 end) as loner_tables,
       sum(case when ifnull(fks.cnt, 0) + ifnull(refs.cnt, 0) = 0 
            then 1 else 0 end) / count(*) as loner_ratio
  from information_schema.tables tab
       left outer join 
       (select constraint_schema, table_name, count(*) cnt 
          from information_schema.referential_constraints 
         group by constraint_schema, table_name) fks
       on tab.table_name = fks.table_name
       left outer join 
       (select constraint_schema, referenced_table_name 
          as table_name, count(*) cnt 
          from information_schema.referential_constraints 
         group by constraint_schema, referenced_table_name) refs
       on tab.table_name = refs.table_name 
 where tab.table_schema <> 'information_schema'

Share Your Results!

Please take some time to share your results. We will compile them and publish on our blog (make sure you subscribe it).

We would appreciate if you could provide following information for each database.

  • Label - any database label (name for instance) or leave blank,
  • DBMS - what is the DBMS of the database - SQL Server, Oracle, MySQL, etc.,
  • Type (Packaged/Custom) - is it a database of packaged application delivered by a vendor "of the shelf" (like MS Dynamics or SAP) or custom, designed specifically for a particular organization or project,
  • (Transactional/Reporting) - is it a transactional database (e-commerce, order management, CRM, ERP, etc.) or a reporting/analytical database (data warehouse, data mart, etc.),
  • Tables - number of tables in the database,
  • Loner Ratio - paste Loner Ratio that was calculated with our queries,
  • Has FKs - this field is calculated in the spreadsheet, we assumed the following: Less than 0.2 - Yes, 0.2-0.8 - Some, above 0.8 - No.

Download a spreadsheet.

Please send it to us at dataedo@dataedo.com. Thank you!

Recommendations