Why It's Hard to Find Data And Why You Need a Map: Data Dictionary

Today's enterprises have a vast, complex and ever-growing landscape of data sets, and the demand for data access and analytics grows every year. I want to show what obstacles data analysts, BI/DWH developers and all other data specialists face when navigating this sea of data.

Let's take an imaginary manufacturing company that produces boxes as an example. Let's call it Box Inc.

1. There Are Many Databases

Any enterprise has a number of different loosely connected applications and databases - legacy databases, custom applications, packaged applications, data warehouses and many more.

Box Inc. has the following data architecture:

  1. ERP packaged application with manufacturing, procurement, inventory, financial modules
  2. Data Warehouse with Planning, Budgeting, Consolidated Reporting and Analytical CRM modules
  3. Packaged CRM application
  4. Packaged Human Resources Management (HRM) application
  5. Packaged Project Management application
  6. Custom Time Tracking application
  7. Custom Technology Management application
  8. Custom Facility Maintenance application
  9. Sales Support application
  10. Packaged Customer Service application integrated with CRM
  11. Packaged Invoice Workflow application integrated with main financial system
  12. Intranet Portal
  13. Access Control database (physical gates and doors)
  14. Subcontractor Portal
  15. Suppliers Integration Platform that enables automatic quotes on materials
  16. Standard eCommerce platform where customers can place orders
  17. Customer Portal where customers can access documents and request support
  18. Master Data Management solutions for customers and suppliers to keep data consistent across systems
  19. LDAP with all user accounts
  20. Document Store

And a dozen other smaller applications and databases.

As you can see, just navigating through the databases themselves is not a trivial task. But let's continue our search for the data.

Sample Enterprise Application Architecture

See sample environment

2. Databases Are Large And Complex

Many enterprise applications have very large and complex databases. Packaged applications, and ERP systems in particular, are a great example of this. It might be hard to believe, but popular ERP applications have tens or hundreds of thousands of tables and views. Let's have a look at a few examples:

Number of tables and views in popular applications:

  - TETA (HRM): 9,000
  - Oracle e-Business Suite (ERP): 55,000
  - SAP (ERP): 130,000!

To give you a sense of how much that is, this is what the 42k tables of one particular installation of Oracle e-Business Suite look like:

Oracle eBS tables

List of sample Oracle eBS (ERP) 42k tables
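If you want to check how large one of your own databases is, you can ask its system catalog for the number of tables and views. The sketch below does this for a throwaway in-memory SQLite database so that it runs anywhere. The demo tables and the helper names `count_tables_and_views` and `build_demo_db` are made up for illustration. On Oracle, the equivalent information lives in catalog views such as `ALL_TABLES` and `ALL_VIEWS`, and many other databases expose `information_schema.tables`.

```python
import sqlite3


def count_tables_and_views(conn):
    """Count user-defined tables and views in a SQLite database."""
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM sqlite_master "
        "WHERE type IN ('table', 'view') "
        "AND substr(name, 1, 7) <> 'sqlite_' "  # skip SQLite's internal objects
        "GROUP BY type"
    ).fetchall()
    counts = {"table": 0, "view": 0}
    counts.update(dict(rows))
    return counts


def build_demo_db():
    """Create a tiny, hypothetical schema (not a real ERP schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, qty INTEGER);
        CREATE VIEW open_orders AS SELECT * FROM orders WHERE total > 0;
        """
    )
    return conn


if __name__ == "__main__":
    print(count_tables_and_views(build_demo_db()))
```

Running the same kind of count against each system in your landscape is a quick way to see where the really large schemas are.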

And those tables are large and complex themselves. Here is a list of the columns of the order lines table in that Oracle database:

Oracle OE_ORDER_LINES_ALL table columns

Columns of sample Oracle table
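A column list like this can be pulled straight from the database catalog. The following is a minimal sketch using SQLite's `PRAGMA table_info`; the `order_lines` table and its columns are hypothetical stand-ins, not the actual definition of `OE_ORDER_LINES_ALL`. On Oracle you would query a catalog view such as `ALL_TAB_COLUMNS` instead.

```python
import sqlite3


def list_columns(conn, table):
    """Return (column_name, declared_type, nullable) for each column of a table."""
    # Quote the identifier, because PRAGMA arguments cannot be bound as parameters.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [(name, col_type, not notnull)
            for _cid, name, col_type, notnull, _default, _pk in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical order lines table, for illustration only.
    conn.execute(
        "CREATE TABLE order_lines ("
        "line_id INTEGER PRIMARY KEY, "
        "header_id INTEGER NOT NULL, "
        "ordered_quantity REAL, "
        "attribute1 TEXT)"
    )
    for name, col_type, nullable in list_columns(conn, "order_lines"):
        print(f"{name:20} {col_type:10} {'NULL' if nullable else 'NOT NULL'}")
```

Note that the catalog only tells you the names and types. It does not tell you what a column like `attribute1` actually means, which is exactly the gap described here.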

I hope this gives you an idea of how difficult it is to find a piece of data or to understand what you are looking at. It's as if you were looking for it in Manhattan (there are approx. 134,000 buildings in Manhattan).

Manhattan from space. Photo by NASA, taken by Expedition 10 Commander Leroy Chiao

Your data is in one of those apartments. And you even have an address - good luck!

Manhattan address

You Need a Map!

I hope I have shown you that having data is not enough to be able to use it. If you want to make any use of your data, you need a map. This map of your databases is called a Data Dictionary. If you haven't already, you should start building it today.
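To make the idea concrete, here is a minimal sketch of what such a map could look like in code: metadata extracted from the database, combined with descriptions written by people, and searchable by keyword. Everything here is an assumption for illustration, including the `DictionaryEntry` class, the `build_dictionary` and `search` helpers, and the cryptic `xx_ord_ln` table. Real data dictionary tools store much more (relationships, owners, glossary terms).

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    table: str
    column: str
    data_type: str
    description: str  # human-written; empty if nobody has documented it yet


def build_dictionary(conn, descriptions):
    """Combine catalog metadata with descriptions keyed by (table, column)."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        for _cid, column, data_type, *_rest in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            text = descriptions.get((table, column), "")
            entries.append(DictionaryEntry(table, column, data_type, text))
    return entries


def search(entries, keyword):
    """Find entries whose column name or description mentions the keyword."""
    kw = keyword.lower()
    return [e for e in entries
            if kw in e.column.lower() or kw in e.description.lower()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with unhelpful names, as is common in packaged apps.
    conn.execute("CREATE TABLE xx_ord_ln (ln_id INTEGER, qty REAL, attr1 TEXT)")
    descriptions = {("xx_ord_ln", "attr1"): "Box color requested by the customer"}
    entries = build_dictionary(conn, descriptions)
    for entry in search(entries, "color"):
        print(entry)
```

Without the description, nothing in the database itself would tell an analyst that `attr1` holds the box color. The dictionary is what turns an address into a place you can actually find.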

Aliens metadata discovery Frame from movie "Aliens" (1986), J. Cameron

