Data Catalogs are becoming popular, but there seems to be some confusion about what they are and, in particular, how they relate to Data Dictionaries. In this article I'd like to cover the basics of these two terms and show their differences and relationships.
Data Catalog is a relatively new term; it was coined just a few years ago. Before that, the industry used a different term: metadata repository (maybe I'll do a comparison in another post).
What is a Data Dictionary?
A Data Dictionary is a specification and description of the data structures in a database, data model, or data source. It consists of a list of entities/tables/data sets and their fields/columns/data elements. Depending on the use case, a data dictionary can hold different kinds of information about each element, such as data type, description, relationships, aliases, constraints, and sources.
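To make the definition above more concrete, here is a minimal sketch of how a basic data dictionary could be assembled from a database's own metadata. It assumes a SQLite database and Python's standard `sqlite3` module; the names `ColumnEntry`, `TableEntry`, and `build_data_dictionary` are hypothetical and chosen only for illustration. This is not how Dataedo or any other tool works internally. Descriptions are left empty because they are typically written by people rather than read from the database.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnEntry:
    """One field/column in the data dictionary (hypothetical structure)."""
    name: str
    data_type: str
    nullable: bool
    primary_key: bool
    description: str = ""  # filled in by a person, not by the database


@dataclass
class TableEntry:
    """One entity/table in the data dictionary (hypothetical structure)."""
    name: str
    description: str = ""
    columns: List[ColumnEntry] = field(default_factory=list)


def build_data_dictionary(conn: sqlite3.Connection) -> Dict[str, TableEntry]:
    """Read table and column metadata from a SQLite database."""
    dictionary: Dict[str, TableEntry] = {}
    table_rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table_name,) in table_rows:
        entry = TableEntry(name=table_name)
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, col_name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            entry.columns.append(
                ColumnEntry(
                    name=col_name,
                    data_type=col_type,
                    nullable=not notnull,
                    primary_key=bool(pk),
                )
            )
        dictionary[table_name] = entry
    return dictionary


# Small demonstration on an in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
demo_dictionary = build_data_dictionary(demo)
demo_dictionary["customer"].description = "One row per registered customer."
for column in demo_dictionary["customer"].columns:
    print(column.name, column.data_type, column.nullable, column.primary_key)
```

The sketch only captures data types, nullability, and primary keys. Relationships, aliases, and sources would need additional metadata queries or manual input.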
Example of Data Dictionary (built with Dataedo):
What is a Data Catalog?
A Data Catalog is an inventory of the data assets in an organization. The term also refers to a category of software that allows organizations to build such inventories.
Data Catalog (database) includes:
- Inventory of data assets in the database,
- Information about the quality of the data.
Data Catalog software includes:
- Metadata repository with the inventory of data assets,
- Metadata scanners that connect to sources,
- Web interface for data analysts,
- Search functionality.
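The components listed above can be illustrated with a toy model. The following is a minimal sketch, not a description of any real catalog product: `DataAsset`, `DataCatalog`, `scan_crm`, and `scan_warehouse` are hypothetical names, a "scanner" is simplified to a plain function that returns the assets of one source, and the web interface and data quality information are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class DataAsset:
    """One inventoried data asset (hypothetical structure)."""
    source: str         # name of the database or other data source
    name: str           # table / data set name
    fields: List[str]   # column / data element names
    description: str = ""


# In this sketch a scanner is any callable returning the assets of one source.
Scanner = Callable[[], Iterable[DataAsset]]


class DataCatalog:
    """Toy catalog: an in-memory metadata repository plus search."""

    def __init__(self) -> None:
        self._repository: Dict[Tuple[str, str], DataAsset] = {}

    def scan(self, scanner: Scanner) -> int:
        """Run a scanner and store what it finds; returns the asset count."""
        count = 0
        for asset in scanner():
            self._repository[(asset.source, asset.name)] = asset
            count += 1
        return count

    def search(self, term: str) -> List[DataAsset]:
        """Case-insensitive substring search over names, descriptions, fields."""
        needle = term.lower()
        hits = []
        for asset in self._repository.values():
            haystack = [asset.name, asset.description, *asset.fields]
            if any(needle in text.lower() for text in haystack):
                hits.append(asset)
        return sorted(hits, key=lambda a: (a.source, a.name))


# Hypothetical scanners with hard-coded results, standing in for real connectors.
def scan_crm() -> List[DataAsset]:
    return [DataAsset("crm_db", "customer", ["id", "email"], "One row per customer")]


def scan_warehouse() -> List[DataAsset]:
    return [
        DataAsset("warehouse", "dim_customer", ["customer_key", "email_address"]),
        DataAsset("warehouse", "fact_orders", ["order_id", "customer_key", "amount"]),
    ]


catalog = DataCatalog()
catalog.scan(scan_crm)
catalog.scan(scan_warehouse)
for hit in catalog.search("email"):
    print(hit.source, hit.name)
```

The point of the sketch is the scope: a single catalog holds assets from several sources, whereas each data dictionary describes only one of them.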
Screenshot of Data Catalog software (Dataedo Web Catalog):
| | Data Dictionary | Data Catalog |
|---|---|---|
| Definition | Definition of data sets and elements | Inventory of enterprise-wide data assets |
| What it is | Metadata (information) | Software, or software with the actual database |
| Scope | Data source or data model | Data in the organization |
| Metadata | Data sets, fields, relationships, definitions, etc. | Data assets, business glossary, classifications, data lineage |
| Purpose | Describe data in a database | Catalog enterprise data for analytics |
Data Catalogs usually include a Data Dictionary of the data assets. A Data Dictionary can therefore be thought of as a building block of a Data Catalog. Both are important parts of a metadata management strategy.
Dataedo – data dictionary & data catalog solution
The really good news is this: whatever you call it, Dataedo will improve your data documentation processes at any level and any stage. You may be documenting only a few tables or be responsible for overseeing a huge data lake. With Dataedo, you will save loads of time for yourself and your teammates!