Data Catalogs are becoming popular, but there seems to be some confusion about what they are and, in particular, how they relate to Data Dictionaries. In this article I'd like to cover the basics of both terms and show their differences and relationships.
Data Catalog is a relatively new term - it was coined just a few years ago. Before that, the industry used a different term - metadata repository (maybe I'll do a comparison in another post).
What is a Data Dictionary?
A Data Dictionary is a specification and description of the data structures in a database, data model, or data source. It consists of a list of entities/tables/data sets and their fields/columns/data elements. The scope of information a data dictionary contains varies with the use case and can include data types, descriptions, relationships, aliases, constraints, sources, etc.
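To make that structure concrete, here is a minimal sketch of a data dictionary held as plain Python objects. The `Column` and `Table` classes, the `relationships` helper, and the "shop" tables are all hypothetical names chosen for illustration; real tools store considerably more metadata than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Column:
    """One field/column/data element."""
    name: str
    data_type: str
    description: str = ""
    nullable: bool = True
    references: Optional[str] = None  # "table.column" this column points to, if any


@dataclass
class Table:
    """One entity/table/data set together with its columns."""
    name: str
    description: str = ""
    columns: List[Column] = field(default_factory=list)


# A two-table dictionary for a hypothetical "shop" database.
data_dictionary = [
    Table("customer", "People who placed at least one order", [
        Column("id", "INTEGER", "Surrogate key", nullable=False),
        Column("email", "TEXT", "Contact address"),
    ]),
    Table("orders", "Purchases made by customers", [
        Column("id", "INTEGER", "Surrogate key", nullable=False),
        Column("customer_id", "INTEGER", "Who placed the order",
               nullable=False, references="customer.id"),
    ]),
]


def relationships(tables: List[Table]) -> List[Tuple[str, str]]:
    """List (referencing column, referenced column) pairs recorded in the dictionary."""
    return [
        (f"{t.name}.{c.name}", c.references)
        for t in tables
        for c in t.columns
        if c.references is not None
    ]


print(relationships(data_dictionary))
```

Everything here describes a single data source, which is the scope a data dictionary usually has.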
Example of Data Dictionary (built with Dataedo):
What is a Data Catalog?
A Data Catalog is an inventory of the data assets in an organization. It is also a category of software that allows organizations to build such inventories.
Data Catalog (database) includes:
- Inventory of data assets in the database,
- Information about the quality of the data.
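As a rough illustration of what "information about the quality of the data" might look like next to an inventory entry, the sketch below profiles one table for row count and the share of NULL values per column. Completeness is only one possible quality measure and is an assumption here, as are the `profile_table` helper and the in-memory `customer` table.

```python
import sqlite3


def profile_table(conn, table):
    """Return the row count and the per-column share of NULL values for one table."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    null_share = {}
    for col in columns:
        nulls = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        ).fetchone()[0]
        null_share[col] = nulls / total if total else 0.0
    return {"table": table, "rows": total, "null_share": null_share}


# Hypothetical sample data standing in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

entry = profile_table(conn, "customer")
print(entry)
```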
Data Catalog (software) includes:
- Metadata repository with the inventory of data assets,
- Metadata scanners that connect to sources,
- Web interface for data analysts,
- Search functionality.
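The repository, scanner, and search components listed above can be sketched in a few lines. This is a toy under stated assumptions, not how any particular product works: `scan_sqlite`, the `Catalog` class, and the `billing_db` source are hypothetical, the scanner only understands SQLite, and the web interface is left out entirely.

```python
import sqlite3


def scan_sqlite(conn, source_name):
    """Metadata scanner: read table and column metadata from one SQLite source."""
    assets = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            assets.append({"source": source_name, "table": table,
                           "column": column, "type": col_type})
    return assets


class Catalog:
    """Metadata repository with a naive keyword search over table and column names."""

    def __init__(self):
        self.assets = []

    def register(self, assets):
        self.assets.extend(assets)

    def search(self, keyword):
        keyword = keyword.lower()
        return [a for a in self.assets
                if keyword in a["table"].lower() or keyword in a["column"].lower()]


# Hypothetical source: an in-memory database standing in for a real system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
source.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

catalog = Catalog()
catalog.register(scan_sqlite(source, "billing_db"))

hits = catalog.search("customer")
for hit in hits:
    print(hit["source"], hit["table"], hit["column"], hit["type"])
```

Registering more sources with the same `Catalog` is what turns a set of per-source descriptions into an organization-wide inventory.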
Screenshot of Data Catalog software (Dataedo Web Catalog):
| | Data Dictionary | Data Catalog |
|---|---|---|
| Definition | Definition of data sets and elements | Inventory of enterprise-wide data assets |
| What it is | Metadata (information) | Software or software with actual database |
| Scope | Data source or data model | Data in organization |
| Metadata | Data sets, fields, relationships, definitions, etc. | Data assets, business glossary, classifications, data lineage |
| Purpose | Describe data in a database | Catalog enterprise data for analytics |
Data Catalogs usually include a Data Dictionary of the data assets they cover. A Data Dictionary can therefore be thought of as a building block of a Data Catalog.