Table of Contents:
Data dictionary is an inventory of data elements in a database or data model with detailed description of its format, relationships, meaning, source and usage.
Active and passive data dictionary
You can classify data dictionary into two main categories:
- Active data dictionary - is part of and managed by DBMS.
- Passive data dictionary - is not part of and managed by the DBMS.
Active data dictionary
Every change in database structure (using DDL - Data Definition Language) is automatically reflected in active data dictionary.
Passive data dictionary
Passive data dictionary is separate from the database and all changes in database structure need to be applied in the passive data dictionary manually, or with dedicated software.
Passive data dictionary can take different forms:
- A document or spreadsheet
- Data Catalogs
- Data integration/ETL metadata repositories
- Data modeling tools
Elements of Data Dictionary
Data dictionary is a table with data elements (columns) as rows and their attributes as columns. Specific attributes vary depending on the purpose of the data dictionary.
Data dictionary has 2 essential elements:
- List of tables (or entities)
- List of columns (or fields, or attributes)
Minimum data dictionary:
DBMS data dictionary
Relational database engines enable much more description of data models and provide this information through their data dictionaries. This information is:
- Data type of column,
- Default values for columns,
- Nullability of columns,
- Table relationships (foreign keys),
- Uniqueness of column values (primary and unique keys),
- Data elements descriptions
Extended data dictionary
There are more attributes for each table or column that architects, teams or organizations might want to collect that are not supported by database engines. It all depends on the purpose of the data dictionary. Those attributes could be:
- Meaning and purpose,
- Holds sensitive data,
- List of values,
- String formats,
- UI form label,
to name a few.
In such case teams can collect this information in external document or dedicated software (referred to as data dictionary tool, metadata repository, data catalog).
Functions of Data Dictionary
Data dictionary can be used for different purposes. Those key purposes are:
1. Data Dictionary in database systems (DBMS)
All major relational database management systems store information about data structures in a special structures – predefined tables or views that hold metadata about each element of a database – tables, columns, indexes, foreign keys, constraints and so on.
This type of data dictionary serves the purpose of providing information to users and tools about the database schema - all elements of data model and programs.
- Data dictionary in relational databases
- Browse catalog of queries to data dictionary of relational databases
2. Data modeling
Data Dictionary can be used as a tool to model data. This can be done with dedicated data modeling tool or plain spreadsheet or document. In this case data dictionary serves as a specification of entities and their fields and helps business analysts, subject matter experts and architects to gather requirements and model the domain. Physical database and application is then designed and implemented based on this document.
Data dictionary can also be used as a reference and cataloging of existing data assets - tables in databases, spreadsheets, files and so on.
This can be achieved with a few formats and tools:
- Read only HTML or PDF exports from DBMS with database tools
- Manually prepared and maintained spreadsheets,
- Data modeling tools with the use of reverse engineering,
- Database documentation tools,
- Metadata repositories/data catalogs
Forms of Data Dictionary
Data Dictionaries exist in a few different forms:
1. DBMS data dictionary
Most common occurrence of data dictionary is the one built into most database systems, often referred to as data dictionary, system catalog or system tables.
Data Dictionary can be in a form a text or HTML document or spreadsheet, detached form any physical data source. We can further split this form into:
- Manual document - text document or a spreadsheet, created and maintained manually, used for modeling or documentation,
- Static exported document - usually HTML or PDF document generated automatically from database schema with a tool or data models.
3. Metadata repository
Organizations that are mature in their data governance implement special software that extracts, manages and provides access to data dictionary of data assets in multiple databases. This software can be called Metadata Repository (broader term) or Data Catalog (more specific and trending concept).
Data Dictionary in Database Lifecycle
Diagram below shows where data dictionary appears in the process of design, management and documentation of databases.