Now, when accessing company data held in databases is becoming critical, organizations are looking for tools that will allow them to build and share data dictionary of their data sources. A data dictionary is a definition of tables/files and columns/fields in a data set (database, data warehouse or data lake). In this article, I will present you with different types of tools that you can use to build and share such an inventory.
1. Spreadsheet or word processor
I think the most obvious tool, perhaps good for a proof of concept, is a spreadsheet software or word processor (preferably the former). Those are generic tools for creating and collaborating on documents that most people are familiar with.
Creating a data dictionary can be as easy as extracting a list of columns from a database using a query and pasting the results into a spreadsheet for people to fill in the details. But the maintenance is the hard part - making sure it's up to date with the source can be a laborious task.
Pros
- Already familiar tool
- Most likely no need for extra licenses
- Easy to edit and read
Cons
- Population, and especially maintenance, can be a nightmare
Data Dictionary in Word document
Data Dictionary in Excel spreadsheet
2. DBMS + GUI tool (+ Generator)
Another very popular approach is to make use of DBMS built-in data dictionary. All database engines (DBMS) have a so-called active data dictionary - an inventory of their data structures. Most DBMSs also have the ability to annotate data dictionary elements (called comments, descriptions or extended properties). Some teams choose to store their metadata in those structures. Comments can be edited with many database management tools that are available for all databases. Then such a data dictionary can be shared with database documentation generator that generates HTML, PDF or another format for easy access. This can be a feature built into a database GUI or a standalone tool.
So to summarize, the stack looks like this:
- Built-in DBMS data dictionary
- Comment editor (database GUI tool)
- Optional documentation generator
Pros
- Metadata with data, directly in the database
- Simple tool stack
Cons
- Heavy limitations in the scope of metadata, only fields and elements already present in the data dictionary
- No global data dictionary, documentation is scattered across databases
- Impacts database schema, not always desirable (for instance when the vendor doesn't allow for alterations of the database)
- Require write access to database schema (no offline work)
- Risk of altering of the schema (updating a comment in MySQL alters entire table schema)
- Often very clunky comment editors
- Stack differs from DBMS to DBMS
Examples
- SQL Server + SSMS
- MySQL + MySQL Workbench + data dictionary plugin
- Oracle + Oracle SQL Developer + Generate DB Doc option
Editing table comments in SQL Server Management Studio (SSMS)
Editing table comments in Oracle SQL Developer
Data Dictionary export Oracle SQL Developer
3. Database documentation tools
Another category of tools you utilize is database documentation tools. Those tools are similar to the above combination but are designed for the purpose of the documentation of the database. They also utilize DBMS data dictionary for metadata storage, provide a (more or less) convenient editor of comments, and allow you to export documentation to a usable format, such as HTML or PDF.
Pros
- More user-friendly comment editors
- Better looking output
Cons (almost the same as above)
- Heavy limitations in the scope of metadata, only fields and elements already present in the data dictionary
- No global data dictionary, documentation is scattered across databases
- Impact database schema, not always desirable (for instance when vendor doesn't allow for alterations of the database)
- Require write access to database schema (no offline work)
- Risk of altering of the schema (updating a comment in MySQL alters entire table schema)
- Most of them for specific DBMS
Examples
- Redgate SQLDoc
- ApexSQL Doc
4. Data Modeling tools
Another category of tools is data modeling tools. Those are tools created for data modeling and database design in mind, rather than working with an existing database, but all of them support reverse engineering capabilities that allow you to read the schema of an existing database and import it into the model. Once you have a list of tables and columns in the model you can provide a descriptions for tables and columns and finally export data dictionary to an HTML page.
Pros
- Most of them support multiple DBMSs
- Metadata in a separate repository
- More advanced metadata capabilities (e.g. document table relationships)
Cons
- Mostly focused on visual modeling with diagrams, rather than data dictionary
- Clunky editing of data dictionary descriptions (a lot of clicking)
- Poor reports
- Very poor and often risky import of changes from the database (works well for the first time)
- Additional cost
Examples
- Erwin Data Modeler
- ER/Studio
- MySQL Workbench (MySQL)
- Oracle SQL Developer Data Modeler (Oracle)
Comment editor in erwin Data Modeler
Export from erwin Data Modeler
Export from MySQL Workbench
5. Data Dictionary tools [recommended]
Another category of tools, that actually is not really crystalized, I called here a Data Dictionary tools. I distinguished it because it is very different from the previous categories in that it has a separate metadata repository (in contrast to GUI and documentation tools) and was designed for documentation of databases (in contrast to data modeling tools).
Pros
- Rich metadata capabilities (show table relationships, multiple descriptive fields, rich text editor)
- Multiple databases and engines in one repository
- Collaboration capabilities
- Convenient searchable documentation export
Cons
- Additional cost
Examples
- Dataedo
Dataedo data dictionary editor
Documentation export from Dataedo
6. Data Catalogs
The last category is the most advanced tools - collaborative metadata repositories with very advanced search, tagging, lineage, profiling and collaboration capabilities called Data Catalogs. Those tools are centered around data dictionaries that they can build by automatically scanning various sources (including NoSQL or data lakes).
Pros
- Very rich metadata capabilities
- Advanced search
- Collaboration with commenting, rating, certification of data assets
- Automatic data profiling
- Automatic and intelligent tagging and classification
Cons
- Expensive
- Consumers also require a license
Examples
- Alation
- Collibra Catalog
- Informatica Enterprise Data Catalog
Alation
Informatica Enterprise Data Catalog
Conclusion
There are many options and it all depends on what you need to accomplish - is it long term approach or you are just hoping for a quick win? What tools do you already use? Is it just you or a team of people? Are people involved technical? Do you have to write access to the database?
If you are thinking about it seriously and long term then I advise one of the two last categories - Data Dictionary tools and Data Catalogs. If you are a technical person and don't want to try or buy any new tools use one of the database IDEs, or if you need more custom metadata - a spreadsheet.
If you don't mind trying something new, want quick results and you're ok with spending a few bucks (but not too many) then go with the data dictionary tool - Dataedo.