5 Different Types of Tools You Can Use to Create a Data Dictionary

Table of Contents:


    Now, when accessing company data held in databases is becoming critical, organizations are looking for tools that will allow them to build and share data dictionary of their data sources. A data dictionary is a definition of tables/files and columns/fields in a data set (database, data warehouse or data lake). In this article, I will present you with different types of tools that you can use to build and share such an inventory.

    1. Spreadsheet or word processor

    I think the most obvious tool, perhaps good for a proof of concept, is a spreadsheet software or word processor (preferably the former). Those are generic tools for creating and collaborating on documents that most people are familiar with.

    Creating a data dictionary can be as easy as extracting a list of columns from a database using a query and pasting the results into a spreadsheet for people to fill in the details. But the maintenance is the hard part - making sure it's up to date with the source can be a laborious task.

    Pros

    1. Already familiar tool
    2. Most likely no need for extra licenses
    3. Easy to edit and read

    Cons

    1. Population, and especially maintenance, can be a nightmare

    Data Dictionary in Word document

    Data Dictionary in Excel spreadsheet

    2. DBMS + GUI tool (+ Generator)

    Another very popular approach is to make use of DBMS built-in data dictionary. All database engines (DBMS) have a so-called active data dictionary - an inventory of their data structures. Most DBMSs also have the ability to annotate data dictionary elements (called comments, descriptions or extended properties). Some teams choose to store their metadata in those structures. Comments can be edited with many database management tools that are available for all databases. Then such a data dictionary can be shared with database documentation generator that generates HTML, PDF or another format for easy access. This can be a feature built into a database GUI or a standalone tool.

    So to summarize, the stack looks like this:

    1. Built-in DBMS data dictionary
    2. Comment editor (database GUI tool)
    3. Optional documentation generator

    Pros

    1. Metadata with data, directly in the database
    2. Simple tool stack

    Cons

    1. Heavy limitations in the scope of metadata, only fields and elements already present in the data dictionary
    2. No global data dictionary, documentation is scattered across databases
    3. Impacts database schema, not always desirable (for instance when the vendor doesn't allow for alterations of the database)
    4. Require write access to database schema (no offline work)
    5. Risk of altering of the schema (updating a comment in MySQL alters entire table schema)
    6. Often very clunky comment editors
    7. Stack differs from DBMS to DBMS

    Examples

    1. SQL Server + SSMS
    2. MySQL + MySQL Workbench + data dictionary plugin
    3. Oracle + Oracle SQL Developer + Generate DB Doc option

    Editing table comments in SQL Server Management Studio (SSMS)

    Editing table comments in Oracle SQL Developer

    Data Dictionary export Oracle SQL Developer

    3. Database documentation tools

    Another category of tools you utilize is database documentation tools. Those tools are similar to the above combination but are designed for the purpose of the documentation of the database. They also utilize DBMS data dictionary for metadata storage, provide a (more or less) convenient editor of comments, and allow you to export documentation to a usable format, such as HTML or PDF.

    Pros

    1. More user-friendly comment editors
    2. Better looking output

    Cons (almost the same as above)

    1. Heavy limitations in the scope of metadata, only fields and elements already present in the data dictionary
    2. No global data dictionary, documentation is scattered across databases
    3. Impact database schema, not always desirable (for instance when vendor doesn't allow for alterations of the database)
    4. Require write access to database schema (no offline work)
    5. Risk of altering of the schema (updating a comment in MySQL alters entire table schema)
    6. Most of them for specific DBMS

    Examples

    1. Redgate SQLDoc
    2. ApexSQL Doc

    4. Data Modeling tools

    Another category of tools is data modeling tools. Those are tools created for data modeling and database design in mind, rather than working with an existing database, but all of them support reverse engineering capabilities that allow you to read the schema of an existing database and import it into the model. Once you have a list of tables and columns in the model you can provide a descriptions for tables and columns and finally export data dictionary to an HTML page.

    Pros

    1. Most of them support multiple DBMSs
    2. Metadata in a separate repository
    3. More advanced metadata capabilities (e.g. document table relationships)

    Cons

    1. Mostly focused on visual modeling with diagrams, rather than data dictionary
    2. Clunky editing of data dictionary descriptions (a lot of clicking)
    3. Poor reports
    4. Very poor and often risky import of changes from the database (works well for the first time)
    5. Additional cost

    Examples

    1. Erwin Data Modeler
    2. ER/Studio
    3. MySQL Workbench (MySQL)
    4. Oracle SQL Developer Data Modeler (Oracle)

    Comment editor in erwin Data Modeler

    Export from erwin Data Modeler

    Export from MySQL Workbench

    5. Data Dictionary tools [recommended]

    Another category of tools, that actually is not really crystalized, I called here a Data Dictionary tools. I distinguished it because it is very different from the previous categories in that it has a separate metadata repository (in contrast to GUI and documentation tools) and was designed for documentation of databases (in contrast to data modeling tools).

    Pros

    1. Rich metadata capabilities (show table relationships, multiple descriptive fields, rich text editor)
    2. Multiple databases and engines in one repository
    3. Collaboration capabilities
    4. Convenient searchable documentation export

    Cons

    1. Additional cost

    Examples

    1. Dataedo

    Dataedo data dictionary editor

    Documentation export from Dataedo

    6. Data Catalogs

    The last category is the most advanced tools - collaborative metadata repositories with very advanced search, tagging, lineage, profiling and collaboration capabilities called Data Catalogs. Those tools are centered around data dictionaries that they can build by automatically scanning various sources (including NoSQL or data lakes).

    Pros

    1. Very rich metadata capabilities
    2. Advanced search
    3. Collaboration with commenting, rating, certification of data assets
    4. Automatic data profiling
    5. Automatic and intelligent tagging and classification

    Cons

    1. Expensive
    2. Consumers also require a license

    Examples

    1. Alation
    2. Collibra Catalog
    3. Informatica Enterprise Data Catalog

    Alation

    Informatica Enterprise Data Catalog

    Conclusion

    There are many options and it all depends on what you need to accomplish - is it long term approach or you are just hoping for a quick win? What tools do you already use? Is it just you or a team of people? Are people involved technical? Do you have to write access to the database?

    If you are thinking about it seriously and long term then I advise one of the two last categories - Data Dictionary tools and Data Catalogs. If you are a technical person and don't want to try or buy any new tools use one of the database IDEs, or if you need more custom metadata - a spreadsheet.

    If you don't mind trying something new, want quick results and you're ok with spending a few bucks (but not too many) then go with the data dictionary tool - Dataedo.

    Try Dataedo for free now

    0
    There are no comments. Click here to write the first comment.