Confused about what data you have and how to use it? Does data reporting take years because your analysts cannot access your data? Read our tutorial and learn how to create a data dictionary in your organization. It will be good for you!
Have you ever wondered if there is a better way to look through and handle your databases’ components and content? You are in the right place! As you will learn from this article, Dataedo’s data dictionary can save tons of time around these tasks. Are you interested?
Data dictionaries allow you to produce and maintain meaningful documentation of your database. Clear and up-to-date documentation enables your analysts and other data users to access and use data without overburdening your database managers. With Dataedo’s data dictionary specifically, you can do this beyond the capabilities of your DBMS. Clearly, this is a big win for everyone!
In this tutorial, we provide you with a comprehensive account of what a data dictionary is, what you can use it for, and how to use Dataedo to document your organization’s data assets.
After reading the tutorial, you will have a solid understanding of the main benefits ofhow you can use data dictionaries and you will be able to set up a fully functioning data dictionary, saving you plenty of time and headaches.
What is a Data Dictionary
If you and your teammates have little knowledge of the content of your database, data warehouse or data lake and have a hard time understanding the meaning of this contentits meaning, you will find data dictionaries a life saver.
A data dictionary is an inventory of data elements in a database or data model with detailed description of its format, relationships, meaning, source and usage. documentation of your database metadata. It describes the structure of your database’s tables and columns, relations and constraints.
It helps you to find and understand data and to create a common knowledge base across your organization about your data assets and data management processes.
So, let’s start with reviewing the benefits of using a data dictionary.
Why Do I Need a Data Dictionary?
Without the data dictionary, you and your data users have a hard time understanding and usingaccessing your organization’s data. They either have to pester database administrators with data queries or they spend hours excavating your databases.
With a data dictionary, you can build out a common knowledge base about your database’s content and the meaning of your data objects.
This will reduce friction between roles, increase onboarding speed, and make your organization less reliant on a few database specialists.
In sum, a data dictionary will benefit you in the following ways:
- Simplify schema design and database development.
- Accelerate database-related tasks.
- Unstuck data processes.
- Liberate data users.
- Make queries reusable.
These benefits show that it can be a good idea to use a data dictionary. Many DBMSs have some type of data dictionary-like functionality built into them. If you can use them, the better.
However, these built-in solutions have many limitations. In the next section, we will review them to assess whether you need something more suitable for your situation.
Don’t I Already Have a Data Dictionary?
However, DBMSs’ default data dictionaries are limited in the scope of metadata they can document and often require active connection to their database with write accessis limited and specific to that database type not core functionalities but rather additions to the database management features. This makes them limited for several reasons:
- The dimensions that your DBMS tracks might not be in line with your business logic.
- If you have databases across multiple engines, their data dictionaries support only their own database type.providers, their schema might be inconsistent with each other.
- Data object names generated by the DBMS or Database Experts do not necessarily reflect the object’s business meaning and can be outright opaque.
- Updating them needs online write access and might impact the data schema.
- Presenting and sharing these data dictionaries can be painful as DBMSs typically do not have reporting capabilities.
Because Dataedo’s is a data dictionary solution at its core, it solves these problems by acting as a central repository of your database metadata operations.
In the next section, we will review how Dataedo is different from these built-in solutions.
Dataedo Data Dictionary
Dataedo Data Dictionary allows you to describe, visualize, and share your data schema and documentation.
You can use it to document the following elements and their relationships:
You can organize these elements into modules and link them to a business glossary.
Dataedo works similarly to an Active Data Dictionary. In an Active Data Dictionary, the DBMS automatically can update the data dictionary with changes in the database structure.
This way is beneficial compared to passive data dictionaries as you do not have to consolidate your documentation with your changing database manually.
Dataedo works in a similar logic using an automatic and configurable schema change import process. Because it is database-agnostic, you can automatically update your data dictionary from multiple databases.
Dataedo’s Data Dictionary also allows you to extend your documentation. It enables you to define and organize your database objects along dimensions that your DBMS does not support.
By this point, you have learned what a data dictionary is and how you can use Dataedo to improve your data documentation, reporting, and development processes.
In the rest of the article, you will learn the “How?” of using Dataedo. You will learn how to set it up and how to document your database. You will also learn some advanced techniques like custom fields and automation.
By the end of the tutorial, you will have confidence in using Dataedo and can start to solve database-related problems in your organization right away!
Let’s jump right in!
I. Set up Prerequisites
In this section, you learn what you need to set up Dataedo, how to set it up, and how you can save your work in it.
Specifically, we will go through the following steps:
- Basic concepts
- Create a repository
- Save changes
- Connect the repository to a new database
This tutorial assumes that you have a working Dataedo instance installed on your system. If this is not the case, consult the installation instructions.
Let’s review the main concepts you need to know about how Dataedo works.
The source database is a database that you want to document with Dataedo. Whenever we use the term database without context, we mean source database.
- SQL Server database
- Azure SQL database
- A local SQL Compact Database file
We will refer to it as the repository or repository database.
A documentation or database documentation is the description of a source database in the repository. A repository can describe multiple databases and, therefore, contain multiple documentations.
We have reviewed the main concepts to set up our first Dataedo Data Dictionary. For further explanations and concepts, please consult our glossary.
Let’s move to the setup stage!
2. Create a new Repository
The first thing you need to do to use Dataedo is to create a repository. Here you learn the steps to do that.
Setting up a repository requires the following steps:
- Choose between file and database-based repository
- Specify your repository parameters
You can set up a new repository by clicking on the Create new repository button on the opening page of Dataedo.
If you are already inside the repository, you can open this screen with the New/Open button.
If you create or open a repository with this button, Dataedo will close the current one and open the new one. You can have only one repository open at the same time.
On the repository creation screen, choose how to store the repository. You can select:
- a. SQL Server database (In a dedicated SQL Server database).
- b. Local file (In a file).
Dataedo automatically populates your new repository with sample database documentations and a sample business glossary. You can play around with them to understand Dataedo faster.
a. Store Your Repository as a Database
If you store your repository as a database, you need to configure the database first.
After the configuration, Dataedo will create a new repository and open its overview.
b. Store Your Repository as a Local File.
If you choose to store your repository as a file, you have to set the path for the file you want to use.
After saving the file, Dataedo opens the overview of the repository.
3. Save changes
We have created our first repository. Before we create our first documentation, let’s talk briefly about how saving works in Dataedo.
Dataedo marks changed but unsaved objects with an asterisk in the navigation panel.
Whenever you make a change in the big right panel, Dataedo won’t save it until one of the following options happen:
- You move to another data object (e.g. another table). For this, you should have the autosave option turned on. Otherwise, Dataedo will ask you whether you want to save the changes (and turn on autosave).
- You click the Save button.
If you have turned off autosave, you can turn it back by clicking on the text below the Save button and selecting Turn autosave on.
Now that we have learned how to save securely or work in Dataedo, we can create our first documentation.
4. Create a new Documentation
After creating or opening a repository, you can connect the databases you want to document to it.
To connect a database to your repository, you need the following steps:
- Select a documentation type.
- Specify its configuration.
a. Documentation Type
To create a new database documentation, click on the Add Source button and select how you want to generate the documentation:
- Import the metadata of an existing database (Database connection).
- Define and document manually (Manual database).
- Create a Business Glossary for your database documentations (Business Glossary).
In this tutorial, we cover only the steps to set up a database connection. For the other options, please, consult their respective documentation.
b. Database Connection
When you choose to document an existing database, you can import its metadata to prefill its documentation.
These are the steps for the metadata import:
- Choose Database connection under Add Source.
- Choose the database engine you want to connect to (e.g., SQL Server, Oracle, MySQL).
- Provide the database connection details.
- Give a meaningful name to your new documentation. You will see the newly created documentation’s folder in the navigation tree (left pane). Click on it, and write the new name into the Title field.
Congratulations, you just connected Dataedo to your source database and created its documentation. Let’s document some data!
II. Document Tables
Now that we have your repository and database documentations set up, we can start documenting!
In this tutorial, you will learn how to document the core elements of tables. Documenting views are almost identical to this with minor exceptions (e.g., having scripts).
You can also document database procedures and functions. For further details, please consult their documentation.
Tables have many documentable elements. Here we cover the fundamental ones:
- Primary and unique keys
Dataedo provides further dimensions along which you can document tables. We do not discuss them in this tutorial, but you can learn about them in their documentation.
A more advanced technique is to define custom fields for tables (and other data documentation objects). You can read more about it in the later part of this tutorial.
1. Table Title and Description
The most straightforward way to document tables is to give them a meaningful name and a detailed textual description. You can do this in the Title and Description fields.
a. Table Title
When Dataedo imports a database schema, it automatically generates names based on the DATABASE.TABLE format. This naming is not always useful for the following reasons:
- The name is not self-explanatory (e.g., based on an abbreviation or n an unrelated context).
- The purpose of the table changes.
- Some applications have generated the database/table names automatically.
You can correct this by giving a meaningful name to the table by the following steps:
- Select the table in the navigation menu. It will automatically open the Table tab in the right panel.
- Click into the Title field and add a meaningful name to the table.
After you have saved this change, Dataedo will show the new title in reports and summaries.
b. Table Description
The Description field accepts rich text. You can add formatting to it and even include links, tables, and images.
If previously you had imported your documentation from a database, Dataedo inserts table and column comments into the Description field. However, it won’t write back your description edits into the database as comments.
You can access and edit the Description field by the following steps:
- Find the Table element in the navigation pane and click on it.
- Edit the Description field in the editing pane, under the Table tab.
After editing, you need to (auto)save the changes.
You are free to write anything into the description that might be useful for your audience. Here are a few tips:
- Meaning and purpose of the table
- Table objects and their types
- Granularity: What does a single row represent?
- Source of data
- Related calculations
- Standards and data validation criteria
- Related SQL queries
- Related use cases
The important thing is that what you write should add information to your users and reduce their confusion.
In the next section, we will learn how to document columns.
2. Column Documentation
Another effective way to describe a table is to describe its columns.
When you import metadata from a database, Dataedo automatically imports the following information:
- Position in the table (#)
- Unique identifiers and primary keys
- Data type
- Default value
- Computed specification
- Description (from database comments)
You can document columns at the following fields:
- Custom fields
a. Column Description
Steps to change column description:
- Go to the Columns tab of the table you want to edit.
- Click into the Description column and write your edits.
Again, you can add anything to the Description field. Here are a few practical ideas:
- Meaning and purpose of the column.
- Where does the data come from?
- Range or list of possible values and their format.
Be sure to follow best practices!
b. Column Title
Similarly, as in the case of table names, you can rename table columns.
Follow these steps to add a new name to them:
- Select the table in the navigation panel.
- Select the Columns tab.
- Click into the Title field and add a meaningful name to the column.
3. Bulk Column Editing
Dataedo allows you to edit columns in multiple tables with the same name simultaneously. This makes it easier to update common columns in your tables (e.g., created_date, row_version).
You can edit columns in bulk by the following steps:
- Right-click on the name of the column you want to edit to open the context menu.
- Select** Find [COLUMNAME]** columns. This will open an editor with the columns of the same name across your whole repository.
- Review them and make sure that all the columns have the same meaning and not just the same name!
- Edit the fields you want to change (Description, Title, or your custom fields).
- Click Save.
4. Document Primary and Unique Keys
Dataedo reads primary keys and unique indexes/constraints from the database schema. It stores them under the Unique keys tab.
If your database does not define unique constraints, or if you want to add further ones, you can define them in Dataedo.
Dataedo uses colors to differentiate key types:
- Unique key: yellow
- Primary key: blue
Manually added keys have a small person icon next to them.
To add new unique keys, you have to open the Add key screen:
- Select the table.
- Navigate to the Unique keys tab.
- Select Add key. This will open a screen where you can define the new key.
In the Add key screen, you can add the following unique key information:
- [Optional] A new key name.
- Key type.
- Key columns.
- [Optional] A description and explanation of why these columns are/should be unique.
5. Document Relationships
Dataedo imports existing relations between tables (foreign keys) into the repository if the database has this information.
If the database stores relationship information or if you want to add further ones, you can define them with Dataedo.
Dataedo distinguishes manually added relations by a small person on its icon.
Dataedo won’t push back the relationships you defined here to the database. You will be able to use them within the repository (e.g., in ERDs generated in Dataedo).
You can define and document additional relationships in the following way:
- Navigate to the Relations tab in the table view of the table that you want to connect.
- Select Add relation. This will open a screen asking for details.
On the new screen, follow these steps:
- [Optional] Give a title for the relationship.
- Select the database and table to which you want to make the connection (PK Database and PK Table).
- Select the column that you want to use as a foreign key in this table (FK Column).
- Add further details in the Description field.
- [Optional] Define the type of relationship between the tables (e.g., One-Many, Many-Many).
We have reviewed the main ways you can document your database tables. In the next section, you will learn how to organize your tables into business-specific modules and visualize them with diagrams.
III. Organize and Visualize Your Data Objects With Modules and ERDs
With Dataedo, you can organize your data objects into modules. This allows you to have a better overview of them around your business logic.
When you have such modules, you can visualize their objects and relationships with Dataedo’s Entity Relationship Diagrams (ERDs). ERDs give you a compact sharable overview of your databases’ structure. You and your colleagues can use ERDs to communicate business and data problems effectively.
In the rest of this section, you will learn how to define these modules and create diagrams.
1. Organize Data Objects into Modules
You can review the existing modules belonging to a documentation under its Modules & ERDs folder in the navigation pane.
a. Create a new module
To create a new module:
- Select the database documentation in the navigation pane.
- Select Add module/ERD. This will create a new module.
- Add a name in the newly opened field.
b. Document a module
You can document modules under the Module tab. You can change their name (Title) and add a description (Description).
Some valuable things to add to the Description field:
- Meaning of and purpose of the module
- Business definitions and rules
- Key entities
After creating a module, you can visualize it with ERDs.
2. Create Entity Relationship Diagrams (ERDs)
You can generate Entity Relationship Diagrams by the following steps:
- Select the ERD tab under the module you want to visualize. This will open the diagram view in the center and an object selector on the right side.
- Pick those objects that you want to visualize. You can do this either by double-clicking on them or by drag and dropping them into the diagram view.
These ERD diagrams will be handy when you publish your data documentation as Dataedo will include them.
In the next section, you will learn about how you can publish your data dictionary.
IV. Publish Your Data Dictionary
You can export your documented data dictionary into multiple formats:
Publishing your data allows you to share your database documentation with your colleagues, customers, or decision-makers tailored for their specific use case.
You can export your documentation into most of the formats by the following steps. For differences, see the respective sections here or in the documentation.
- Click the Export button. This will open the Export documentation screen.
- Choose the export format (e.g. HTML, PDF, Excel).
- Select a documentation template or create one.
- Select the modules to include.
- Select the object types (e.g., tables, relations) to include.
- Select the custom fields to export.
- Define the export save path.
In the rest of this section, we will discuss two publication types in more detail: HTML and the Web Catalog.
1. HTML export
The HTML export is a static, offline, and interactive format of the documentation:
- Static: You generate it once, and you cannot update it with the changes in the repository. If you want an update, you have to recreate it.
- Offline: It does not require a connection to the repository, so you can publish or host it anywhere.
- Interactive: You can search it and navigate within it through the graphical interface.
Check out this example!
2. Web Catalog
Dataedo Web Catalog is an on-premises, self-hosted web format of the data dictionary. Compared to an HTML export, the Web Catalog has the following features:
- Dynamic synchronization. You can edit the same repository with different Web Catalog and Desktop instances.
- Community contributions: Multiple users can edit the repository and add community-specific content (e.g., comments, ratings, questions).
- Rights management: You can differentiate roles (e.g., Community member, Data Steward) with different editorial rights.
Web Catalog requires an Enterprise license. You can read more about the uses and requirements of Web Catalog here.
You cannot publish your repository as a Web Catalog from Dataedo Desktop. Instead, you have to install a Web Catalog instance. There are multiple ways to do this. To see which fits your situation, please, consult the Web Catalog install documentation.
We have looked through how you can publish and share your documentation in different formats. By this point, you already know enough about Dataedo to start to document your database with it.
In the rest of the article, you will learn more advanced documentation techniques to make your document process more effective.
V. Advanced Documentation Techniques
Dataedo has plenty of more advanced techniques. In this section, we will focus on two specific ones:
- Custom fields
- Progression tracker
1. Define Custom Fields
In this section, you will learn an advanced documentation technique: custom fields.
Custom fields are dimensions you can define in addition to Dataedo’s default documentation dimensions. You can use them to add further dimensions and introduce further structuring into your documentation.
Dataedo defines custom fields at the repository level so you can use them in all database documentations within the same repository.
You can define custom fields for the following objects:
- Primary/unique keys
You can define custom fields in the following way:
- Select Custom fields on the ribbon. This will open a new screen listing the custom fields currently defined in the repository.
- Click on the Add button. This will open the** New Custom Field **screen.
- Name the field under the Title field.
- Select a Type.
- Select a Class.
- Select those documentation objects where this field is meaningful and will be active.
- Describe the field at Description.
Dataedo provides a few suggested custom fields at the bottom of the screen. You can quickly activate them with a click.
Examples for useful custom fields:
- Source mapping
Let’s examine one of these examples from a bit closer.
Example Custom Field: Table/Column Status
Many databases have unused or temporary tables and columns because of development changes. You do not want to remove them, but you also want to keep them in a different status.
To maintain this separation, you can define a Status custom field for your tables and columns and mark the ones that have a special status.
This will allow you to filter data objects which are still in development or defunct or have some additional status not applicable to your specific report.
2. Tracking description progress
When you work with multiple databases, it is hard to overview the documentation’s overall status.
Dataedo can help with showing the documentation coverage of your tables and views in a visual progress tracker. It calculates progress based on the proportion of non-empty Description fields in your documentation.
Dataedo can track progress in different modes:
- All descriptions: Tracks tables, columns, relations, keys, triggers, procedures, functions, and parameters.
- Tables & Columns.
- Custom fields: Tracks progress on a specific custom field.
To turn on progress tracking, click on the **Show progress **button on the ribbon.
To change the progress tracking mode, follow these steps:
- Click on the Show progress/Hide progress text at the bottom of the button. This will open a dropdown menu.
- Select the mode you want to track.
The progress tracker can help you to decide what data objects require much documentation effort and can give hints about the types of information that they are missing.
3. Further Advanced Features
Dataedo offers many more advanced documentation features, but we do not have space to discuss them all.
Feel free to check out this list of examples and improve your data documentation by applying them:
- Document stored procedures, functions, and triggers.
- Discover database code and object dependencies.
- Discover and classify sensitive data.
- Document cross-database dependencies and relationships.
- Document different databases and technologies in one documentation.
- Easily track empty descriptions and custom fields.
In the last section of this tutorial, you will learn how you can make your documentation processes more automatic with the help of Dataedo.
VI. Maintain and Automate Your Data Dictionary
One of the most painful things in documentation is keeping up with development changes while maintaining the repository’s consistency and your personal sanity.
Dataedo can help you out in this with the following means:
- Track documentation progress.
- Import data schema changes.
- Remove defunct objects.
- Export documentation.
With Dataedo, you can build an automatic workflow of these steps and run it based on your operation’s schedule.
The automation process consists of the following high-level steps:
- Repository setup and data schema import.
- Manual documentation edit and extension.
- Automated schema updates.
- Automated document generation.
In the rest of this section, you will learn these maintenance steps and how to schedule them with Dataedo.
1. Import schema changes
As your team further develops your databases, the data schema becomes modified. Dataedo can import schema changes and update the documentation with them.
Steps to import schema changes:
- Click on the Import changes button on the ribbon. This will open the change import screen.
- Confirm the database connection (if you have it saved) or set it up again.
- Review schema changes since the last import. Exclude those objects you do not want to import.
You can read more about schema change import in the documentation.
2. Remove/Ignore Objects
Dataedo allows you to remove the documentation of defunct database objects from your repository.
To remove them directly, follow these steps:
- Right-click on the documentation object you want to remove.
- In the context menu, select Remove from repository.
You can define rules to ignore objects to prevent them from being imported in the future.
3. Schedule Dataedo Tasks
One of the most powerful features of Dataedo’s automation process is scheduling updates and exports to specified times.
You can do this by following these steps:
- Create an XML command file to specify the task to execute.
- Schema update: Select the Save import command at the last step of the import process. - Documentation export: Save the command file at the last step of your preferred export documentation process. 2. Set up tasks in the Windows Scheduler by using the respective command files as their parameter.
Beyond a Single Database: Build a Data Catalog
By this point, you have learned what a data catalog is and how it benefits your organization. You also have learned how to use Dataedo to create and maintain a data catalog and to apply advanced techniques like custom fields and automation.
This concluding part shows you how you can extend your data dictionary into a data catalog.
We often use the two terms ― data dictionary and data catalog ― interchangeably. This is acceptable for simpler cases. However, it might be useful to differentiate between them to highlight the different development stages of data documentation.
A data dictionary describes a database, its content, data types, and data quality.
As your data and documentation processes grow, you will start to include multiple databases or even a whole data warehouse or data lake. You will also begin to build a business glossary and try to get a grasp of your data lineage.
Somewhere at this point, your data dictionary becomes more than just a description; it becomes a data catalog.
A data catalog can mean this inventory of all your data assets within your organization. It would contain data-specific information and its relationship to your business operations and its overall creation and management process.
It can also mean the software solution you use to maintain this data catalog with a possible dynamic relationship with the data assets it describes.
This shows that as your organization grows, you will face novel data management challenges and need to have the proper tools to answer them.
The real good news is this: However you call it, Dataedo will improve your data documentation processes at any level and any stage. You can document only a few tables or be responsible for overviewing a huge data lake. With Dataedo, you will save loads of time for you and your teammates!
So, which one do you choose? Desperate data confusion or streamlined reporting? Let us know!
Create your own data dictionary
Get started with Dataedo and create your own data dictionary.