This article is an overview of basic concepts in Dataedo.
The repository is a central database where all metadata is stored. It can be a SQL Server database or an Azure SQL database.
Metadata connectors are built-in libraries that allow users to connect to their databases and extract metadata: information about tables, columns, relationships, etc.
There are native connectors for specific database technologies (such as Microsoft SQL Server and Oracle) and generic connectors that can connect to any technology through an external driver (ODBC).
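To make the idea of metadata extraction concrete, here is a minimal sketch that reads tables, columns, and foreign keys from a database's system catalog. It uses Python's standard `sqlite3` module and an in-memory SQLite database purely for illustration; it is not how Dataedo's connectors are implemented, and the function names `extract_metadata` and `demo` and the sample schema are assumptions.

```python
import sqlite3


def extract_metadata(conn):
    """Read table, column, and foreign key metadata from SQLite's system catalog."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {
                "name": row[1],
                "data_type": row[2],
                "nullable": not row[3],
                "default": row[4],
                "primary_key": bool(row[5]),
            }
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        metadata[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return metadata


def demo():
    """Build a hypothetical two-table schema and extract its metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            status TEXT DEFAULT 'new'
        );
        """
    )
    return extract_metadata(conn)
```

Calling `demo()` returns a dictionary keyed by table name, with the column attributes and foreign keys that the system catalog exposes for each table.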
The Data Dictionary, or Catalog, is the part of the metadata repository that holds metadata extracted from documented databases.
Users can add multiple connections to the catalog. A connection lets you scan a source database and extract its metadata, and it can be reused to reconnect to the same data source and update the schema in the repository. A connection may, but does not have to, store credentials in the repository.
Users can also create "manual" databases without a connection. This allows the creation of user-defined objects.
Tables and Columns
The main data asset in the repository is a table. It represents a table in a relational database or an analogous object in other data technologies. Tables consist of columns. Each column has a name, data type, nullability, default value, and calculation formula.
Structures are assets similar to tables, but are used to represent file formats (such as JSON or XML).
Dataedo documents primary keys (PK) and unique keys (UK) for tables. This information is extracted from constraints and unique indexes in the database schema. Users can add manual keys to every table, view, and structure.
Table relationships (foreign keys)
Relationships between tables (as well as views and structures) are an important element of data model specifications. Relationships, or foreign keys, show how different data sets are related and how to join them (e.g. using SQL).
This information is extracted from foreign key constraints in the database schema, and users can also add manual relationships. Relationships can be created between different types of objects (tables, views, structures) and across different databases.
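As an illustration of how a documented relationship tells you how to join two data sets, the sketch below turns a relationship record into a SQL join. The record layout (`fk_table`, `pk_table`, `column_pairs`) and the function name `join_sql` are hypothetical and do not reflect Dataedo's repository schema.

```python
def join_sql(relationship):
    """Build a SELECT ... JOIN statement from a relationship record.

    The record layout is an assumption made for this sketch:
    fk_table, pk_table, and a list of (fk_column, pk_column) pairs.
    """
    fk_table = relationship["fk_table"]
    pk_table = relationship["pk_table"]
    # One equality condition per column pair; composite keys yield several.
    conditions = " AND ".join(
        f"{fk_table}.{fk_col} = {pk_table}.{pk_col}"
        for fk_col, pk_col in relationship["column_pairs"]
    )
    return f"SELECT * FROM {fk_table} JOIN {pk_table} ON {conditions}"


# Hypothetical relationship: orders.customer_id references customer.id
example = {
    "fk_table": "orders",
    "pk_table": "customer",
    "column_pairs": [("customer_id", "id")],
}
print(join_sql(example))
```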
Views, Procedures, Functions, Triggers
Apart from tables, Dataedo imports other object types from databases: views, stored procedures, user-defined functions, and table triggers.
Where the database's data dictionary (system catalog) provides this information, Dataedo metadata connectors also import object dependencies (such as the usage of one object by another). Users can also add this information to the repository manually.
One of the main functionalities of Dataedo is the ability to describe every data object and element.
Every object in Dataedo has a description field. Descriptions enable data stewards, architects, and developers to document data models.
Descriptions are imported into the repository from sources, but only once (when the description field in Dataedo is empty). Once a description is provided in the repository, it is not overwritten by subsequent updates (Dataedo is the master source of descriptions).
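The import rule for descriptions can be sketched as a small function. This is only an illustration of the rule as stated above, not Dataedo's code; the function name `merge_description` is hypothetical, and treating a whitespace-only description as empty is an assumption.

```python
def merge_description(repo_description, source_description):
    """Return the description to keep in the repository after an import.

    The repository value wins whenever it is non-empty; the source value
    is used only to fill an empty field.
    """
    # Assumption: whitespace-only text counts as an empty description.
    if repo_description and repo_description.strip():
        return repo_description
    return source_description or ""
```

For example, `merge_description("", "Customer master table")` fills the empty field from the source, while `merge_description("Curated text", "Customer master table")` keeps the curated text.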
Objects (e.g. tables) support rich text descriptions. Object elements (columns, parameters, etc.) support plain text description fields.
Every technical asset (imported from a data source) has a title field that enables data stewards to translate technical names (which are often confusing) into business language.
Apart from title and description, Dataedo allows admins to define up to 100 additional custom fields. Fields can have different types, such as closed lists, open lists, or checkboxes.
Physical, Manual and Deleted Objects
You can think of metadata in two categories in terms of its origin:
- Imported from the data source (physical)
- Added manually by user in Dataedo (manual)
Physical objects represent actual objects from the data source, as they were at the time of the last import.
You can remove objects from the repository (for instance, when a table is temporary), but you cannot remove their elements (e.g. a column).
Dataedo enables users to add objects (such as tables) to the repository manually. Objects can be added to manual databases and to connections.
It is also possible to add manual elements to physical objects (e.g. primary keys, relationships, or manual columns in physical tables).
This allows mixing physical metadata with manual.
Objects that can be added manually:
- Manual tables
- Manual columns (in physical tables)
- Table relationships
- Primary/Unique keys
Use cases for manual objects:
- Data modeling
- Physical data model enrichment - virtual/calculated columns, virtual tables
When an object or element is deleted or renamed in the source, it is marked as deleted in the repository. It is not removed, so that you do not lose any valuable metadata. It can be deleted manually.
One of the interesting metadata management features in Dataedo is Subject Areas (previously called Modules). You can think of Subject Areas as folders within databases that group database objects (not only from the same database) and have a rich text description and an ER diagram.
Use cases of Subject Areas:
- Organize tables into folders,
- Provide wiki-style articles for key business concepts.
An important capability of Dataedo is the automatic maintenance of metadata in the data catalog with the smart Import changes functionality. This functionality connects to the data source, reads changes in the schema, and applies them to the catalog:
- Adding new objects or elements
- Updating attributes (e.g. data type)
- Marking objects as deleted
- Logging changes in schema change tracking functionality
Change imports can be run manually from Dataedo Desktop, incorporated into other processes (e.g. a build process), or scheduled using the command line functionality.
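The steps listed above can be sketched as a simple diff between the source schema and the catalog. This is a simplified illustration under stated assumptions, not Dataedo's import logic: the dictionary layout, the `manual` flag, and the function name `import_changes` are all hypothetical, and restoring previously deleted elements is left out.

```python
def import_changes(catalog, source):
    """Apply source schema changes to the catalog and return a change log.

    source:  column name -> {"data_type": ...}
    catalog: column name -> {"data_type", "description", "status", "manual"}
    """
    log = []
    for name, attrs in source.items():
        entry = catalog.get(name)
        if entry is None:
            # New element in the source: add it to the catalog.
            catalog[name] = {
                "data_type": attrs["data_type"],
                "description": "",
                "status": "active",
                "manual": False,
            }
            log.append(("added", name))
        elif entry["data_type"] != attrs["data_type"]:
            # Attribute changed: update it, leaving the description untouched.
            entry["data_type"] = attrs["data_type"]
            log.append(("updated", name))
    for name, entry in catalog.items():
        missing = name not in source
        # Manual elements never existed in the source, so they are skipped.
        if missing and not entry["manual"] and entry["status"] != "deleted":
            entry["status"] = "deleted"  # marked, not removed
            log.append(("deleted", name))
    return log
```

Note that elements missing from the source are only flagged as deleted, so their descriptions and other curated metadata stay in the catalog.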
Schema Change Tracking
When Schema Change Tracking mode is enabled in the repository, each change imported from the source is logged in separate metadata structures and is available for analysis and documentation in dedicated reports.
Using metadata from the catalog (tables, columns, and relationships), Dataedo helps you visualize your data model with ER diagrams (ERDs).
- Manual ER diagrams - available in Dataedo Desktop for Subject Areas/Modules, users choose which entities and columns are presented, and determine their location,
- Automatic ER diagrams - Dataedo Web Catalog displays diagrams automatically.
A Business Glossary is an advanced topic in metadata management: it is a glossary of the terminology used in your organization. It differs from a data dictionary in that it is organization-wide and each data entity or element has only one definition, whereas a data dictionary is data source-centric and can hold multiple instances of the same data element.
Each term has a title, definition and optional custom fields.
Apart from business terms, Dataedo supports other glossary entry types:
Terms are organized in glossaries. A glossary is a concept similar to a database, but it holds business terms rather than tables.
Data stewards can define an extensive network of relationships between terms and other entries in the glossary.
Data Dictionary-Business Glossary Links
An important benefit of building a business glossary and a data dictionary in one catalog is the ability to link the two. Dataedo allows linking terms with specific tables and columns.
This allows users to:
- Discover location of key data elements in databases,
- View business definitions (on top of technical) for physical data assets - tables and columns.
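The two uses listed above amount to looking up the same links in both directions. Here is a minimal sketch of that idea; the classes `ColumnRef` and `Term`, their fields, and the helper functions are hypothetical and are not Dataedo's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnRef:
    """Location of a physical column (hypothetical structure)."""
    database: str
    table: str
    column: str


@dataclass
class Term:
    """A glossary term with one definition and links to physical columns."""
    title: str
    definition: str
    linked_columns: set = field(default_factory=set)


def locations_of(term):
    """Term -> columns: where does this business concept live in databases?"""
    return sorted(
        term.linked_columns, key=lambda c: (c.database, c.table, c.column)
    )


def terms_for_column(terms, column):
    """Column -> terms: which business definitions apply to this column?"""
    return [t.title for t in terms if column in t.linked_columns]
```

A single term can link to columns in several databases, which mirrors the point that the glossary holds one definition while the data dictionary holds many instances of the same element.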
Dataedo allows you to discover data stored in the database and review its contents and quality. The Data Profiling module combines useful metrics with a friendly user interface. On top of that, profiling in Dataedo allows you to peek into the most common or random values from your tables. Learn more about Data Profiling.
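To give a sense of what column profiling computes, here is a minimal sketch using Python's standard `collections.Counter`. The choice of metrics (row count, null count, distinct count, most common values) and the function name `profile_column` are assumptions made for illustration; they are not a list of the metrics Dataedo reports.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Compute a few basic profiling metrics for one column of values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # (value, occurrences) pairs, most frequent first
        "top_values": counts.most_common(top_n),
    }


# Hypothetical sample of a "country" column
sample = ["PL", "US", "PL", None, "DE", "PL", "US"]
print(profile_column(sample, top_n=2))
```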
Data Lineage allows you to manually configure and browse how data flows between your objects and data sources. This module consists of a responsive browser in the Web Catalog and a manual designer in the Desktop, where you can define processes and flows.
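Browsing lineage is essentially a graph traversal over the defined flows. The sketch below shows a breadth-first search for everything downstream of one object; the flow representation as `(source, target)` pairs, the object names, and the function name `downstream` are hypothetical and say nothing about how Dataedo stores or renders lineage.

```python
from collections import deque


def downstream(flows, start):
    """Return all objects reachable from start, in breadth-first order.

    flows is a list of (source, target) pairs describing data movement.
    """
    graph = {}
    for src, dst in flows:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # guards against cycles and duplicates
                seen.add(nxt)
                queue.append(nxt)
                order.append(nxt)
    return order


# Hypothetical flows between sources, staging, warehouse, and a report
flows = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dwh.dim_customer"),
    ("dwh.dim_customer", "report.sales"),
    ("erp.orders", "dwh.fact_orders"),
]
```

Here `downstream(flows, "crm.customers")` walks from the CRM table through staging and the warehouse to the report, which is the kind of impact question a lineage browser answers.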
Read more about Data Lineage in Dataedo
Dataedo helps you comply with data protection regulations by finding and tagging fields that hold sensitive data. It supports multiple parallel classifications, each for a different purpose or regulation. It includes two built-in classifications: PII and GDPR.
Fields and classifications are suggested automatically based on column names and the classifications of other fields in the repository. The final decision is made by users. Users can also curate classifications in the repository manually.
Classifications are stored in a pair of custom fields:
- Data Classification - Level of sensitivity: Sensitive, Special category, etc.
- Data Domain - Type of information: email, name, etc.
Admins can create their own classifications.
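As a rough illustration of name-based suggestions, the sketch below matches column names against a small set of patterns and proposes the pair of fields described above. The patterns, the rule table `RULES`, and the function name `suggest_classification` are hypothetical; Dataedo's actual matching rules are not shown here, and any suggestion would still need to be confirmed by a user.

```python
import re

# Hypothetical rules: (name pattern, Data Domain, Data Classification)
RULES = [
    (re.compile(r"e[-_]?mail", re.IGNORECASE), "email", "Sensitive"),
    (re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE), "name", "Sensitive"),
]


def suggest_classification(column_name):
    """Suggest a classification for a column name, or None if nothing matches."""
    for pattern, domain, classification in RULES:
        if pattern.search(column_name):
            return {
                "data_domain": domain,
                "data_classification": classification,
            }
    return None
```

With these rules, `suggest_classification("customer_email")` proposes the `email` domain, while a name such as `order_total` yields no suggestion.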
Dataedo helps you with documentation efforts by suggesting descriptions based on the descriptions of other columns.
Dataedo Web Catalog has a Community module that allows user collaboration through comments, questions, warnings, and ratings.
Users can comment on each data object or element. Other users can reply to those comments, creating threads.
Actionable comments (open/closed)
Users may choose to open threads that require action. Such threads start in open status, and when the task has been completed (e.g. a question has been answered), the user may choose to close them.
Users can rate each object, giving it 1 to 5 stars. The average rating is visible to everyone next to the object.
Users can tag objects and elements with warnings. While open, a warning shows an icon indicating that users should be aware of something important when using a particular data asset.