Basic concepts and metadata in Dataedo

Applies to: Dataedo 23.x (current) versions, Article available also for: 10.x

This article is an overview of basic concepts in Dataedo.

Metadata Repository

The repository is a central database where all metadata is stored. It can be a SQL Server database or an Azure SQL database.

Learn more about Dataedo Metadata Repository

Metadata Connectors

Metadata connectors are built-in libraries that allow users to connect to their databases and extract metadata - information about tables, columns, relationships, etc.

There are native connectors for specific database technologies (such as Microsoft SQL Server or Oracle) and generic connectors that can connect to any technology through an external driver (ODBC).

List of metadata connectors

Data Dictionary/Catalog

The Data Dictionary, or Catalog, is the part of the Metadata Repository that holds metadata extracted from documented databases.

Databases/Connections

Users can add multiple connections to the catalog. Connections allow you to scan source databases and extract metadata. A connection can be reused to connect to the data source multiple times and update the schema in the repository. A connection may, but does not have to, store credentials in the repository.

Manual databases

Users can also create "manual" databases without a connection. This allows creation of user-defined objects.

Tables and Columns

The main data asset in the repository is a table. It represents a table in a relational database or an analogous object in other data technologies. Tables consist of columns. Each column has a name, data type, nullability, default value, and calculation formula.

Structures

Structures are assets similar to tables, but they are used to represent file formats (such as JSON or XML).

Primary/Unique keys

Dataedo documents primary keys (PK) and unique keys (UK) for tables. This information is extracted from constraints and unique indexes in the database schema. Users can add manual keys to every table, view, and structure.

Table relationships (foreign keys)

Relationships between tables (and views and structures) are an important element of data model specifications. Relationships, or foreign keys, show how different data sets are related and how to join them (e.g. using SQL).

This information is extracted from foreign key constraints in the database schema, and users can also add manual relationships. Relationships can be created between different types of objects (tables, views, structures) and across different databases.
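To illustrate how a documented relationship tells a reader how to join two data sets, here is a minimal Python sketch. It is not Dataedo code: the `Relationship` class, its field names, and the `join_clause` helper are hypothetical, and the generated SQL is deliberately simplified (no quoting, no schema prefixes).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record of one foreign key relationship."""
    fk_table: str
    pk_table: str
    # Pairs of (foreign key column, referenced column).
    column_pairs: Tuple[Tuple[str, str], ...]


def join_clause(rel: Relationship) -> str:
    """Build a simple JOIN fragment from the relationship's column pairs."""
    conditions = " AND ".join(
        f"{rel.fk_table}.{fk} = {rel.pk_table}.{pk}"
        for fk, pk in rel.column_pairs
    )
    return f"{rel.fk_table} JOIN {rel.pk_table} ON {conditions}"


# Example with made-up table and column names.
orders_to_customers = Relationship(
    fk_table="orders",
    pk_table="customers",
    column_pairs=(("customer_id", "id"),),
)
sql_fragment = join_clause(orders_to_customers)
```

The same record could describe a manual relationship just as well as one extracted from a constraint, which is why the sketch does not distinguish between them.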

Views, Procedures, Functions, Triggers

Apart from tables, Dataedo imports from databases other object types: views, stored procedures, user-defined functions, and table triggers.

Dependencies

Where the database data dictionary (system catalog) provides this information, Dataedo metadata connectors import information about object dependencies (such as usage of an object by another object). Users can also add this information to the repository manually.

Descriptions

One of the main functionalities of Dataedo is the ability to describe every data object and element.

Descriptions

Every object in Dataedo has a description field. Descriptions enable data stewards, architects, and developers to document data models.

Descriptions are imported to the repository from sources, but only once (when the description field in Dataedo is empty). Once a description is provided in the repository, it is not overwritten by subsequent updates (Dataedo is the master source of descriptions).
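The import-once rule can be sketched as a small merge function. This is an illustrative Python sketch, not Dataedo's implementation: the function name `merge_description` is hypothetical, and treating whitespace-only text as empty is an assumption.

```python
from typing import Optional


def merge_description(repo_description: Optional[str],
                      source_description: Optional[str]) -> Optional[str]:
    """Return the description to keep in the repository after an import.

    Hypothetical helper: the source description is taken only while the
    repository field is still empty; otherwise the repository value wins.
    """
    # Assumption: None and whitespace-only strings both count as "empty".
    if repo_description is None or not repo_description.strip():
        return source_description
    return repo_description
```

Under these assumptions, `merge_description(None, "From source")` keeps the source text, while `merge_description("Curated", "From source")` keeps the curated text.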

Objects (e.g. tables) support rich text descriptions. Object elements (columns, parameters, etc.) support plain text description fields.

Titles/Aliases

Every technical asset (imported from a data source) has a title field that enables data stewards to translate technical names (which are often confusing) into business language.

Custom fields

Apart from title and description, Dataedo allows admins to define up to 100 additional custom fields. Fields can have different types, such as closed lists, open lists, or checkboxes.

Learn about custom fields

Physical, Manual and Deleted Objects

You can think of metadata in two categories in terms of its origin:

  1. Imported from the data source (physical)
  2. Added manually by user in Dataedo (manual)

Physical Objects

Physical objects represent actual objects from the data source, as they were at the time of the last import.

You can remove objects from the repository (for instance, when a table is temporary), but you cannot remove their elements (e.g. a column).

Manual Objects

Dataedo enables users to add objects (such as tables) to the repository manually. Objects can be added to manual databases and to connections.

It is also possible to add manual elements to physical objects (e.g. primary keys, relationships, or manual columns in physical tables).

This allows mixing physical metadata with manual.

Objects that can be added manually:

  1. Manual tables
  2. Manual columns (in physical tables)
  3. Table relationships
  4. Primary/Unique keys
  5. Dependencies

Use cases for manual objects:

  1. Data modeling
  2. Physical data model enrichment - virtual/calculated columns, virtual tables

Deleted Objects

When an object or element is deleted or renamed in the source, it is marked as deleted in the repository. It is not removed, so that you do not lose any valuable metadata. It can be deleted manually.

Learn more about importing changes from

Subject Areas/Modules

One of the interesting metadata management features in Dataedo is Subject Areas (previously called Modules). You can think of Subject Areas as folders within databases that can group database objects (not only from the same database) and that have a rich text description and an ER diagram.

Use cases of Subject Areas:

  1. Organize tables into folders,
  2. Provide wiki-style articles for key business concepts.

Updating Catalog

Importing Changes

An important capability of Dataedo is the automatic maintenance of metadata in the data catalog with the smart Import changes functionality. This functionality connects to the data source, reads changes in the schema, and applies them to the catalog:

  1. Adding new objects or elements
  2. Updating attributes (e.g. data type)
  3. Marking objects as deleted
  4. Logging changes in schema change tracking functionality
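The four steps above can be sketched as a diff between a source snapshot and the catalog. This is a simplified Python illustration, not Dataedo's implementation: the `Catalog` and `CatalogColumn` classes, the `import_changes` function, and the log message format are all hypothetical, only columns of a single table are modeled, and objects that reappear after being marked as deleted are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogColumn:
    """Hypothetical catalog entry for one column."""
    data_type: str
    deleted: bool = False


@dataclass
class Catalog:
    """Hypothetical catalog: column name -> entry, plus a change log."""
    columns: Dict[str, CatalogColumn] = field(default_factory=dict)
    change_log: List[str] = field(default_factory=list)


def import_changes(catalog: Catalog, source: Dict[str, str]) -> None:
    """Apply a source snapshot (column name -> data type) to the catalog."""
    for name, data_type in source.items():
        existing = catalog.columns.get(name)
        if existing is None:
            # Step 1: add new elements.
            catalog.columns[name] = CatalogColumn(data_type)
            catalog.change_log.append(f"added {name}")
        elif existing.data_type != data_type:
            # Step 2: update changed attributes.
            catalog.change_log.append(
                f"updated {name}: {existing.data_type} -> {data_type}"
            )
            existing.data_type = data_type
    for name, column in catalog.columns.items():
        if name not in source and not column.deleted:
            # Step 3: mark as deleted instead of removing the entry.
            column.deleted = True
            catalog.change_log.append(f"deleted {name}")


# Example: two imports against made-up snapshots.
catalog = Catalog()
import_changes(catalog, {"id": "int", "name": "varchar(50)"})
import_changes(catalog, {"id": "bigint"})
```

Step 4 is represented by the `change_log` list, which every branch appends to. Keeping the `name` entry with `deleted=True` rather than dropping it mirrors the idea that metadata attached to a removed column is preserved.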

Change imports can be run manually from Desktop, incorporated into other processes (e.g. a build process), or scheduled using the command line functionality.

Learn more about importing changes from

Schema Change Tracking

When Schema Change Tracking mode is enabled in the repository, each change imported from the source is logged in separate metadata structures and is accessible for analysis and documentation in dedicated reports.

Learn more about Schema Change Tracking

ER Diagrams

Using metadata from the catalog - tables, columns, and relationships - Dataedo helps you visualize the data model with ER Diagrams (ERDs).

  • Manual ER diagrams - available in Dataedo Desktop for Subject Areas/Modules, users choose which entities and columns are presented, and determine their location,
  • Automatic ER diagrams - Dataedo Web Catalog displays diagrams automatically.

Business Glossary

Business Glossary is an advanced topic of metadata management: a glossary of the terminology used in your organization. It differs from a data dictionary in that it is organization-wide and each data entity or element has only one definition, whereas a data dictionary is data source centric and can hold multiple instances of the same data element.

Each term has a title, definition and optional custom fields.

Learn about Business Glossary

Entry types

Apart from business terms, Dataedo supports other glossary entry types:

  • Policies,
  • Rules,
  • Categories.

Glossaries

Terms are organized in glossaries. A glossary is a concept similar to a database, but it holds business terms rather than tables.

Term relationships

Data stewards can define an extensive network of relationships between terms and other entries in the glossary.

Data Dictionary-Business Glossary Links

An important benefit of building a business glossary and a data dictionary in one catalog is the ability to link the two. Dataedo allows linking terms with specific tables and columns.

This allows users to:

  1. Discover location of key data elements in databases,
  2. View business definitions (on top of technical) for physical data assets - tables and columns.
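Both uses above amount to looking up the same links in two directions. The following Python sketch shows one way to model that; it is not how Dataedo stores links. The `GlossaryLinks` class and its methods are hypothetical, and for brevity only column-level links are modeled, with an asset identified by a (database, table, column) tuple.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumption: a data asset is identified by (database, table, column).
Asset = Tuple[str, str, str]


class GlossaryLinks:
    """Hypothetical two-way index between glossary terms and columns."""

    def __init__(self) -> None:
        self._assets_by_term: Dict[str, Set[Asset]] = defaultdict(set)
        self._terms_by_asset: Dict[Asset, Set[str]] = defaultdict(set)

    def link(self, term: str, asset: Asset) -> None:
        """Record a link in both directions."""
        self._assets_by_term[term].add(asset)
        self._terms_by_asset[asset].add(term)

    def locations(self, term: str) -> List[Asset]:
        """Use 1: where in the databases does this term live?"""
        return sorted(self._assets_by_term.get(term, set()))

    def terms_for(self, asset: Asset) -> List[str]:
        """Use 2: which business terms describe this column?"""
        return sorted(self._terms_by_asset.get(asset, set()))


# Example with made-up names.
links = GlossaryLinks()
links.link("Customer email", ("crm", "customers", "email"))
links.link("Customer email", ("billing", "invoices", "contact_email"))
```

Here one term maps to two columns in different databases, which reflects the contrast between a single organization-wide definition and multiple source-specific instances.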

Learn about linking Business Glossary with data assets

Data Profiling

Dataedo allows you to discover the data stored in a database and review its contents and quality. The Data Profiling module combines useful metrics with a friendly user interface. In addition, Profiling in Dataedo lets you peek at the most common values, or at random data, from your tables. Learn more about Data Profiling.

Data Lineage

Data Lineage allows you to manually configure and browse how data flows between your objects and data sources. This module consists of a responsive browser in the Web Catalog and a manual designer in the Desktop, where you can define processes and flows.

Read more about Data Lineage in Dataedo

Data Classification

Dataedo helps you comply with data protection regulations by finding and tagging fields that hold sensitive data. It supports multiple parallel classifications, each for a different purpose or regulation. It includes two built-in functions: PII and GDPR.

Fields and classifications are suggested automatically based on column names and on the classifications of other fields in the repository. The final decision is made by users. Users can also curate classifications in the repository manually.

Classifications are stored in a pair of custom fields:

  • Data Classification - Level of sensitivity: Sensitive, Special category, etc.
  • Data Domain - Type of information: email, name, etc.

Admins can create their own classifications.
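To make the idea of name-based suggestions concrete, here is a minimal Python sketch. It does not reproduce Dataedo's matching logic: the `RULES` table, its regular expressions, the sensitivity level assigned to each domain, and the `suggest_classification` function are all assumptions made for illustration. The sketch only looks at column names and ignores classifications of other fields.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules: (pattern on column name, Data Domain, Data Classification).
RULES = [
    (re.compile(r"e[-_]?mail", re.IGNORECASE), "email", "Sensitive"),
    (re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE), "name", "Sensitive"),
]


def suggest_classification(column_name: str) -> Optional[Tuple[str, str]]:
    """Return a suggested (Data Domain, Data Classification) pair, or None.

    The result is only a suggestion; a user still confirms or rejects it.
    """
    for pattern, domain, level in RULES:
        if pattern.search(column_name):
            return domain, level
    return None
```

With these rules, a column named `customer_email` would be suggested as domain `email`, while a column such as `order_total` would get no suggestion.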

Learn about Data Discovery and Classification

Suggestions

Dataedo helps with your documentation efforts by suggesting text taken from the descriptions of other columns.

Learn about description suggestions

Community

Dataedo Web Catalog has a Community module that supports user collaboration through comments, questions, warnings, and ratings.

Comments (threads)

Users can comment on each data object or element. Other users can reply to those comments, creating threads.

Actionable comments (open/closed)

Users may choose to open threads that require action. These threads start in open status, and when the task has been completed (e.g. a question has been answered), a user may close them.

Rating

Users can rate each object and asset by giving it 1-5 stars. The average rating of the asset is visible to everyone next to the object.

Warnings

Users can tag objects and elements with warnings. While open, a warning displays an icon indicating that users should be aware of something important when using that data asset.
