Product Vision & Roadmap

Piotr Kononow - Dataedo Team, 22nd August, 2023

How to read this article?

This article explains the product vision and the development roadmap for the next two years. Please bear in mind that this roadmap can change: with each release (every 3-6 months) we update our plans, and the vision evolves over time with your feedback, requests and the direction the market is going.

Below are the key development areas:

AI

We are investing in research into multiple AI/ML functionalities. Here are the key ones:

Auto documentation! (Q3 2023)

ChatGPT is really good at documenting tables and their columns and at writing definitions of business glossary terms. We are enabling you to plug your ChatGPT subscription into Dataedo (other engines will follow in the future). You will be able to ask ChatGPT to document your table, then edit and save those descriptions or save them in a separate field. This will speed up the documentation process considerably!

AI Chat (Q1 2024)

Another big update on the AI/LLM front: a built-in AI chat powered by ChatGPT. You will be able to converse with it in natural language and ask questions about your data, so it can help you find tables and reports or even write a query for you! Later, perhaps, it will be able to execute the query for you and present you with the data.

Data Governance

Domains (Q3 2023)

In our 23.2 release, we are proud to highlight the enhancement of our subject areas, evolving them into more sophisticated domains. Over the past year, we've witnessed the emergence and growing popularity of the Data Mesh paradigm, a data management strategy that decentralizes a single data team into distinct domains.

With Dataedo, not only can you mirror this structure, but you can also categorize data assets - tables, reports, terms, and the like - under the appropriate domain. But that's not all. Within each domain, you will be able to further define a hierarchy of subject areas and map your assets directly to these specific areas.

But hold on, there's even more. We're introducing dual domain definitions to cater to a broader scope:

  1. Data Domains: Designed to encapsulate your data reflecting the perspectives of data teams, IT, and applications.
  2. Business Domains: Tailored specifically for an organizational standpoint, these domains provide a holistic view of your data landscape with the aim of bolstering Data Governance and Enterprise Architecture. They also provide a wiki-esque experience, enabling data consumers to navigate, learn, and delve deep into your business and data, from a high-level overview, down to more specific topics, to specific terms, reports and tables.

Those domains will live separately from the data sources (current subject areas are defined under each data source).

Finally, domains and subject areas will enable you to define permissions to specific assets. You will be able to grant a specific user or user group access to specific domains and subject areas, and linked objects will inherit this access.

Workflows (Q1-Q3 2024)

"Workflows" is an umbrella name for a family of features that will help you manage the authoring of your documentation:

  1. Drafts (Q1-Q2 2024) - Dataedo will introduce a two-tiered system in which objects will be in either a draft or a published state. This will apply to user-created objects, such as domains, subject areas and terms, and to imported objects, such as tables, reports and stored procedures. Draft objects will only be visible to data stewards (editors), and viewers will only be able to see published objects.
  2. Statuses (Q1-Q2 2024) - Building on the foundation of the draft/published system, Dataedo will allow objects to adopt a status from a custom-defined list available in your repository. This enhancement (inspired by Jira) lets users see which phase the documentation of a specific object is in and who is currently responsible for it. Each status will be limited to specific roles, so not everyone will be able to move an object to any status (permissions will probably not be in version 1 of this feature).
  3. Change suggestion (Q1-Q2 2024) - Soon, everyone (with a free viewer license and the right permission) will be able to suggest edits to descriptions of any asset. These suggestions will go to a dashboard where editors decide whether to accept or reject them.
  4. Request for change (late 2024) - In addition to suggesting edits, users will soon be able to ask for changes to any object's documentation.
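
To make the draft/published rule concrete, here is a minimal Python sketch of the visibility logic described in the Drafts item. The class name, the role names and the function are hypothetical and chosen for illustration only; they are not taken from Dataedo.

```python
from dataclasses import dataclass


@dataclass
class CatalogObject:
    # Hypothetical, simplified stand-in for a catalog object.
    name: str
    published: bool


def visible_objects(objects, role):
    """Editors see drafts and published objects; any other role sees only published ones."""
    if role == "editor":  # assumed role name
        return list(objects)
    return [obj for obj in objects if obj.published]


catalog = [
    CatalogObject("Customer (term)", published=True),
    CatalogObject("orders (table)", published=False),
]
viewer_names = [obj.name for obj in visible_objects(catalog, "viewer")]
editor_names = [obj.name for obj in visible_objects(catalog, "editor")]
```

In this sketch a viewer would see only the published term, while an editor would see both objects.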

Certifications (Q2-Q3 2024)

One of the important aspects of data governance and data democratization is trust in data. One of the ways you can increase trust in data is by letting data stewards and data owners certify assets. Dataedo will enable them to provide a certificate for each asset in the catalog, each with a timestamp and signed by a specific person.

Data Classifications (Q1-Q2 2024)

While Dataedo currently identifies sensitive data based on column names, we're taking a step further. Our new feature will scan the actual content of the data, determining its type. This means it can detect whether the data is sensitive, such as an email, name, address, etc.
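
As a rough illustration of content-based classification, the Python sketch below samples the values of a column and assigns a label when most of them match a pattern. The patterns, the 80% threshold and the function name are assumptions made for this example, not Dataedo's actual rules.

```python
import re

# Hypothetical, deliberately simple patterns; real classifiers need far more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def classify_column(values, min_match_ratio=0.8):
    """Return the first label whose pattern matches at least
    min_match_ratio of the non-empty sampled values, else None."""
    sample = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not sample:
        return None
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in sample if pattern.match(v))
        if matches / len(sample) >= min_match_ratio:
            return label
    return None
```

A column named `contact` would not be flagged by a name-based rule, but a sample of its values such as `a@example.com` would be labeled `email` by this kind of content check.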

Data Steward Assistant (2024)

Data Steward Assistant is a suite of features crafted to assist Data Stewards in their daily tasks. With the aid of built-in rules, dictionaries, and sophisticated language models like ChatGPT, the Assistant recommends areas of improvement, suggesting documentation fill-ins and potential entity links. This feature suite will be unveiled in stages throughout 2024 and beyond.

Here's a peek into some of the Assistant's capabilities:

  1. Domains
    • Propose potential domains and subject areas
    • Offer suggestions for creating links from domains to data assets
  2. Business Glossary
    • Recommend possible terms for creation
    • Show terms by their definition status (e.g., to define, for review, publish)
    • Suggest linking terms to columns, tables, and reports
  3. Data Dictionary
    • Identify missing descriptions or overly simplistic ones that need refining
    • Propose improved descriptions
  4. Foreign keys
    • Advise on potential foreign keys, drawing from names and SQL queries/views in the repository
  5. Data Classification
    • Suggest classifications, enhancing the current Data Classification module
  6. Reference Data
    • Display lookups according to their definition status
    • Recommend lookups based on column names, data types, and data profiling
    • Suggest how to link defined lookups to other columns in the catalog (e.g., connecting a "Country" lookup to every country column in the repository).

With the Data Steward Assistant, Dataedo reinforces its commitment to streamlining the data documentation process.
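
One of the capabilities listed above, advising on potential foreign keys based on names, can be sketched in a few lines of Python. The naming convention assumed here (a column called `<table>_id` points at a table with that singular or plural name) and the function itself are illustrative assumptions, not a description of how the Assistant works.

```python
def suggest_foreign_keys(tables):
    """tables maps a table name to its list of column names.
    Returns (table, column, referenced_table) tuples for columns named
    '<referenced_table>_id', accepting singular or plural table names."""
    lookup = {}
    for name in tables:
        lowered = name.lower()
        lookup[lowered] = name
        if lowered.endswith("s"):
            # Let 'customer_id' match a table called 'customers'.
            lookup.setdefault(lowered[:-1], name)
    suggestions = []
    for table, columns in tables.items():
        for column in columns:
            col = column.lower()
            if not col.endswith("_id"):
                continue
            target = lookup.get(col[:-3])
            if target is not None and target != table:
                suggestions.append((table, column, target))
    return suggestions


schema = {
    "customers": ["id", "name"],
    "orders": ["id", "customer_id", "product_id"],
    "product": ["id"],
}
candidates = suggest_foreign_keys(schema)
```

Such candidates would only be suggestions for a data steward to confirm or reject.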

Data Discovery

We are prioritizing how viewers find and discover metadata in our catalog. That's why we will be redesigning the navigation and most lists, forms and diagrams in Dataedo Web Catalog.

Redesigned Data Lineage diagrams (Q1 2024)

We are redesigning Data Lineage diagrams to make it easier to track and discover lineage and impact analysis. Here are some enhancements:

  1. Better analysis of lineage of specific column
  2. Aggregated views hiding technical details (such as ETL packages)
  3. Preview of SQL that moves data
  4. High level lineage between sources
  5. Expand-collapse option

Redesigned ERD diagrams (Q1 2024)

We will merge ERD diagrams from Desktop and Web Catalog and enable users to modify and save ERDs directly in the Web Catalog.

ERDs for SQL queries and views (Q1-Q2 2024)

Dataedo Web Catalog can currently draw a diagram of lineage for a database view or SQL query. In the future we would like to visualize query joins (including nested queries) in an ER diagram (similarly to how SSMS does it).

Similar objects (Q1 2024)

To make it easier to find the right report, dataset or column, Web Catalog will show similar objects based on various built-in rules.

Keyword Explorer (Q3 2023)

Keyword Explorer is our new experimental feature that breaks down the keywords used in column and table names and allows you to explore a keyword cloud and the assets whose names contain those keywords. This feature is useful for discovering what entities your databases actually hold.
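
The idea of breaking names down into keywords can be illustrated with a short Python sketch. The splitting rules (underscores and camelCase boundaries) and the function names are assumptions made for this example; the actual feature may tokenize names differently.

```python
import re
from collections import Counter


def split_keywords(name):
    """Split an identifier such as 'CustomerOrder_ID' into lowercase keywords."""
    # Insert a space at lower-to-upper boundaries (camelCase / PascalCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Treat any run of non-alphanumeric characters (e.g. '_') as a separator.
    return [part.lower() for part in re.split(r"[^A-Za-z0-9]+", spaced) if part]


def keyword_counts(names):
    """Count how often each keyword occurs across a list of names."""
    counts = Counter()
    for name in names:
        counts.update(split_keywords(name))
    return counts


counts = keyword_counts(["order_date", "OrderId", "customer_id"])
```

The resulting counts are the kind of input a keyword cloud could be drawn from, with more frequent keywords shown larger.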

Data Lineage

More data lineage metadata (Q3 2023)

We are providing more metadata fields for the following:

Column level lineage:

  1. Transformation
  2. Description
  3. Custom fields

Processes:

  1. Description
  2. Script (e.g. SQL)

Data Lineage diagrams

See Redesigned Data Lineage diagrams in the Data Discovery section.

Data Lineage automations

See the Connectors and Metadata Extraction section.

Connectors and Metadata Extraction

Databases and platforms

  1. MS Fabric (Q3 2023-?)
  2. Azure Synapse (Q3 2023) - importing lineage from Synapse pipelines.
  3. Databricks (Q1 2024?)
  4. Microsoft Dynamics (Q1 2024) - extraction of forms and their lineage.
  5. SSAS Multidimensional (Q3 2023) - Cubes, dimensions, data source views and lineage.
  6. Amazon Redshift (Q3 2023) - importing lineage for external tables and copy logs.
  7. Snowflake (Q3 2023) - importing lineage for external tables and copy logs.

SQL Parsing

Building lineage for insert and update statements in stored procedures.

BI tools

  1. Power BI (Q3 2023) - on-prem support, improved automated column-level lineage.
  2. SSRS (Q3 2023) - documenting reports and automated column-level lineage using SQL parsing.
  3. Tableau (Q3 2023) - importing data flows.
  4. Qlik (early 2024)
  5. Looker (2024)

ETL tools

  1. SSIS (Q3 2023) - improved column-level lineage, including SQL parsing.
  2. Talend (2024)

Extracting and documenting indexes (early 2024)

This feature has been requested for a long time, and we are happy to announce that we have decided to add support for indexes.

Import mechanism improvements (Q3 2023 - Q2 2024)

We are completely redesigning our import mechanism to provide the following improvements:

  1. Improved performance (Q3 2023) - import takes less time.
  2. Better progress tracking (Q3 2023)
  3. Ability to import multiple databases from the server at once (Q2 2024)
  4. Easier selection of schemas and object types (Q1-Q2 2024)
  5. More capable imports (Q1-Q2 2024) - at import you will be able to automatically profile data or refresh lookups.

Import from Web Catalog

Currently, importing metadata is only available with Desktop or command line files. We are working on enabling imports directly in Web Catalog.

Convenient scheduler (Q2-Q3 2024)

We want to make it easier to schedule imports and make sure your metadata is up to date. One feature that will help us achieve that is easy scheduling and status monitoring of imports for each source from Web Catalog.

File/data lake imports improvements (Q1 2024)

We are planning to improve importing changes and new files from data lake/storage sources and data files.

Custom connector (Q3 2023)

To enable connection to more sources and make metadata extraction more customizable, we are building functionality that will allow us and our users to build custom connectors to SQL-compatible sources by providing a set of SQL scripts that extract metadata. Such a connector can run on an ODBC connection, making it possible to extract rich metadata from currently unsupported sources, and much more metadata than the generic ODBC connection currently provides.

Interfacing tables

Interfacing tables enable you to upload metadata into the Dataedo repository from external sources. All you need to do is extract the metadata on your own, upload it into predefined tables in our repository and run an import. We are extending the tables to support more metadata.

Supported metadata:

  1. Reports (Q3 2023)
  2. Column level data lineage (Q3 2023)

Data Profiling

JSON structure discovery (Q1 2024)

With the advent of Big Data, relational databases abandoned their strict data models in many places, and developers are saving unstructured data in text/JSON columns. Understanding the structure of documents in those columns is as crucial as understanding the columns in a table. Therefore, we will provide a JSON scanner for columns identified by data stewards (Data Steward Assistant could also suggest such fields). This scanner will extract the schema of the JSON document and create a linked
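
To illustrate what extracting a schema from a JSON column could look like, here is a minimal Python sketch that walks sampled documents and records which JSON types occur at each path. The path notation (`$.key`, `[]` for array items) and the function names are assumptions for this example and say nothing about how the planned scanner will work.

```python
import json


def infer_structure(value, path="$", found=None):
    """Collect the set of JSON type names observed at each path."""
    if found is None:
        found = {}
    if isinstance(value, dict):
        kind = "object"
    elif isinstance(value, list):
        kind = "array"
    elif isinstance(value, bool):  # check bool before int: bool is an int subclass
        kind = "boolean"
    elif isinstance(value, (int, float)):
        kind = "number"
    elif isinstance(value, str):
        kind = "string"
    else:
        kind = "null"
    found.setdefault(path, set()).add(kind)
    if isinstance(value, dict):
        for key, child in value.items():
            infer_structure(child, f"{path}.{key}", found)
    elif isinstance(value, list):
        for child in value:
            infer_structure(child, f"{path}[]", found)
    return found


def scan_column(json_texts):
    """Merge the structure of every parseable document in a column sample."""
    found = {}
    for text in json_texts:
        try:
            infer_structure(json.loads(text), "$", found)
        except (TypeError, ValueError):
            continue  # skip NULLs and values that are not valid JSON
    return found


structure = scan_column([
    '{"id": 1, "tags": ["a"]}',
    '{"id": "x", "active": true}',
    "not json",
    None,
])
```

A path that maps to more than one type (here `$.id` is seen as both a number and a string) is exactly the kind of inconsistency such a scan would surface.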

Different "perspectives" for tables (Q2 2024)

Some tables hold multiple entities, or entities in different states. For instance, when an order is 'APPROVED', it requires the order date to be set.

Take the example of an orders table. An order marked as 'APPROVED' necessitates that an order date is specified. When analyzing such a table, it's beneficial to view the distribution and null values of the order_date column separately for both approved and draft orders.

Dataedo is introducing a feature that lets users create different "perspectives" on a single table. This helps us get a clearer picture of what data tables actually represent. These perspectives will also be a foundation for an upcoming Data Quality module.
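
The orders example can be sketched in Python as follows: each "perspective" is modeled as a named row filter, and the share of null values in a column is computed per perspective. The data, the filter definitions and the function name are hypothetical and only illustrate the idea.

```python
def null_ratio_by_perspective(rows, column, perspectives):
    """For each named perspective (a row predicate), return the share of rows
    in which `column` is None, or None when the perspective matches no rows."""
    result = {}
    for name, predicate in perspectives.items():
        subset = [row for row in rows if predicate(row)]
        if not subset:
            result[name] = None
            continue
        nulls = sum(1 for row in subset if row.get(column) is None)
        result[name] = nulls / len(subset)
    return result


# Hypothetical sample of an orders table.
orders = [
    {"status": "APPROVED", "order_date": "2023-08-01"},
    {"status": "APPROVED", "order_date": "2023-08-02"},
    {"status": "DRAFT", "order_date": None},
    {"status": "DRAFT", "order_date": "2023-08-03"},
]
perspectives = {
    "approved": lambda row: row["status"] == "APPROVED",
    "draft": lambda row: row["status"] == "DRAFT",
}
ratios = null_ratio_by_perspective(orders, "order_date", perspectives)
```

Profiled over the whole table, a quarter of the order dates in this sample are missing; split by perspective, all of the missing values fall among draft orders, which is the expected pattern.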

Dashboards & Analytics (Q4 2023 - Q2 2024)

We are planning to deliver a number of built-in dashboards:

  1. Metadata Ingestion Dashboard - Dashboard that helps track status, time and number of objects imported daily.
  2. Dataedo Usage Dashboard - Dashboard that shows usage of Dataedo - daily users, objects visited, searches and edits daily.
  3. Data Stewardship Dashboard - Dashboard for Data Stewards and DG management that shows documentation progress (broken down by different metadata types), weekly and monthly edits, and steward activity.
  4. Data Catalog Dashboard - Dashboard that shows summary and statistics of data assets in the catalog.
  5. Data Profiling Dashboard - High level overview of data profiled in the catalog and each data source.
  6. Data Quality Dashboard - See Data Quality section.

Data Quality (late 2024)

  1. Data Quality rules:
    • Simple column rules: value less/more/between, string length is less/more/between, column is not null, value fits pattern
    • Number of rows rules
    • Custom SQL rules enabling user writing custom tests using SQL
    • Foreign Key rules: DQ module will perform referential integrity tests from defined foreign keys/relationships in bulk
    • Unique Key rules: DQ module will perform uniqueness tests from defined primary/unique keys in bulk
    • Reference Data rules: DQ module will test if values match defined lookup in Reference Data module
  2. Thresholds - user will be able to provide values/thresholds for Pass/Warning/Error results.
  3. Filters - option to define filters for tables to test specific subset of rows (e.g. approved documents, transactions after 2020, etc.).
  4. Testing engine - Part of Dataedo DQ will be
  5. Tests log - Test logs will be saved in the DQ repository (open SQL database).
  6. Data Quality dashboard - presents high level overview of tests and quality over time with the ability to drill into specific tables, columns and tests.
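
As a rough illustration of how simple column rules and Pass/Warning/Error thresholds could fit together, the Python sketch below grades the share of values that fail a rule. The function names, the default thresholds and the grading logic are assumptions for this example, not the design of the planned module.

```python
def evaluate_rule(values, check, warning_at=0.01, error_at=0.05):
    """Apply `check` to every value and grade the failure ratio.
    Returns a (status, failure_ratio) tuple."""
    values = list(values)
    if not values:
        return "Pass", 0.0
    failures = sum(1 for v in values if not check(v))
    ratio = failures / len(values)
    if ratio >= error_at:
        status = "Error"
    elif ratio >= warning_at:
        status = "Warning"
    else:
        status = "Pass"
    return status, ratio


def not_null(value):
    """Simple column rule: the value must be present."""
    return value is not None


def between(low, high):
    """Simple column rule factory: the value must lie in [low, high]."""
    return lambda value: value is not None and low <= value <= high


null_check = evaluate_rule([1, 2, 3, None], not_null)
range_check = evaluate_rule(range(100), between(0, 97))
```

In this sketch a filter, as described in the list above, would simply be applied to the rows before the values are passed to `evaluate_rule`.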

Exports and Integrations

MS Teams and Slack integrations (sometime in 2024?)

We would like to make Dataedo more interactive, and one of the ways to do that will be integration with your communication tool - MS Teams or Slack.

Dataedo API (sometime in 2024)

Dataedo API will make integration easier. The API will consist of REST API methods and will let you define webhooks to get notified of events and changes in Dataedo.

Excel export replacement (Q1 2024)

Excel export will be decommissioned, and we will build in a number of grid views that will provide an option to copy or save to Excel.

Deployment and administration

Repository on PostgreSQL (Q1-Q2 2024)

To make Dataedo hosting easier, we are working on a repository hosted on PostgreSQL.

Desktop authentication with Web users (Q2-Q3 2024) and permissions in Desktop (late 2024)

Dataedo Desktop will enable connecting to the repository using the same account as Web Catalog does. This will pave the way for permissions in Desktop.
