We've just released Dataedo version 25.1.0! Take a look at this release's highlights:
Data Quality
Custom SQL rule
In version 24.4, we introduced a Data Quality module with predefined rules, allowing you to assign over 80 rules to your columns. In this version, we're adding the option to create custom rules using SQL queries. This gives you the flexibility to test anything you need and design rules tailored to your specific requirements. Read on to learn how to grant permissions for creating custom SQL rules and how to set them up.
Custom SQL rule - permissions
Custom rules allow direct querying of your data. To ensure security and restrict access to authorized users, we've introduced a new role: Power Data Steward. This role is required to create and manage custom SQL queries. It includes all Data Steward capabilities, plus the additional Manage External Queries role action.
Custom SQL rule - creation flow
The entry point is the same as for predefined rules. Click the CTA button to create a rule instance, either in Data Governance → Data Quality → Rule Instances or directly on a table or column.
If you have the required Power Data Steward role, a popup will appear, prompting you to choose between a predefined rule from the rule list or a Custom SQL Rule. This time, select Custom SQL Rule.
The first step in creating a custom rule is defining the database and entering the SQL query.
Start by selecting the data source where the SQL rule will be applied. A dropdown menu will display the available data sources, restricted to those permitted for Data Quality execution.
Next, define the Custom SQL Rule by entering your SQL query in the provided text area. The query defines the condition to be checked and should return the rows that do not meet your data quality criteria, marking them as incorrect. The rule is applied to the data filtered by the WHERE clause of your query, and any rows returned by the query are flagged as failed during rule execution, helping you identify and address data quality issues effectively.
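For example, a custom rule that flags malformed email addresses could look like the query below. The customers table and email column are hypothetical, used for illustration only:

    -- Return the rows that FAIL the check:
    -- customers whose email is present but contains no "@".
    SELECT customer_id, email
    FROM customers
    WHERE email IS NOT NULL
      AND email NOT LIKE '%@%';

Every row this query returns is treated as a failed row when the rule runs.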
Optionally, you can specify a Tested Rows Count Definition: a SQL query that returns the number of rows your custom rule will evaluate, based on the criteria defined in the WHERE clause of your custom SQL query. This helps you understand the scope of the rule but is not required; if the field is left blank, the rule results will not include the number of tested rows. This feature provides additional control over the scope of your rule evaluation.
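Continuing the hypothetical example above, one reasonable choice is to count every row the rule actually evaluates, here all customers with a non-null email:

    -- Count the rows in scope of the check (the tested population).
    SELECT COUNT(*)
    FROM customers
    WHERE email IS NOT NULL;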
Both fields include syntax highlighting for better readability and allow you to write complex queries. Queries are not validated automatically, giving you full control over the SQL logic.
Next, select the Object Type to specify whether the rule applies to a column or a table.
If you select a column, suggested columns will be displayed along with input fields for both the table and column names. If you select a table, suggested tables will be shown with an input field to specify the table name.
Our parser analyzes the query you provided in the first step, identifying and matching existing tables and columns in the selected data source. In the Suggested objects based on your query section, you'll see these matched objects as recommendations for linking your rule to the appropriate table or column.
A search input allows you to quickly find the desired object, streamlining the selection process.
We've added the ability to save failed rows for custom SQL rules, giving you more control over your data quality checks.
By default, failed rows are not saved. To enable this feature, simply activate the Save Failed Rows option. This allows you to store rows that don’t meet your custom SQL rule criteria for further analysis.
The final step in creating a rule instance is similar to selecting one from the predefined rule list. The key difference is that you need to provide a name and an optional description for your custom rule.
Additionally, you’ll need to assign the rule to a Library, which helps you organize and group your rules based on your preferences.
Treat null rows as fail or success
Until now, we treated null values in Data Quality as failing rows. Moving forward, each instance (during creation and editing) will have the option to specify whether null values should be treated as failed or successful rows.
Instances created before this change will continue treating null values as failed by default. However, this setting can be updated by accessing the instance's details and editing it.
Table rules
In this version, you can now assign rules not only to columns but also to tables. When assigning a rule, you can choose whether it applies to a column or a table.
Once you select a specific table, you can choose a rule from the available table-specific rules, which differ from the rules available for columns. The rest of the process remains unchanged.
Run history on instance details
Run history is now displayed for each Data Quality Rule Instance. To view this information, navigate to a specific instance and click on its Run History tab.
Improvements
- Dimension is now included on grids and instance details
- Additional filters have been added on grids
- Data Quality score is now visible on the instances grid on objects
- A sparkline with Data Quality Score and Ratio has been added
- A Data Quality tab and Data Quality Ratio have been added on Data Sources and Domains
- Setting an instance to active or draft is now available from the "three dot" menu on the instance's page, instead of via instance editing
- Bug fixes, typo corrections, and visual improvements
Azure OpenAI supported
We’ve expanded our AI autodocumentation feature to support Azure OpenAI in addition to OpenAI. To add an Azure OpenAI engine, go to System Settings, select LLM (AI) Engines, choose the AzureOpenAI platform, and provide the required parameters. Using AI autodocumentation works the same way as with OpenAI.
New connectors
OpenLineage connector
OpenLineage is an open standard for tracking data lineage across processing systems. It standardizes the metadata collection about data pipelines, enabling better visibility, debugging, and governance of data workflows.
Dataedo's OpenLineage integration introduces a new capability to automatically capture and visualize data lineage from modern data processing systems. This connector allows you to track pipeline metadata through OpenLineage-compatible tools.
Some of the tools that support emitting OpenLineage events:
- Apache Spark
- Apache Spark on Databricks
- Apache Spark on AWS Glue
- Apache Airflow
- Apache Flink
- dbt
- Great Expectations
- Trino
Read more details here.
Coalesce connector
Coalesce is a data transformation platform designed to streamline and automate the ELT (Extract, Load, Transform) workflow. Dataedo's Coalesce integration allows you to document Snowflake column-level data lineage created by Coalesce.
Dataedo imports Coalesce nodes as Graph Nodes. In lineage, the Node acts as a processor, connecting the objects in Snowflake.
Read more details here.
Hevo pipelines connector
Hevo is a fully managed data integration platform that simplifies the process of collecting, transforming, and loading data from various sources into a centralized system. It streamlines the ELT (Extract, Load, Transform) workflow, enabling businesses to automate data pipelines without requiring extensive coding or manual intervention.
Dataedo's Hevo integration allows you to document Hevo pipelines, transformations, and schema mappings. Dataedo also creates some of the data lineage automatically.
Read more details here.
Oracle PeopleSoft connector
Oracle PeopleSoft is a suite of enterprise applications designed to streamline business operations across HR, finance, supply chain, and more. It provides organizations with robust tools for managing workforce data, financial processes, and operational efficiency.
Dataedo's Oracle PeopleSoft integration allows you to document not only all physical Oracle PeopleSoft objects but also the logical layer.
Read more details here.
Connectors improvements
- SQL Server
- Tableau
- Azure Purview
- Qlik Sense On-Premises
- MySQL parser
- Import process speed improvements
SQL Server
We added support for synonyms in the SQL Server connector. Synonyms are now imported as objects and have automatic data lineage to the base object.
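As an illustration, a synonym like the hypothetical one below is now imported as an object, with automatic lineage pointing to its base object:

    -- The synonym is imported as an object, and lineage is created
    -- from dbo.CurrentOrders to its base object Sales.dbo.Orders.
    CREATE SYNONYM dbo.CurrentOrders
    FOR Sales.dbo.Orders;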
Tableau
From now on, the Tableau connector supports the report usage feature: Dataedo imports each report's unique views, view count, and last view date.
Azure Purview
We added the following improvements to the Azure Purview connector:
- Option to fold two objects into one, such as a view and its view script.
- Automatic data lineage between different Azure Purview imports.
- Option to import classification.
- Option to import business metadata/attributes.
Qlik Sense On-Premises
Users can now filter objects based on the stream to which they belong, making it easier to organize and locate relevant content.
MySQL parser
We improved the MySQL parser to support automatic data lineage based on stored procedures and functions. Now, when a stored procedure or function in a MySQL dialect (like MySQL or MariaDB) uses statements like INSERT or UPDATE to modify data in a table, Dataedo will automatically create data lineage.
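As a hypothetical illustration, a stored procedure like the one below now yields lineage from the table it reads to the table it modifies (all names are made up):

    DELIMITER //
    CREATE PROCEDURE archive_closed_orders()
    BEGIN
      -- Rows written to orders_archive are read from orders,
      -- so the parser can derive lineage: orders -> orders_archive.
      INSERT INTO orders_archive (order_id, closed_at)
      SELECT order_id, closed_at
      FROM orders
      WHERE status = 'closed';
    END //
    DELIMITER ;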
Read more details here.
Import process speed improvements
We have made significant improvements to the import process speed. These improvements were made to the scripts responsible for loading metadata into the Dataedo repository.
Dashboards improvements
In version 25.1, we’re introducing several improvements to the existing dashboards:
- Links to data sources, databases, reports, and other repository objects
- Data bars for columns with numeric data
- Avatars and more informative tooltips in dashboards that display repository users
Deleting Domains from Dataedo Portal
In previous versions, deleting domains and areas was only possible in the Dataedo Desktop. Our goal is to enhance the Dataedo Portal by adding missing functionalities that were previously available only in the desktop version. From now on, after navigating to a selected domain or area, you can click on the menu under the three dots next to the name and delete a specific domain.
Agent installer improvements
The Agent is now installed as a standalone .exe file and includes a portable version of the app. The Desktop ZIP file downloaded from https://dataedo.com/download for the Dataedo Desktop portable version now includes the app files directly, removing the extra Dataedo folder. Previously, the ZIP file contained a directory called Dataedo, which stored the app files.
Multiple agents can be installed, each requiring a unique encrypted connection string. Users can now specify the installation path during setup for added flexibility.
Steward Hub removed from Dataedo Desktop
With the release of version 25.1.0, we are removing Steward Hub from Desktop. Moving forward, we’ll focus on introducing new types of suggestions to Steward Hub in the Portal.
UX/UI changes
With each major version, we enhance our UI to make it more modern, consistent, and user-friendly. Here are some of the improvements in this release.
Community redesign
We’ve redesigned our community! You can now access it from the right sidebar and next to the object's name.
Comments are now organized into tabs with counters, making it easy to see if there are any comments of a specific type.
Upper nav redesign
We’ve refined the upper navigation bar by moving notifications there, adding a logout button, and displaying information about the logged-in user.
Notifications redesign
Notifications have been moved from the right sidebar to the upper navbar. A red dot indicates unread notifications.
The notifications list has also been refined for better consistency with community comments.
Quick and advanced search redesign
Quick search is now available when you click the search input in the upper navbar.
As you type, you'll see up to three results per category (e.g., terms, domains, reports, columns, etc.). We also highlight the searched phrase to make it easier to scan through the results.
You can navigate the results using the arrow keys and open a selected object by pressing Enter, or simply click it with your mouse.
After typing a search phrase, you can press Enter to navigate to advanced search, where you'll see more objects with additional details. Next to the search input, there's a filter button; clicking it opens a filters pane to help narrow down your results, just like before. Applied filters are now displayed as badges under the search bar, and you can remove any filter by clicking the "x".
Workflows redesign
We've taken another step in our Portal redesign journey by updating how workflows are displayed. The workflow status is now more compact and aligned with our recent UI changes.
For editors, updating a workflow status now opens a popup with available statuses, replacing the previous dropdown menu.
Users with view permissions will not see the edit button.
Tooltip improvements
In this version, we’ve improved the tooltips’ UI and behavior. They are now better positioned and align more seamlessly with our ongoing Portal redesign. Small tooltips are dark-colored to match the new navigation, while larger tooltips with descriptions and details are gray for better readability.
URL Custom Field
We added a new Custom Field of type URL that accepts links and opens them in a new tab when clicked.
Dataedo on Docker - more accurate version tagging
From now on, we will name major versions with .0 at the end so that it is easier for Docker users to upgrade the repository. The Docker image version number is 25.1.0 for this version.