Rule vs instance
Our data quality module includes over 80 ready-to-use rules that you can assign to columns in tables from supported connectors.
A rule is a guideline that defines what "good" or acceptable data should look like. For example, one of our predefined rules, "allowed values," checks if column values match a list you provide.
An instance refers to a rule applied to a specific column in your dataset.
You'll see these terms in the Portal: you choose a rule, assign it to a column, and then view the results for that instance.
A single column can have multiple rules, but each must be assigned separately since different parameters need to be defined for each one. You can also edit or remove rules individually.
Create instance entry points
You can create an instance in the Portal using three different methods:
- From a column:
- Search for the column you want to assign rules to.
- Go to its Data Quality tab.
- Click the Assign rule button.
- A window will open where you can select an available rule.
- From a table:
- Find the table you're interested in.
- Go to its** Data Quality** tab.
- Click the Assign rule button.
- First, you'll need to choose a column from the table. After that, you can select a rule.
- From the Data Quality section:
- Navigate to the Data Quality section in the main menu.
- Go to the Rule instances tab.
- Click the Create rule instance button.
- A popup will guide you to select a data source, table, and column before choosing a rule.
Note: Make sure your connector is supported. If it isn’t, the Data Quality tab will not appear for columns or tables.
Step 1: Select a rule
After selecting a column, you'll need to choose a rule. Each rule has:
- A name and description to explain what it checks.
- A library it belongs to. In the future, you'll be able to create your own library with custom rules.
- Applicable column types:
- All: Can be assigned to any column.
- Text: For string-type columns.
- Date: For date-type columns.
If there are any warnings, like a rule not working with an older connector version or missing native support, we'll show them here.
Step 2: Parameters and filters
Some rules need additional parameters to work, while others don’t. For example:
- "Not null": No extra parameters are needed. It simply checks if the selected column contains any null values.
- "Allowed values": Requires you to provide a list of valid values. The rule will then check if the column data matches this list.
- "Value range": Needs a minimum and maximum value to define what counts as correct. Data outside this range will be flagged.
After entering the required parameters, you’ll see an optional field called Filter. This is useful when you don’t want to check all records in an instance.
For example you might skip verifying email correctness for records created before 2015, when your company started email validation. Or, you might only want to check invoices marked as high priority.
In these cases, you can create a query to filter the records you want to include.
The filter syntax may vary depending on the connector you're using. For example:
- SQL Server: Use syntax like [priority] = 1.
- MySQL: Use syntax like `priority` = 1.
Be sure to follow the correct syntax for your connector.
At this step, you can preview the queries used to check the column's data. On the right side, there's a toggle you can enable to view two types of queries:
- Raw Rule Query: This shows the rule's definition with parameter placeholders.
- Instance Rule Query: This shows the query with the table and column names you selected, along with the parameters and filter you applied.
Step 3: Failed rows
By default, we only collect numeric statistics about your data quality. This means you'll see how many rows were tested and how many passed or failed.
In the third step of creating a rule instance, you can choose to save the failed rows to make it easier to find and fix them.
If you enable this option, we’ll save the first 1,000 failed rows. The only required field is ID column(s). This field must include columns that uniquely identify each row in the table. By default, we prefill this with the Primary Key of the selected table. If there's no Primary Key, specify one or more columns that will act as unique identifiers.
The second field is Additional columns. If you want to save other columns to help find the failed rows, you can list them here.
Lastly, you can sort the results by one or more specific columns.
Step 4: Settings
The final step in the creation process is setting up the instance. You can choose the instance's state:
- Active: The rule will run during every scheduled Data Quality check.
- Draft: The rule will be created but won’t run until it’s set to active. You can change the state at any time by editing the instance.
Severity defines how important the instance is. For example, you could schedule critical rules to run daily, and lower-severity rules to run weekly.
Instance description is useful for noting details, like when a filter is applied. This helps business users understand that the rule is checking only a specific set of data.