Automated Data Discovery and Classification
This feature is available from version 7.5 in the Enterprise plan.
Predefined classification functions
Dataedo ships with built-in, predefined classification functions (more will be added over time):
- PII - finds columns holding PII (Personally Identifiable Information)
- GDPR - finds columns potentially holding personal data as defined by the European GDPR
How it works
Dataedo scans the data catalog in search of table columns that:
- have a name or title that fits one of the predefined masks, e.g. '%date%birth%'
- have the same name or title as columns that already have classifications assigned
- are linked to a classified business glossary term
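The three criteria above can be sketched as a small Python filter. This is an illustration under assumptions, not Dataedo's implementation: it treats masks as SQL LIKE-style patterns (`%` matches any run of characters, `_` a single character), as the '%date%birth%' example suggests, and matches them case-insensitively against the whole name or title. `Column`, `like_to_regex`, and `find_candidates` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Column:
    # Hypothetical, simplified view of a catalog column.
    table: str
    name: str
    title: Optional[str] = None
    glossary_term: Optional[str] = None


def like_to_regex(mask: str) -> re.Pattern:
    """Translate a LIKE-style mask ('%' = any run, '_' = one char) to a regex."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Assumption: matching is case-insensitive (callers use fullmatch).
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def find_candidates(columns: Iterable[Column],
                    masks: Iterable[str],
                    classified_names: Set[str],
                    classified_terms: Set[str]) -> List[Column]:
    """Return columns that meet at least one of the three discovery criteria."""
    patterns = [like_to_regex(m) for m in masks]
    classified = {n.lower() for n in classified_names}
    hits = []
    for col in columns:
        labels = [s for s in (col.name, col.title) if s]
        by_mask = any(p.fullmatch(s) for p in patterns for s in labels)
        by_name = any(s.lower() in classified for s in labels)
        by_term = col.glossary_term in classified_terms
        if by_mask or by_name or by_term:
            hits.append(col)
    return hits


if __name__ == "__main__":
    catalog = [
        Column("customers", "date_of_birth"),
        Column("orders", "order_date"),
        Column("staff", "email_address"),
        Column("leads", "contact", glossary_term="Email"),
    ]
    hits = find_candidates(catalog, ["%date%birth%"], {"email_address"}, {"Email"})
    print([f"{c.table}.{c.name}" for c in hits])
    # prints ['customers.date_of_birth', 'staff.email_address', 'leads.contact']
```

In this sketch the mask is order-sensitive: '%date%birth%' matches `DateOfBirth` but not `birth_date`, so catching the latter would take a second mask such as '%birth%date%'.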
Run Data Discovery and Classification
Click the Data Classification button on the ribbon and choose one of the built-in classification functions.
This runs the column search and opens a window with the results and classification suggestions.
The grid lists all table columns from the entire catalog that match the conditions, grouped by column name.
The right side of the grid shows the current (Current value column) and suggested (Update to column) values of the classification labels.
Each classification function has its own set of classification fields, but usually they are:
- Classification - sensitivity level, such as "Sensitive" or "Non-sensitive"
- Domain - Type/domain of data held in the column. Types include: name, email, phone number, date of birth, address, etc.
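The grid described above can be modelled as matched columns grouped by column name, each carrying a current and a suggested value per classification field. The `GridRow` shape and `build_grid` function below are hypothetical and only illustrate the grouping; the field names follow the Classification and Domain fields listed above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (table, column)


@dataclass
class GridRow:
    # Hypothetical shape of one grid row: one matched table column.
    table: str
    column: str
    current_value: Dict[str, str] = field(default_factory=dict)  # "Current value"
    update_to: Dict[str, str] = field(default_factory=dict)      # "Update to"


def build_grid(matches: List[Key],
               current: Dict[Key, Dict[str, str]],
               suggested: Dict[Key, Dict[str, str]]) -> Dict[str, List[GridRow]]:
    """Group matched columns by column name, pairing current and suggested labels."""
    groups: Dict[str, List[GridRow]] = defaultdict(list)
    for table, column in matches:
        key = (table, column)
        groups[column].append(GridRow(
            table=table,
            column=column,
            current_value=dict(current.get(key, {})),
            update_to=dict(suggested.get(key, {})),
        ))
    # Order groups by column name (case-insensitive) so that identically
    # named columns from different tables are reviewed together.
    return {name: groups[name] for name in sorted(groups, key=str.lower)}


if __name__ == "__main__":
    matches = [("customers", "email"), ("customers", "date_of_birth"), ("staff", "email")]
    sensitive_email = {"Classification": "Sensitive", "Domain": "email"}
    suggested = {
        ("customers", "email"): sensitive_email,
        ("customers", "date_of_birth"): {"Classification": "Sensitive", "Domain": "date of birth"},
        ("staff", "email"): sensitive_email,
    }
    current = {("staff", "email"): sensitive_email}
    for name, rows in build_grid(matches, current, suggested).items():
        for row in rows:
            print(name, row.table, row.current_value, "->", row.update_to)
```

Grouping by name puts `customers.email` and `staff.email` next to each other, so a single review decision can cover every column sharing that name.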
The list contains all columns found that meet the conditions defined above. Review them before you save the classifications; it is best to do this in batches. First clear the selection with the Select none option. Then go through the column names and review the labels.
If the labels are correct, select a block of rows with the left mouse button (or the arrow keys while holding Shift), right-click, and choose Check selected to include them when you save.
To change the sensitivity level or domain, pick a different value from the dropdown list in the Update to field. To add a custom label, type a new value and confirm with Enter.
To tag a column as non-classified/non-sensitive, choose one of the predefined labels or add your own.
After you have reviewed the classification labels and checked the columns to classify, save them to the repository by clicking the Save button.
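The review steps above boil down to: uncheck everything, check the rows you accept (optionally editing their Update to values), and persist only the checked rows. The sketch below mimics that with a plain dict standing in for the repository. `ReviewRow`, `select_none`, `check_selected`, and `save` are hypothetical names loosely mirroring the UI actions, not a Dataedo API, and the sketch assumes rows start checked, which is why the first step is Select none.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (table, column)


@dataclass
class ReviewRow:
    table: str
    column: str
    update_to: Dict[str, str]  # suggested labels, editable before saving
    checked: bool = True       # assumption: suggestions start selected


def select_none(rows: List[ReviewRow]) -> None:
    """Counterpart of the Select none option: uncheck every row."""
    for row in rows:
        row.checked = False


def check_selected(rows: List[ReviewRow], selected: Iterable[int]) -> None:
    """Counterpart of Check selected for a block of row indexes."""
    for i in selected:
        rows[i].checked = True


def save(rows: List[ReviewRow], repository: Dict[Key, Dict[str, str]]) -> int:
    """Write the labels of checked rows only; return how many columns were saved."""
    saved = 0
    for row in rows:
        if row.checked:
            repository.setdefault((row.table, row.column), {}).update(row.update_to)
            saved += 1
    return saved


if __name__ == "__main__":
    rows = [
        ReviewRow("customers", "email", {"Classification": "Sensitive", "Domain": "email"}),
        ReviewRow("staff", "email", {"Classification": "Sensitive", "Domain": "email"}),
        ReviewRow("orders", "status", {"Classification": "Sensitive", "Domain": "name"}),
    ]
    select_none(rows)
    check_selected(rows, range(0, 2))                     # block of the two email rows
    rows[1].update_to["Classification"] = "Non-sensitive"  # edited Update to value
    repo: Dict[Key, Dict[str, str]] = {}
    print(save(rows, repo), repo)
```

The unchecked `orders.status` row, a false positive in this example, is simply left out of the save and can be revisited in a later batch.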
You can browse classifications in the catalog editor or in HTML exports. Classification labels appear next to table columns, like any other custom field.
You can also curate labels manually in the catalog editor and classify columns that the discovery function did not find, in the same way as with manual classification.
Running multiple times
You can run a function multiple times, and you need to if you review suggestions in batches.
A subsequent run can find new columns because saved classifications propagate to similar columns (those with the same name or title). Keep running the classification until it returns no more results.
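A toy fixpoint loop shows why repeated runs keep finding columns. Each pass suggests labels only for columns that share a name or title with a column already classified, so a column reachable only through a newly saved one shows up one run later. The sketch accepts every suggestion automatically (in Dataedo you review them first) and ignores masks and glossary terms; `discovery_pass` and `run_until_stable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    title: Optional[str] = None


def discovery_pass(columns: List[Column], labels: Dict[Column, str]) -> Dict[Column, str]:
    """One run: suggest labels for unclassified columns that share a name or
    title with a column that was classified before this run started."""
    by_key: Dict[str, str] = {}
    for col, label in labels.items():
        for key in (col.name, col.title):
            if key:
                by_key.setdefault(key.lower(), label)
    suggestions: Dict[Column, str] = {}
    for col in columns:
        if col in labels:
            continue
        for key in (col.name, col.title):
            if key and key.lower() in by_key:
                suggestions[col] = by_key[key.lower()]
                break
    return suggestions


def run_until_stable(columns: List[Column], labels: Dict[Column, str],
                     max_runs: int = 10) -> Tuple[Dict[Column, str], List[int]]:
    """Repeat runs, saving every suggestion, until a run returns nothing."""
    labels = dict(labels)
    found_per_run: List[int] = []
    for _ in range(max_runs):
        found = discovery_pass(columns, labels)
        if not found:
            break
        found_per_run.append(len(found))
        labels.update(found)  # "Save": these labels propagate on the next run
    return labels, found_per_run


if __name__ == "__main__":
    a = Column("customers", "dob", "Date of birth")
    b = Column("staff", "dob", "Birth day")
    c = Column("leads", "bday", "Birth day")
    d = Column("orders", "status")
    labels, found_per_run = run_until_stable([a, b, c, d], {a: "Sensitive"})
    print(found_per_run)  # prints [1, 1]
```

Here the first run picks up `staff.dob` by name; only once it is saved can the second run reach `leads.bday` through the shared title "Birth day", and the third run returns nothing, which is the stopping point the text recommends.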
Tuning classification search masks
Advanced users can tune classification masks and add new domains (searched data types) directly in the repository database. Column name masks are defined in the classificator_masks table; new data domains and their suggested classification labels are added in the classificator_rules table.
More information is available in the repository database documentation.
Custom classification functions
Once you have mastered tuning existing classifications, you can add new custom functions. To add a new function, add a new row to the classificators table and define its logic in the classificator_masks and classificator_rules tables, the same way as described in the previous section.
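To make the relationship between the three tables concrete, here is an in-memory SQLite sketch of how a function, its rules, and its masks could fit together. Only the table names come from this page; every column (`classificator_id`, `rule_id`, `mask`, `domain`, `label`) and the `catalog_columns` table are hypothetical placeholders, so check the repository database documentation for the real schema before changing anything. The query assumes masks are applied with SQL LIKE, and the "Health data" function is an invented example.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema: table names from the docs, columns invented.
SCHEMA = """
CREATE TABLE classificators (classificator_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classificator_rules (
    rule_id INTEGER PRIMARY KEY, classificator_id INTEGER,
    domain TEXT, label TEXT);
CREATE TABLE classificator_masks (
    mask_id INTEGER PRIMARY KEY, rule_id INTEGER, mask TEXT);
CREATE TABLE catalog_columns (table_name TEXT, column_name TEXT);
"""


def add_custom_function(conn: sqlite3.Connection, name: str,
                        rules: List[Tuple[str, str, List[str]]]) -> int:
    """Insert a function row plus its rules and masks.

    rules: list of (domain, suggested label, [LIKE masks]).
    """
    cur = conn.execute("INSERT INTO classificators (name) VALUES (?)", (name,))
    func_id = cur.lastrowid
    for domain, label, masks in rules:
        cur = conn.execute(
            "INSERT INTO classificator_rules (classificator_id, domain, label) "
            "VALUES (?, ?, ?)", (func_id, domain, label))
        rule_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO classificator_masks (rule_id, mask) VALUES (?, ?)",
            [(rule_id, m) for m in masks])
    return func_id


def run_function(conn: sqlite3.Connection, func_id: int) -> List[tuple]:
    """Match column names against the function's masks (SQLite LIKE is
    case-insensitive for ASCII by default)."""
    return conn.execute(
        """
        SELECT DISTINCT c.table_name, c.column_name, r.domain, r.label
        FROM catalog_columns AS c
        JOIN classificator_masks AS m ON c.column_name LIKE m.mask
        JOIN classificator_rules AS r ON r.rule_id = m.rule_id
        WHERE r.classificator_id = ?
        ORDER BY c.column_name, c.table_name
        """, (func_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO catalog_columns VALUES (?, ?)", [
        ("patients", "blood_type"), ("donors", "BloodGroup"), ("orders", "status")])
    fid = add_custom_function(conn, "Health data", [
        ("blood type", "Sensitive", ["%blood%type%", "%blood%group%"])])
    for row in run_function(conn, fid):
        print(row)
```

Keeping masks in their own table lets one rule (a domain plus its suggested label) carry several name patterns, which mirrors the split between classificator_masks and classificator_rules described above.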