Some of you might be aware that on 25 May 2018 new important EU regulation enters into force - General Data Protection Regulation (GDPR). This is important not only for European organizations but to all that process personal data of European citizens.
GDPR requirements
GDPR aims at improving protection and management of personal data. To do it right, every organization should discover and document where personal data is stored in their databases.
New feature in SQL Server Management Studio: Data Discovery and Classification
Microsoft introduced a very interesting feature in its SSMS and SQL Server and Azure SQL databases - Data Discovery and Classification. It enables you to :
- Discover sensitive data in your database with recommendations - SSMS scans for columns that contain potentially sensitive information,
- Classify and label columns as holding sensitive data,
- Report classification and sensitive data held in your databases.
Discovering / Recommendations
All functionality is available in new SQL Server Management Studio 17.5. It works only with SQL Server 2008 or newer and Azure databases. If you have an older version of SQL Server, please skip to "Do more" chapter where I talk about Dataedo tool.
Connect to your server and right-click a database, choose Tasks and Classify Data....
Data classification window will open with a list of suggested columns and their classification.
There are two fields used for classification:
Information Type:
- Banking
- Contact Info
- Credentials
- Credit Card
- Date Of Birth
- Financial
- Health
- Name
- National ID
- Networking
- SSN
- Other
Sensitivity Label:
- Public
- General
- Confidential
- Confidential - GDPR
- Highly Confidential
- Highly Confidential - GDPR
Labeling sensitive data
How suggestions work?
SSMS has hardcoded a list of keywords that are potentially used in columns holding personal data. Pretty simple and powerful.
A test
I created a sample table with a list of columns holding potentially sensitive data. Here's how SSMS labeled the information:
I marked my reservations, some of those fields hold sensitive data as defined in GDPR and were not labeled.
I also didn't find any field labeled as highly confidential, and those are ethnicity, religion, political views, trade union membership health and biometric information.
The conclusion is that this is just a supporting tool, but you should review each column yourself.
Where is this information stored?
All information is stored in extended properties, key being:
- sys_information_type_name for Information Type
- sys_sensitivity_label_name for Sensitivity Label
Reporting fields
Auditing
Do more
This was it for the features in SSMS, SQL Server, and Azure and now I'd like to show you how you can share this information in convenient HTML, easily add more descriptive fields and do the same exercise for other engines. You can achieve this with an easy tool - Dataedo.
HTML export of your schema with extended fields looks like this:
- sys_information_type_name
- sys_sensitivity_label_name
Do it for other engines too
Dataedo enables you to import the schema from other engines, like MySQL or Oracle. I'm not aware features similar to the ones in SSMS for those databases, so Dataedo will bridge the gap there. You can import multiple databases into one repository.
Tutorial: Browse, search and publish Data Classification from SQL Server