Some of you might be aware that on 25 May 2018 new important EU regulation enters into force - General Data Protection Regulation (GDPR). This is important not only for European organizations but to all that process personal data of European citizens.
GDPR aims at improving protection and management of personal data. To do it right, every organization should discover and document where personal data is stored in their databases.
Microsoft introduced a very interesting feature in its SSMS and SQL Server and Azure SQL databases - Data Discovery and Classification. It enables you to :
All functionality is available in new SQL Server Management Studio 17.5. It works only with SQL Server 2008 or newer and Azure databases. If you have an older version of SQL Server, please skip to "Do more" chapter where I talk about Dataedo tool.
Connect to your server and right-click a database, choose Tasks and Classify Data....
Data classification window will open with a list of suggested columns and their classification.
There are two fields used for classification:
SSMS has hardcoded a list of keywords that are potentially used in columns holding personal data. Pretty simple and powerful.
I created a sample table with a list of columns holding potentially sensitive data. Here's how SSMS labeled the information:
I marked my reservations, some of those fields hold sensitive data as defined in GDPR and were not labeled.
I also didn't find any field labeled as highly confidential, and those are ethnicity, religion, political views, trade union membership health and biometric information.
The conclusion is that this is just a supporting tool, but you should review each column yourself.
All information is stored in extended properties, key being:
This was it for the features in SSMS, SQL Server, and Azure and now I'd like to show you how you can share this information in convenient HTML, easily add more descriptive fields and do the same exercise for other engines. You can achieve this with an easy tool - Dataedo.
HTML export of your schema with extended fields looks like this:
Dataedo enables you to import the schema from other engines, like MySQL or Oracle. I'm not aware features similar to the ones in SSMS for those databases, so Dataedo will bridge the gap there. You can import multiple databases into one repository.