GDPR in SQL Server and Azure SQL Database

Some of you might be aware that on 25 May 2018 new important EU regulation enters into force - General Data Protection Regulation (GDPR). This is important not only for European organizations but to all that process personal data of European citizens.

GDPR requirements

GDPR aims at improving protection and management of personal data. To do it right, every organization should discover and document where personal data is stored in their databases.

New feature in SQL Server Management Studio: Data Discovery and Classification

Microsoft introduced a very interesting feature in its SSMS and SQL Server and Azure SQL databases - Data Discovery and Classification. It enables you to :

  • Discover sensitive data in your database with recommendations - SSMS scans for columns that contain potentially sensitive information,
  • Classify and label columns as holding sensitive data,
  • Report classification and sensitive data held in your databases.

Discovering / Recommendations

All functionality is available in new SQL Server Management Studio 17.5. It works only with SQL Server 2008 or newer and Azure databases. If you have an older version of SQL Server, please skip to "Do more" chapter where I talk about Dataedo tool.

Connect to your server and right-click a database, choose Tasks and Classify Data....

Data classification window will open with a list of suggested columns and their classification.

There are two fields used for classification:

Information Type:

  • Banking
  • Contact Info
  • Credentials
  • Credit Card
  • Date Of Birth
  • Financial
  • Health
  • Name
  • National ID
  • Networking
  • SSN
  • Other

Sensitivity Label:

  • Public
  • General
  • Confidential
  • Confidential - GDPR
  • Highly Confidential
  • Highly Confidential - GDPR

Labeling sensitive data

How suggestions work?

SSMS has hardcoded a list of keywords that are potentially used in columns holding personal data. Pretty simple and powerful.

A test

I created a sample table with a list of columns holding potentially sensitive data. Here's how SSMS labeled the information:

I marked my reservations, some of those fields hold sensitive data as defined in GDPR and were not labeled.

I also didn't find any field labeled as highly confidential, and those are ethnicity, religion, political views, trade union membership health and biometric information.

The conclusion is that this is just a supporting tool, but you should review each column yourself.

Where is this information stored?

All information is stored in extended properties, key being:

  • sys_information_type_name for Information Type
  • sys_sensitivity_label_name for Sensitivity Label

Reporting fields

Auditing

Do more

This was it for the features in SSMS, SQL Server, and Azure and now I'd like to show you how you can share this information in convenient HTML, easily add more descriptive fields and do the same exercise for other engines. You can achieve this with an easy tool - Dataedo.

HTML export of your schema with extended fields looks like this:

  • sys_information_type_name
  • sys_sensitivity_label_name

Do it for other engines too

Dataedo enables you to import the schema from other engines, like MySQL or Oracle. I'm not aware features similar to the ones in SSMS for those databases, so Dataedo will bridge the gap there. You can import multiple databases into one repository.

Tutorial: Browse, search and publish Data Classification from SQL Server