Tuesday 14 November 2017

Anonymization: Analyze sensitive data without compromising privacy

When is data truly anonymized? You can probably remember several cases where organizations such as public transport operators or telecommunication providers published insufficiently “anonymized” data sets, resulting in damaging and highly visible news headlines.

This is not to do any finger-pointing, because you know what? Anonymization is really hard! For many real-life use cases it isn’t enough to just substitute names with pseudonyms, or mask some of the values. With a little additional background knowledge it is often possible to identify the individuals you thought had been anonymized.

With the new General Data Protection Regulation (GDPR) taking effect in May 2018, organizations are increasingly looking for ways to reconcile modern data-centric business use cases with stringent privacy requirements. So how can organizations make sure they do the right thing, and show that they are taking their digital responsibility seriously?


SAP wants to support customers on their digital transformation journey and let them turn the privacy challenge into an opportunity. Our vision is to provide real-time anonymized access to data and, by doing so, make data available for use cases previously prevented by data protection and privacy regulations.


Trial service for Data Anonymization – available NOW


The SAP HANA team has been putting a lot of thought and research into how best to help customers safeguard data privacy while unlocking the full potential of their data in modern analytic use cases. We are developing new, customizable functionality that will allow organizations to anonymize live data by providing an anonymized view of that data in SAP HANA.

To provide early insights into SAP’s work on data anonymization, the SAP Data Anonymization trial service is released today. With SAP Data Anonymization, you can try out two state-of-the-art anonymization methods: differential privacy and k-anonymity. Your data is anonymized on the fly by this new web service, and no data is stored on SAP servers at any time.

State-of-the-art methods


Let me briefly explain what differential privacy and k-anonymity are about. These methods come into play after the obvious protection measures for direct identifiers have been applied, such as pseudonymizing real names or masking social security numbers.

Differential privacy adds random noise to your data, for example to salary amounts in an employee survey. Looking at individual records, you won’t get any meaningful results, and thus the privacy of individuals is protected. However, the noise is added in such a statistically clever way that you can still gain valid numerical insights when doing analytics on the whole data set.
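To make the salary example a bit more concrete, here is a minimal Python sketch. It is not SAP's implementation. It assumes Laplace noise, which is one common choice for differential privacy, and the helper names, the salary bounds, and the privacy parameter epsilon are all made up for illustration.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(values, lower, upper, epsilon, rng):
    # Hypothetical helper: clamp every value to [lower, upper] so the
    # influence of a single record is bounded, then add Laplace noise
    # with scale (upper - lower) / epsilon to each record.
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    # Invented salaries, assumed to lie between 30,000 and 100,000.
    salaries = [rng.uniform(30_000, 100_000) for _ in range(20_000)]
    noisy = add_noise(salaries, 30_000, 100_000, epsilon=1.0, rng=rng)

    # A single noisy record says little about the real salary ...
    print("first record:", round(salaries[0]), "->", round(noisy[0]))
    # ... while the average over the whole data set stays close.
    print("true mean: ", round(statistics.mean(salaries)))
    print("noisy mean:", round(statistics.mean(noisy)))
```

A smaller epsilon means more noise and stronger privacy, and the noise averages out only when the data set is large enough, which is why this approach suits analytics on whole data sets rather than lookups of single records.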


k-anonymity hides individuals in groups by generalizing some of the values in the data set. In census data, for example, this could mean not listing actual birth dates but working only with year or decade ranges. For ZIP codes, it could mean generalizing along a hierarchy such as city or county. The number “k” specifies the minimum number of members that each of these groups must have in the data set.
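The generalization idea can be sketched in a few lines of Python. This only illustrates the principle and is not how the trial service works internally. The records, the field names, and the generalization rules (birth year to decade, ZIP code to its first three digits) are assumptions chosen for the example.

```python
from collections import Counter


def raw_key(record):
    # Quasi-identifiers exactly as they appear in the data.
    return (record["birth_year"], record["zip"])


def generalize(record):
    # Hypothetical generalization: birth year -> decade,
    # ZIP code -> first three digits.
    decade = record["birth_year"] // 10 * 10
    return (f"{decade}s", record["zip"][:3] + "**")


def smallest_group(records, key):
    # Size of the smallest group of records that share the same key.
    return min(Counter(key(r) for r in records).values())


def is_k_anonymous(records, key, k):
    # k-anonymity holds if every group has at least k members.
    return smallest_group(records, key) >= k


if __name__ == "__main__":
    # Invented sample records.
    records = [
        {"birth_year": 1981, "zip": "69115"},
        {"birth_year": 1984, "zip": "69117"},
        {"birth_year": 1989, "zip": "69120"},
        {"birth_year": 1972, "zip": "10115"},
        {"birth_year": 1975, "zip": "10117"},
        {"birth_year": 1979, "zip": "10119"},
    ]
    print("raw data 3-anonymous?        ", is_k_anonymous(records, raw_key, 3))
    print("generalized data 3-anonymous?", is_k_anonymous(records, generalize, 3))
```

In practice the generalization level is a trade-off: coarser groups make it easier to reach the required k, but they also reduce the detail available for analysis.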

For more information, watch this video

What can you do now that wasn’t possible before?


The examples above already hint at some potential use cases, but there are many more, for example:

◉ Data as a service, where cloud providers could give access to anonymized user profile data for advertising purposes, or telecommunication providers give access to anonymized location data for city planning purposes.
◉ Telemetry and IoT, where car fleet managers could share anonymized car usage patterns with manufacturers, or energy suppliers could provide smart meter analytics based on anonymized usage data.
◉ Healthcare, where hospitals could make anonymized patient data available to researchers and insurers.
◉ Archiving, where insurers could store anonymized historical data so that they can keep it even after the legal deletion periods.

In the use cases above, anonymization is primarily applied to protect the privacy of individuals. But anonymization also makes a whole other dimension of use cases possible: analytics on business-confidential data. Businesses within a similar sector or peer group could benchmark their performance against each other without revealing detailed financial or operational data.

You can probably think of some typical examples in your business area as well! 

One closing remark: managing secure data access and configuring systems securely continue to be critical operational tasks – none of that goes away. Anonymization is a new tool in the toolbox, aimed primarily at enabling analytics on entire data sets that were previously off limits. It complements other security mechanisms such as masking, authorization, and encryption. SAP HANA has security built into its core, with a comprehensive framework and tooling for authentication and single sign-on, authorization and role management, user and identity management, audit logging, secure configuration, and encryption.
