How does Differential Privacy ensure the protection of personal data?
by Konstantinos Kaltakis, Eight Bells Ltd.
The “right to privacy” is a human right that was established in Europe through the European Convention on Human Rights in 1950. The right to privacy provides that every EU citizen has the right to respect for his or her private and family life, home and correspondence.
Many have tried to define privacy. Warren and Brandeis, for example, defined it as “the right to be let alone”, while Westin describes privacy as “the right of individuals to control the collection, dissemination and usage of information about themselves”.
Privacy is not the same thing as security. For example, you can be in a crowded place where nobody identifies you: that is privacy.
In the digital world, the so-called Cyberspace, the amount of data being processed is huge. With the rise of the Internet and social media, personal data are exposed to everyone. But what exactly are personal data?
Some categories of personal data that the EU General Data Protection Regulation (GDPR) defines are, for example: first and family name, home address, e-mail address, IP address, and so on. Sensitive data, in turn, concern race, political or religious beliefs, genetic information, medical records and sexuality.
These data, if kept unprotected or made publicly available without restrictions, can be used in fraudulent or harmful ways against an individual. At the same time, they can be used for good causes, such as statistics, healthcare or the improvement of citizens’ quality of life. In the latter case, the law requires that data subjects be informed about the data collection, the processing, its duration and its purpose.
This raises the question of how data can be shared for a good cause without compromising privacy. Technology offers a few answers, one of them being Differential Privacy.
Differential Privacy (DP) is a rigorous mathematical definition of privacy. DP helps researchers and analysts process datasets containing personal data without disclosing (or at least while making it hard to disclose) individual information. For that reason, DP is used to analyse a database and compute statistics in a way that the output cannot lead to any information about the individuals in the dataset. DP is not a specific technique but rather a criterion of privacy protection. Its basic principle is to inject randomised noise into the data analysis process. The difference from other privacy-enhancing technologies (PETs), such as generalisation and masking, or even from other techniques that use noise injection, is that DP calibrates the noise to every query, balancing the trade-off between privacy and utility, maximising the data’s value, and doing so in a way that can be mathematically proven.
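One classic way to realise this noise-injection principle is the Laplace mechanism, which calibrates the noise to a query's sensitivity (how much one individual's record can change the answer). The following Python sketch applies it to a simple counting query; the dataset and function names are illustrative, not part of any specific library.

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is distributed as Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon: float) -> float:
    """Answer a counting query with Laplace noise.

    Adding or removing one record changes a count by at most 1
    (L1 sensitivity = 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical example: how many people in the dataset are over 40?
ages = [34, 51, 29, 62, 45]
noisy = dp_count(ages, lambda a: a > 40, epsilon=1.0)  # close to 3, plus noise
```

Because the noise scale depends only on the query's sensitivity and ε, the same mechanism adapts to any counting query, which is exactly the per-query calibration described above.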
DP mathematically guarantees that the outcome of an analysis will yield roughly the same inference about an individual’s private information whether or not that individual’s data are included in the input, while at the same time providing protection against a wide range of privacy attacks, such as differencing attacks, linkage attacks, etc. DP is a powerful privacy-enhancing tool that assumes all information is identifiable, but it cannot guarantee data security, as it only protects private information, not general information. Moreover, on small datasets the injected noise must be large enough to prevent data linkage, which usually leads to data distortion.
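A differencing attack is easy to see on a toy example: two seemingly harmless aggregate queries, one over the whole dataset and one excluding a single person, reveal that person's record exactly. The sketch below (hypothetical data and names) shows the attack on exact counts, and how per-query Laplace noise breaks it.

```python
import random

def laplace_noise(scale: float) -> float:
    # Laplace(0, scale) as the difference of two exponential draws.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

# Hypothetical toy dataset: (name, has_condition) records.
db = [("alice", True), ("bob", False), ("carol", True), ("dave", True)]

def exact_count(rows):
    return sum(1 for _, has_condition in rows if has_condition)

# Differencing attack on exact answers: the two aggregate results
# pin down one person's record with certainty.
everyone = exact_count(db)
without_alice = exact_count([r for r in db if r[0] != "alice"])
alice_has_condition = (everyone - without_alice == 1)  # leaked with certainty

# With Laplace noise added per query, each answer is perturbed
# independently, so their difference no longer reveals the record.
def dp_count(rows, epsilon: float) -> float:
    return exact_count(rows) + laplace_noise(1.0 / epsilon)
```

The noisy difference is still informative on average, but no single pair of answers lets the attacker conclude anything about one individual with certainty.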
DP introduces a privacy loss parameter (usually denoted by ε). The ε parameter defines how much noise is added to the dataset and thus controls the desired level of privacy. A high ε value means less noise and more accurate data, while a low ε value means more privacy but less accurate data. Users can choose the amount of data perturbation by tuning ε, customising their privacy accordingly.
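For the Laplace mechanism this trade-off can be made concrete: the noise scale is sensitivity / ε, and the expected absolute error of the noise equals that scale. The small sketch below (illustrative function name) shows how accuracy varies across a few ε values.

```python
def expected_abs_error(epsilon: float, sensitivity: float = 1.0) -> float:
    # The Laplace mechanism uses noise scale b = sensitivity / epsilon,
    # and the expected absolute value of Laplace(0, b) noise is exactly b.
    return sensitivity / epsilon

# Smaller epsilon -> larger noise -> more privacy, less accuracy.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>4}: expected absolute error = {expected_abs_error(eps)}")
```

So ε = 0.1 perturbs a sensitivity-1 count by about 10 on average (strong privacy, poor accuracy), while ε = 10 perturbs it by only about 0.1 (weak privacy, high accuracy).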
ENCRYPT, as part of the design, development and integration of a variety of PETs, will examine the applicability of DP. This will be achieved through the utilisation of DP techniques, applied either to ENCRYPT input data or to AI training and inference algorithms, while the notion of a “privacy budget” parameter (quantitatively dialling the privacy guarantees up or down) will also be studied.
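One common way to reason about a privacy budget is sequential composition: the ε values spent by successive DP queries add up, and once the total budget is exhausted no further queries may be answered. The sketch below is a minimal, hypothetical accountant illustrating that idea; it does not describe ENCRYPT's actual design, and the class and method names are invented for illustration.

```python
class PrivacyBudget:
    """Minimal privacy-budget accountant (hypothetical sketch).

    Under sequential composition, the epsilons of successive DP
    queries simply add up; once the total is reached, no further
    queries may be answered.
    """

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    def remaining(self) -> float:
        return self.total - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first query
budget.charge(0.5)  # second query; roughly 0.1 of the budget remains
```

A larger total budget dials privacy guarantees down (more or more accurate queries allowed), while a smaller one dials them up, which is the quantitative control mentioned above.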