In a previous post (read here), we introduced the concept of Differential Privacy and its significance in safeguarding individual privacy amid the vast amounts of digital data generated today. In this post, we explore Differential Privacy in more detail and explain how we make use of this mathematical marvel within the ENCRYPT project.
The Evolution of Privacy Protection
Privacy is not a static concept; it evolves alongside technological advancements and changing societal norms. As data collection and analysis methods have become more sophisticated, so too must our approaches to preserving privacy. This is especially important in the current era of data analytics, where data are harvested to gain insights for research and innovation. Differential Privacy is an important tool in this setting, offering a mathematical foundation for protecting privacy while retaining data utility: we can safeguard the privacy of individuals while still allowing data analytics to yield valuable insights.
Fine-Tuning Privacy with Differential Privacy
An important aspect of Differential Privacy is its adaptability. The privacy loss parameter ε serves as a lever that allows users to fine-tune their desired level of privacy protection and strike a balance between privacy preservation and data utility. With a higher ε-value, less randomness is added to the data, so users retain more accurate data for analysis, albeit at a lower level of privacy protection. With a lower ε-value, more randomness is added, and privacy is prioritized at the expense of data accuracy and utility. Finding the right ε-value for a dataset is therefore crucial to achieving the right balance between data privacy and data utility.
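To make the ε trade-off concrete, here is a minimal Python sketch of the classic Laplace mechanism, the textbook way of calibrating randomness to ε. This is an illustrative example only, not the exact mechanism used inside ENCRYPT; the toy dataset, sensitivity bound, and ε-values are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy version of true_value under epsilon-DP.

    The Laplace mechanism draws noise from Laplace(0, sensitivity/epsilon):
    a smaller epsilon means a larger noise scale and stronger privacy.
    """
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# Toy dataset: ages of 100 individuals, assumed bounded by [0, 100].
ages = rng.integers(18, 90, size=100)
true_mean = ages.mean()

# Replacing one record changes the mean by at most 100/n, so the
# sensitivity of the mean query is 100/n.
sensitivity = 100 / len(ages)

for epsilon in (0.1, 1.0, 10.0):
    noisy_mean = laplace_mechanism(true_mean, sensitivity, epsilon)
    print(f"epsilon={epsilon:>4}: noisy mean = {noisy_mean:.2f} (true mean = {true_mean:.2f})")
```

Running this repeatedly shows the trade-off described above: at ε = 10 the noisy mean stays very close to the true mean, while at ε = 0.1 it can deviate substantially.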
The picture above demonstrates how the clear data on the left-hand side is randomized while still allowing data analytics to take place, as shown on the right-hand side of the picture.
Within ENCRYPT we aim to simplify the use of privacy-preserving technologies, and the ε-value is provided to users by the ENCRYPT Recommendation Engine. All users need to do is answer a very short survey describing the characteristics of their dataset, required computation, and use case. Specifically, they provide input concerning the “Data Sensitivity”, “Data Size”, “Computational Intensity”, “Performance Constraints”, “Time Constraints” and “Computational Constraints” of their problem space.
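As an illustration of what such survey input might look like in code, the snippet below models the answers as a simple Python dictionary. The keys mirror the survey questions quoted above, but the key names, value types, and example values are hypothetical; the actual input format of the ENCRYPT Recommendation Engine may differ.

```python
# Hypothetical illustration only: the keys mirror the survey questions in the
# text, but the real ENCRYPT Recommendation Engine interface is not shown here.
survey_answers = {
    "data_sensitivity": "high",           # e.g. medical records
    "data_size": 50_000,                  # number of records in the dataset
    "computational_intensity": "medium",  # expected cost of the computation
    "performance_constraints": "accuracy_first",
    "time_constraints": "minutes",
    "computational_constraints": "single_workstation",
}
# The Recommendation Engine maps answers like these to a suggested epsilon-value.
```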
This innovative feature of the ENCRYPT Recommendation Engine greatly simplifies the process of using Differential Privacy and selecting the right ε-value, thus enabling non-experts to use Differential Privacy in their research.
Local vs Global Differential Privacy
There are two main Differential Privacy models in the literature and in practice: “Global Differential Privacy” and “Local Differential Privacy”. One key way in which they differ is where the randomness is added to the data.
Global Differential Privacy operates under a trusted, centralized model, in which a trusted curator adds randomness to the dataset. Although this centralized approach is simpler to deploy, implementing it correctly and trusting it fully requires strong security measures ensuring that the dataset can be transmitted to the curator securely and privately. It is also important to protect the centralized repository from potential breaches or adversarial attacks.
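The following Python sketch illustrates the trusted-curator idea: the raw records stay inside a curator object, which answers an aggregate query and adds calibrated noise itself before releasing the result. The class name, query, and parameter values are illustrative assumptions, not part of the ENCRYPT design.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

class TrustedCurator:
    """Illustration of the global (centralized) DP model.

    The curator holds the raw records and adds calibrated noise
    itself before releasing any aggregate result.
    """

    def __init__(self, raw_data: np.ndarray, epsilon: float):
        self._data = raw_data    # raw records never leave the curator
        self._epsilon = epsilon

    def noisy_count_above(self, threshold: float) -> float:
        # A counting query has sensitivity 1: adding or removing one
        # person changes the count by at most 1.
        true_count = float((self._data > threshold).sum())
        return true_count + rng.laplace(scale=1.0 / self._epsilon)

salaries = rng.normal(50_000, 15_000, size=1_000)
curator = TrustedCurator(salaries, epsilon=0.5)
print("Noisy count of salaries above 70k:", round(curator.noisy_count_above(70_000)))
```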
Depiction of the Global Differential Privacy model
Depiction of the Local Differential Privacy model
In contrast, under Local Differential Privacy, each data contributor perturbs their own data (adds randomness to it) locally, on their own device, before transmitting it to a central location where data analytics take place. This decentralized approach ensures that sensitive information remains safeguarded at its source, minimizing the risk of privacy breaches during data transmission or aggregation.
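A classic instance of Local Differential Privacy is randomized response, sketched below in Python: each contributor flips their own bit with a probability calibrated to ε, and the aggregator de-biases the noisy reports without ever seeing a true answer. This is a standard textbook example, not necessarily the perturbation scheme used in ENCRYPT.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Each contributor perturbs their own answer before sending it.

    The true bit is kept with probability e^eps / (e^eps + 1) and
    flipped otherwise, which satisfies epsilon-local-DP.
    """
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1)
    return true_bit if rng.random() < p_keep else 1 - true_bit

# 10,000 contributors, 30% of whom truly answer "yes" (1).
true_bits = (rng.random(10_000) < 0.3).astype(int)
epsilon = 1.0
reported = np.array([randomized_response(b, epsilon) for b in true_bits])

# The aggregator never sees the true bits, but it can de-bias the
# noisy reports to estimate the population proportion.
p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1)
estimate = (reported.mean() - (1 - p_keep)) / (2 * p_keep - 1)
print(f"Estimated 'yes' rate: {estimate:.3f} (true rate: {true_bits.mean():.3f})")
```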
Within ENCRYPT we follow the model of Local Differential Privacy.
Beyond Noise and Towards Data Insights
Even though Differential Privacy changes a dataset through the addition of noise, the data can still be used for analytics. Experiments carried out on ENCRYPT Use Case data have shown that the accuracy of Differential Privacy AI models with low ε-values (and thus high amounts of randomness added to the datasets) is quite comparable to that of models trained on the same datasets with no randomness added. The accuracy is also high in absolute terms: approximately 87% for the Differential Privacy AI models, roughly 5 percentage points lower than the approximately 92% accuracy achieved by comparable models on the same datasets without noise.
The above models have shown promise in our experiments, achieving comparable accuracy when Differential Privacy is used to when no randomness is added to the datasets and comparable models are used.
Local Differential Privacy Data Insights using the ENCRYPT Platform
Within the ENCRYPT project, our aim is to give users the capability to run AI models on an initial, perturbed training/testing dataset. The addition of noise to the dataset is carried out through the ENCRYPT web interface and happens locally on the user’s computational device before the dataset is uploaded to the ENCRYPT platform. The ENCRYPT Differential Privacy component then finds the model that is most accurate for the provided training/testing dataset. This model is saved for future use on other experimental datasets the user may wish to analyse.
The steps to be carried out by users will thus be among the following:
1. Add noise to the training/testing dataset locally, through the ENCRYPT web interface, on their own device.
2. Upload the perturbed dataset to the ENCRYPT platform.
3. Let the ENCRYPT Differential Privacy component identify the most accurate model for the dataset.
4. Save the resulting model for reuse on future experimental datasets.
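To give a feel for the first of these steps, the sketch below perturbs one numeric column of a training dataset with per-record Laplace noise before it is written out for upload. The column name, bounds, ε-value, and file name are hypothetical; within ENCRYPT this step is handled by the web interface rather than hand-written code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)

def perturb_column(values: pd.Series, epsilon: float,
                   lower: float, upper: float) -> pd.Series:
    """Perturb one numeric column locally with per-record Laplace noise.

    Values are clipped to [lower, upper] so each record's sensitivity is
    bounded by (upper - lower); the raw values never leave the device.
    """
    clipped = values.clip(lower, upper)
    scale = (upper - lower) / epsilon
    return clipped + rng.laplace(scale=scale, size=len(clipped))

# Hypothetical training dataset prepared on the user's own device.
df = pd.DataFrame({"age": rng.integers(18, 90, size=1_000)})
df["age"] = perturb_column(df["age"], epsilon=1.0, lower=18, upper=90)

# Only the perturbed file is uploaded to the ENCRYPT platform.
df.to_csv("perturbed_training_data.csv", index=False)
```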
This automation and ease of use is important in achieving our goal of encouraging more users to adopt the privacy-preserving technologies offered by the ENCRYPT platform.