Measuring the anonymity of our data on the web with new software

(Image: Pixabay CC0)

Anonymity is essential to protecting freedom of expression and digital rights in our democracies. It rests on individuals not being identified, surveilled or traced. With advances in artificial intelligence, however, guaranteeing this anonymity is becoming increasingly difficult. Julien Hendrickx, professor at UCLouvain's Ecole Polytechnique de Louvain, Yves-Alexandre de Montjoye, UCLouvain-trained engineer and associate professor at Imperial College London, and Luc Rocher, formerly of UCLouvain and now at the University of Oxford, have developed a new mathematical model to better understand the risks posed by AI and to help regulators protect individual privacy. The results of this study are published in the prestigious scientific journal Nature Communications.

In previous research (2019), these scientists had already shown how easily supposedly anonymized people can be re-identified on the web from a few partial pieces of information (age, zip code, gender). That work revealed the extent of the risk involved in releasing sensitive data, even after anonymization.
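To see why a handful of attributes can be so identifying, consider a minimal sketch on synthetic data (illustrative only, not the authors' method): even three coarse quasi-identifiers leave a large fraction of records unique.

```python
# Toy illustration: how few quasi-identifiers it takes to single out
# individuals. All data below is synthetic.
import random
from collections import Counter

random.seed(0)

# Synthetic population: (age, zip code, gender) triples.
people = [
    (random.randint(18, 90),            # age
     random.choice(range(1000, 1100)),  # one of 100 zip codes
     random.choice("MF"))               # gender
    for _ in range(10_000)
]

counts = Counter(people)
unique = sum(1 for p in people if counts[p] == 1)
print(f"{unique / len(people):.1%} of records are unique "
      "on (age, zip, gender) alone")
```

Any record that is unique on these three attributes can be matched back to a named individual by anyone holding an auxiliary dataset containing the same attributes.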

In this new study, the researchers propose a model, the Pitman-Yor correction (PYC), that evaluates the performance of large-scale identification techniques across different application and behavioral contexts. Julien Hendrickx, co-author and professor at UCLouvain, explains: "Our new tool relies on Bayesian statistics to learn how similar individuals are, and to extrapolate the accuracy of identification to larger populations, with performance up to 10 times better than previous rules. For the first time, this work provides a robust scientific framework for evaluating identification techniques on large-scale data."
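The study's model is built on Pitman-Yor processes; the details are in the paper. As a loose intuition only, here is a minimal sketch of the extrapolation idea: measure a technique's identification accuracy on small populations, fit a simple parametric curve (a power law here, purely an assumption of this sketch, not the published PYC model), and predict accuracy at larger scales. The measurements below are hypothetical.

```python
# Sketch of the general idea: fit accuracy-vs-population-size on small
# samples, then extrapolate. Not the published PYC model.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: identification accuracy on samples of size n.
n_obs = np.array([100, 200, 500, 1_000, 2_000])
acc_obs = np.array([0.98, 0.96, 0.91, 0.86, 0.80])

# Assumed functional form for this sketch: accuracy decays as a power law.
def model(n, a, b):
    return a * n ** (-b)

params, _ = curve_fit(model, n_obs, acc_obs, p0=(1.0, 0.1))
a, b = params
for n in (10_000, 100_000, 1_000_000):
    print(f"predicted accuracy at n={n:>9,}: {model(n, a, b):.2f}")
```

The point of such extrapolation is that a technique looking near-perfect on a test set of a few thousand people may perform very differently, for better or worse, at national scale; a principled model makes that gap measurable.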

The aim of this research is to help better understand the risks posed by AI and to enable regulators to better protect people's privacy. Although regulations such as the GDPR strictly govern the use and sharing of personal data, anonymized data falls outside these restrictions. It was therefore essential to determine whether data is truly anonymous or can be re-identified, in order to help safeguard privacy.

Examples: In medical studies, the tool developed at UCLouvain can help determine whether patient information could be used to trace their identity, and thus help prevent such re-identification. In everyday life, the tool also makes it possible to measure (and thus counter) the accuracy of advertising codes and invisible trackers that identify users online from small details such as time zone or browser settings, a technique known as "device fingerprinting".
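For intuition, here is a minimal sketch of how device fingerprinting works in principle: a handful of stable device attributes are canonicalized and hashed into a single identifier that follows the user across sites. The attribute names and values below are hypothetical, and real trackers combine many more signals.

```python
# Illustrative sketch of device fingerprinting (hypothetical attributes).
import hashlib

def fingerprint(attrs: dict) -> str:
    """Hash a stable set of browser/device attributes into one ID."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

device = {
    "timezone": "Europe/Brussels",
    "language": "fr-BE",
    "screen": "1920x1080",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
}
print(fingerprint(device))  # same attributes -> same ID across sites
```

Because no cookie is stored, such identifiers evade cookie-based consent controls, which is why tools for measuring their accuracy matter to regulators.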

Finally, the scientists explain how this method could help organizations strike a better balance between the benefits of AI technologies and the need to protect individuals' personal data, making everyday interactions with technology safer. "We believe this work is a crucial step towards rigorous methods for assessing the risks posed by increasingly advanced AI techniques and the identifiability of human traces online. We hope it will be of great help to scientists, data protection officers, members of ethics committees and other practitioners seeking to balance data sharing for research with the protection of citizens' privacy."