Google is working on a new system that enables the company to collect randomized information about the way that users are affected by unwanted software on their machines, without gathering identifying data about the users.
The system is known as RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response), and Google is currently testing it in Chrome. The company’s engineers hope to use RAPPOR to aggregate data on the problems affecting users while still preserving the privacy of each individual.
“To understand RAPPOR, consider the following example. Let’s say you wanted to count how many of your online friends were dogs, while respecting the maxim that, on the Internet, nobody should know you’re a dog. To do this, you could ask each friend to answer the question ‘Are you a dog?’ in the following way. Each friend should flip a coin in secret, and answer the question truthfully if the coin came up heads; but, if the coin came up tails, that friend should always say ‘Yes’ regardless,” Úlfar Erlingsson, tech lead manager in security research at Google, wrote in a blog post explaining the new system.
“Then you could get a good estimate of the true count from the greater-than-half fraction of your friends that answered ‘Yes’. However, you still wouldn’t know which of your friends was a dog: each answer ‘Yes’ would most likely be due to that friend’s coin flip coming up tails.”
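The arithmetic behind that estimate can be shown with a short simulation. What follows is a minimal sketch of the coin-flip scheme from the quote, not RAPPOR itself or Google's code; the function names, the population of 1,000 friends and the figure of 300 dogs are all assumptions made for illustration. Because every tails answer is "Yes", the expected number of "Yes" answers is half the population plus half the true count, so the true count can be estimated as twice the "Yes" answers minus the population size.

```python
import random


def randomized_answer(is_dog, rng):
    """Flip a fair coin: heads means answer truthfully, tails means always say yes."""
    heads = rng.random() < 0.5
    return is_dog if heads else True


def estimate_true_count(answers):
    """Invert E[yes] = n/2 + true_count/2 to estimate the true count."""
    n = len(answers)
    yes = sum(answers)
    return 2 * yes - n


# Hypothetical population: 300 dogs among 1,000 friends (illustrative numbers only).
rng = random.Random(0)
friends = [True] * 300 + [False] * 700
answers = [randomized_answer(is_dog, rng) for is_dog in friends]
estimate = estimate_true_count(answers)
print(f"'Yes' answers: {sum(answers)} of {len(answers)}; estimated dogs: {estimate}")
```

With these made-up numbers the estimate should land in the neighborhood of 300, give or take the noise introduced by the coin flips, even though well over half of the friends answered "Yes".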
Software vendors routinely collect data from users’ machines, typically in the form of crash reports or telemetry from things such as security products or browsers. Users typically need to opt into sending that kind of information, and there are some privacy concerns around sending it. Google’s system is designed to address some of these issues.
“In short, RAPPORs allow the forest of client data to be studied, without permitting the possibility of looking at individual trees. By applying randomized response in a novel manner, RAPPOR provides the mechanisms for such collection as well as for efficient, high-utility analysis of the collected data. In particular, RAPPOR permits statistics to be collected on the population of client-side strings with strong privacy guarantees for each client, and without linkability of their reports,” the Google authors wrote in an abstract for a paper submitted to the ACM Conference on Computer and Communications Security.
Google has made RAPPOR available on GitHub as an open source project.
“Building on the concept of randomized response, RAPPOR enables learning statistics about the behavior of users’ software while guaranteeing client privacy. The guarantees of differential privacy, which are widely accepted as being the strongest form of privacy, have almost never been used in practice despite intense research in academia. RAPPOR introduces a practical method to achieve those guarantees,” Erlingsson wrote.