A team of computer science engineers from Princeton has released a paper explaining how an adversary with a passive presence on a network or Internet backbone could track individuals by observing HTTP cookies.
The motivation for the project was news in December that the National Security Agency had the capability to access Google’s PREF cookies to conduct surveillance on individual targets. PREF cookies are preferences cookies that websites reference to learn a user’s preferred language for localization purposes and other personalization features.
Because little is known in detail about how the NSA gathers PREF cookies, the Princeton team took a higher-level approach: their experiment examines how the cookies dropped on a user’s machine as they surf the Web can be connected to one another and, ultimately, to the user’s real-world identity.
The working premise is that an adversary with a presence on the network, whether a criminal or an intelligence agency, can use the first- and third-party cookies dropped by sites and advertisers to tie a user to web traffic without having to worry about dynamic IP addresses, according to the paper, “Cookies that give you away: Evaluating the surveillance implications of web tracking,” written by Dillon Reisman, Steven Englehardt, Christian Eubank, Peter Zimmerman, and Arvind Narayanan. HTTPS doesn’t seem to be an obstacle in this case either because, the paper said, many websites where users are logged in may already reveal their identity in plain text.
“Thus, an adversary that can wiretap the network can not only cluster together the web pages visited by a user, but can then attach real-world identities to those clusters. This technique relies on nothing other than the network traffic itself for identifying targets,” the paper said. “Even if a user’s identity isn’t leaked in plaintext, if the adversary in question has subpoena power they could compel the disclosure of an identity corresponding to a cookie, or vice versa.”
The paper illustrates the researchers’ theory with an example in which the attacker passively monitors a user’s web traffic. Pages A and C embed tracker X, and pages B and C embed tracker Y. Each time the user lands on a webpage, cookies are dropped, but the adversary is unable to begin connecting those dots until more than two sites have been visited.
“The unique cookie from X connects A and C while the one from Y connects B and C. We assume here that the user has visited pages with both trackers before so that cookies have already been set in her browser and will be sent with each request.”
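The linking step in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the `cluster_visits` function, the observation tuples and the cookie values are all hypothetical, and each observation stands for one page visit that a passive observer saw carrying one tracker cookie. A union-find structure merges any visits that share a cookie, directly or through a chain of other visits.

```python
from collections import defaultdict


def cluster_visits(observations):
    """Group page visits that share a tracker cookie, directly or transitively.

    observations: iterable of (visit, tracker, cookie_id) tuples (hypothetical format).
    Returns a sorted list of sorted visit lists, one list per cluster.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each observation links a visit node to a cookie node.
    for visit, tracker, cookie_id in observations:
        union(("visit", visit), ("cookie", tracker, cookie_id))

    clusters = defaultdict(set)
    for node in list(parent):
        if node[0] == "visit":
            clusters[find(node)].add(node[1])
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical traffic mirroring the paper's example: tracker X is seen on
# A and C, tracker Y on B and C. Visit D carries a different X cookie, so it
# belongs to some other browser.
observations = [
    ("A", "X", "x-123"),
    ("C", "X", "x-123"),
    ("B", "Y", "y-456"),
    ("C", "Y", "y-456"),
    ("D", "X", "x-999"),
]
clusters = cluster_visits(observations)
print(clusters)
```

In this toy input, visit C carries both cookies, which is what lets A and B end up in the same cluster even though they share no tracker with each other.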
The experiment modeled online behavior on the assumption that a user visits up to 300 websites over a two- to three-month period, and looked for components that connect users to their identity. The paper said that 90 percent of visits can be clustered in this way.
“It applies even if the adversary is able to observe only a small, random subset of the user’s requests,” the paper said. “We find that on average, over two-thirds of time, a web page visited by a user has third-party trackers.”
The researchers also found that 60 percent of the Alexa top 50 websites transmit identifying information, such as a user’s name or email address, in plaintext once a user is logged in, which greatly enhances the experiment’s chances of success.
An attacker interested in monitoring the web activities of a target or set of targets can scan for identity information in the plaintext HTTP traffic or target the cookie ID from a first-party page, the paper said. The researchers said this starting point enables the attacker to “transitively” connect the first-party cookie to other first- and third-party cookies to tie an identity to a cluster of traffic.
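As a rough illustration of that starting point, the Python sketch below begins with the cookies seen alongside a plaintext identity and expands outward through shared cookies. Everything in it is an assumption made for the example rather than something taken from the paper: the `pages_linked_to_identity` function, the request dictionaries, the `.example` hostnames and the cookie strings are all hypothetical. Linking is done per request rather than per page, so two different browsers visiting the same page are not merged.

```python
from collections import defaultdict, deque


def pages_linked_to_identity(requests, identity):
    """Return the set of pages transitively linked to a plaintext identity.

    requests: list of dicts with keys 'page' (str), 'cookies' (set of str)
    and 'identity' (str or None). This record format is hypothetical.
    """
    visits_by_cookie = defaultdict(set)
    seed_cookies = set()
    for index, request in enumerate(requests):
        for cookie in request["cookies"]:
            visits_by_cookie[cookie].add(index)
        # Cookies sent in the same request as the leaked identity are the seeds.
        if request.get("identity") == identity:
            seed_cookies.update(request["cookies"])

    seen_cookies = set(seed_cookies)
    seen_visits = set()
    queue = deque(seed_cookies)
    while queue:
        cookie = queue.popleft()
        for index in visits_by_cookie[cookie]:
            if index in seen_visits:
                continue
            seen_visits.add(index)
            # Any other cookie on a linked request becomes linked as well.
            for other in requests[index]["cookies"]:
                if other not in seen_cookies:
                    seen_cookies.add(other)
                    queue.append(other)
    return {requests[index]["page"] for index in seen_visits}


# Hypothetical observed traffic; only the first request leaks an identity.
requests = [
    {"page": "mail.example", "cookies": {"mail:s1", "X:x-123"},
     "identity": "alice@example.com"},
    {"page": "news.example", "cookies": {"X:x-123", "Y:y-456"}, "identity": None},
    {"page": "shop.example", "cookies": {"Y:y-456"}, "identity": None},
    {"page": "blog.example", "cookies": {"X:x-999"}, "identity": None},
]
linked = pages_linked_to_identity(requests, "alice@example.com")
print(sorted(linked))
```

Here the shop visit shares no cookie with the logged-in mail request, yet it is still attributed to the same identity because the news visit carries cookies from both trackers.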
“We hope that these findings will inform the policy debate on both surveillance and the web tracking ecosystem,” the paper said. “We also hope that it will raise awareness of privacy breaches via subtle inference techniques.”