The term metadata, and the implications of its collection and analysis, has been one of the key points in the debate surrounding the NSA’s broad surveillance programs over the last year. Legislators, policy makers and others continue to argue about whether metadata can actually reveal anything about the people behind the phone numbers, but researchers who have studied a new data set say there should be no doubt: metadata is sensitive information.
Researchers at Stanford University’s Security Lab last fall spun up a new program called MetaPhone designed to gather metadata from volunteers’ Android phones and then analyze the data to see what conclusions they could draw. The project’s 546 participants called more than 33,000 unique numbers during the study period, and the Stanford researchers were able to infer highly sensitive information about some of the volunteers, including serious medical conditions, gun ownership and other data.
“At the outset of this study, we shared the same hypothesis as our computer science colleagues—we thought phone metadata could be very sensitive. We did not anticipate finding much evidence one way or the other, however, since the MetaPhone participant population is small and participants only provide a few months of phone activity on average,” Jonathan Mayer of Stanford wrote in a post revealing some of the results of the MetaPhone project.
“We were wrong. We found that phone metadata is unambiguously sensitive, even in a small population and over a short time window.”
By using the data collected from their volunteers’ phones, along with information from public sources such as Google Places and Yelp to help identify the callers’ contacts, the Stanford researchers were able to discover that their volunteers were calling a large variety of businesses that could be considered sensitive. Doctors’ offices, medical device companies, churches, gun shops and even marijuana dispensaries popped up on the list. Some people also called alcohol rehabilitation programs and family planning clinics.
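The matching step described above, joining call records against a public directory and flagging sensitive categories, can be illustrated with a short sketch. This is not the researchers’ code: the `CallRecord` type, the `DIRECTORY` table standing in for a Google Places or Yelp lookup, the phone numbers and the choice of sensitive categories are all hypothetical, picked only to show how little is needed to make this kind of inference.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One metadata record: who called whom and for how long. No call content."""
    caller: str
    callee: str
    duration_seconds: int


# Hypothetical stand-in for a public directory lookup; numbers and labels are made up.
DIRECTORY = {
    "+15550100": "gun store",
    "+15550101": "pharmacy",
    "+15550102": "pizza restaurant",
}

# Illustrative assumption about which categories count as sensitive.
SENSITIVE_CATEGORIES = {"gun store", "pharmacy"}


def sensitive_contacts(records, directory=DIRECTORY, sensitive=SENSITIVE_CATEGORIES):
    """Map each caller to the set of sensitive categories they contacted."""
    found = defaultdict(set)
    for record in records:
        # Numbers missing from the directory resolve to None and are skipped.
        category = directory.get(record.callee)
        if category in sensitive:
            found[record.caller].add(category)
    return dict(found)


records = [
    CallRecord("+15550001", "+15550100", 95),
    CallRecord("+15550001", "+15550102", 40),
    CallRecord("+15550002", "+15550101", 310),
    CallRecord("+15550002", "+15550199", 12),  # number not in the directory
]
result = sensitive_contacts(records)
```

In this toy data, the first caller is linked to a gun store and the second to a pharmacy, using nothing but the numbers dialed. The call content never enters the picture.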
“The degree of sensitivity among contacts took us aback. Participants had calls with Alcoholics Anonymous, gun stores, NARAL Pro-Choice, labor unions, divorce lawyers, sexually transmitted disease clinics, a Canadian import pharmacy, strip clubs, and much more. This was not a hypothetical parade of horribles. These were simple inferences, about real phone users, that could trivially be made on a large scale,” Mayer said.
The conclusion, Mayer said, is clear: Metadata can reveal sensitive information. NSA officials, lawmakers and even President Obama have maintained that metadata does not constitute sensitive information because it doesn’t include the content of calls. Metadata, in general, includes the originating and terminating numbers of a call as well as the length of the call.
“The dataset that we analyzed in this report spanned hundreds of users over several months. Phone records held by the NSA and telecoms span millions of Americans over multiple years. Reasonable minds can disagree about the policy and legal constraints that should be imposed on those databases. The science, however, is clear: phone metadata is highly sensitive,” Mayer wrote.
Image from Flickr photos of Mathias Ripp.