A vulnerability in a network that processes genomic data could expose some global genetic databases to attack and open the door to serious privacy issues.
Experts say the problem lies in the Beacon Project, a network run by a coalition, the Global Alliance for Genomics and Health, that parses genetic data. When users send pings to “beacons,” or servers, they’re essentially sending queries, asking whether different data sets have genomes with certain nucleotides, traits, and so forth.
If an attacker had access to a person’s genome, however, they could ping collections of genomes to determine whether that person is part of a broader database. From there they could verify whether an individual’s genome appears in a specialized group like a heart disease database, an autism database, or a cancer database, experts warn.
According to research published last week in a scientific journal, The American Journal of Human Genetics, an attacker could theoretically determine whether or not an individual was part of a “beacon” composed of 1,000 individuals by sending only 5,000 queries.
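To make the idea concrete, here is a minimal Python sketch of why yes/no answers can leak membership. It is not the statistical test from the paper and not the Beacon Project’s actual API: the `ToyBeacon` class, the `yes_fraction` helper, and all of the sizes and allele frequencies are hypothetical, chosen only for illustration. The toy beacon answers whether any genome in its data set carries a variant at a given site, and the attacker simply asks about the sites where the target’s genome carries a variant.

```python
import random


def make_genome(rng, n_sites, alt_freq):
    """Return the set of site indices where this (simulated) person
    carries a variant allele. Purely synthetic data."""
    return {s for s in range(n_sites) if rng.random() < alt_freq}


class ToyBeacon:
    """Hypothetical stand-in for an anonymous-access beacon."""

    def __init__(self, genomes):
        # The beacon only needs to know which sites have a variant
        # in at least one of its genomes.
        self._seen = set().union(*genomes)

    def query(self, site):
        # Yes/no answer: does any genome here have a variant at this site?
        return site in self._seen


def yes_fraction(beacon, target_genome, max_queries):
    """Query the beacon at the target's variant sites and return the
    fraction of 'yes' answers (0.0 if the target has no variant sites)."""
    sites = sorted(target_genome)[:max_queries]
    if not sites:
        return 0.0
    hits = sum(beacon.query(s) for s in sites)
    return hits / len(sites)


rng = random.Random(0)
N_SITES, ALT_FREQ = 20000, 0.001  # assumed values, rare variants only

members = [make_genome(rng, N_SITES, ALT_FREQ) for _ in range(100)]
outsider = make_genome(rng, N_SITES, ALT_FREQ)
beacon = ToyBeacon(members)

# A member's variants are all in the beacon, so every answer is "yes".
print("member:  ", yes_fraction(beacon, members[0], 50))
# An outsider's rare variants are mostly absent from the beacon.
print("outsider:", yes_fraction(beacon, outsider, 50))
```

In this toy setting a member always gets a “yes” on every query, while an outsider with rare variants usually does not. Real genomes involve sequencing errors, common variants, and relatedness, which is why an actual attack would need a proper statistical test and many more queries.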
Two researchers with the Stanford University School of Medicine, Suyash Shringarpure, PhD, postdoctoral research fellow in genetics, and Carlos Bustamante, PhD, professor of genetics, discussed their work in an article, “Privacy Risks from Genomic Data-Sharing Beacons,” on Thursday.
The ability of an attacker to exploit the weakness is narrow in scope, the researchers admit. An attacker would need an individual’s genome sequence, obtained either from their saliva or from another genomic service, in addition to access to the Project’s infrastructure. Neither is easy to obtain, at least for now, the researchers claim.
“As access to genomic data becomes easier, such attacks might need to be accounted for in the design of data-sharing mechanisms,” the two warn.
The researchers stress in their paper, however, that if an attacker did have a genome, they could ping enough beacons to infer whether or not someone is affected by a certain condition or disease. The network is currently composed of 20 different organizations, covering over 250 genomic datasets.
The culprit here may not be a vulnerability in the network per se, but a systematic weakness in the way that anonymous-access beacons operate. In their research, Shringarpure and Bustamante insist the genetics community should revamp the way it shares genomic data, and that this should start with forbidding anonymous pings.
The anonymous beacons are inherently insecure, the researchers argue, and leave information that passes through them open to re-identification attacks, in which individuals could be matched with their genes.
“The most important step for improving security and reducing loss of privacy through beacons would be to prohibit anonymous access,” Shringarpure and Bustamante write, adding that forcing users to authenticate themselves could curb such attacks.
“Requiring users to authenticate their identity to access beacons will allow the research community to discourage re-identification attacks through policies outlining acceptable uses of beacons,” the researchers claim.
Operators at the Toronto-based Global Alliance for Genomics and Health claim their techniques “adhere to the best practices outlined in its privacy and security policy,” but they’ve already taken a few steps to address the vulnerability outlined by the researchers. One technique the coalition has adopted is “de-identifying” individual genomes so that names and other identifying information aren’t connected to each genome.
“We welcome the paper and are pleased to now have careful quantitative analysis of this particular risk scenario. We look forward to ongoing interactions with the authors and others to ensure beacons provide maximum value while respecting privacy,” said Peter Goodhand, the group’s executive director, in a press release on Thursday.
In addition to banning anonymous pings, the researchers also suggest that GA4GH officials look into other options, like merging data sets to make it trickier to identify data sources, limiting the number of queries per IP address, and anonymizing data, but admit that the possibility of re-identifying samples isn’t going away anytime soon.
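One of those options, limiting the number of queries per IP address, can be sketched in a few lines of Python. This is an illustration only, not anything the alliance is known to run: the `QueryBudget` class, the limit of three queries, and the example address are all hypothetical.

```python
from collections import defaultdict


class QueryBudget:
    """Hypothetical per-client query cap for a beacon front end."""

    def __init__(self, max_queries):
        self.max_queries = max_queries
        self._used = defaultdict(int)  # client id -> queries served so far

    def allow(self, client_id):
        """Return True and count the query if the client is under its cap,
        otherwise return False without counting it."""
        if self._used[client_id] >= self.max_queries:
            return False
        self._used[client_id] += 1
        return True


budget = QueryBudget(max_queries=3)  # assumed limit, for illustration
client = "203.0.113.7"               # documentation-range example address

results = [budget.allow(client) for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

A cap like this only slows an attacker down, since queries can be spread across many addresses, which is consistent with the researchers’ view that authentication matters more than any single technical limit.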
“We expect that, because of the lack of monitoring and access control, anonymous-access beacons will always be open to re-identification attempts,” Shringarpure and Bustamante write.