SAN FRANCISCO – Profile data on 48 million users, scraped from social networks and websites including Facebook, LinkedIn, Zillow and Twitter, was leaked by a private intelligence agency. Localblox, the company that harvested the data, left it in an Amazon S3 storage bucket that was accessible without a password.
“In the wake of the Facebook/Cambridge Analytica debacle, the importance of massive sets of psychographic data is becoming more and more apparent,” wrote UpGuard in a blog post released Wednesday on the leaky data discovery.
The data was found by Chris Vickery, director of cyber risk research at security firm UpGuard, in February. Localblox’s chief technology officer Ashfaq Rahman acknowledged the lapse and moved swiftly to secure the data.
Localblox bills itself as the “world’s most comprehensive cross device identity graph on businesses, consumers and geo audiences.”
Vickery, whom Threatpost caught up with here at the RSA Conference, explained that the data was part of a single 1.2 TB NDJSON (newline-delimited JSON) file.
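For readers unfamiliar with the format, an NDJSON file stores one complete JSON object per line, so even a very large file can be read one record at a time instead of being loaded whole. The short Python sketch below illustrates the general idea only. The function name and the sample records are hypothetical, invented for illustration, and do not reflect the structure of the Localblox file.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample data, made up for this example.
sample = io.StringIO(
    '{"name": "A", "source": "x"}\n'
    '{"name": "B", "source": "y"}\n'
)

records = list(iter_ndjson(sample))
```

Because each line is parsed independently, the same loop would work on a file handle opened with `open(path, encoding="utf-8")`, keeping only one record in memory at a time.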
“Part of what they did was take the Facebook search function and put in a special query related to phone numbers, email addresses and such and gathered a lot of information on people, just automatically like a bot,” he said.
Vickery said an analysis of the data revealed that the company cross-referenced Facebook data, such as an email address, with other scraped data to compile 48 million digital dossiers on internet users. “Facebook has recently disabled this type of query function, because it can be abused in such a way,” he said.
Threatpost caught up with Vickery, who discussed his Localblox findings as well as other aspects of his research.