Facebook Data of Millions Exposed in Leaky Datasets


Researchers say that two publicly exposed datasets are leaking Facebook data, from user names to plaintext passwords.


Hundreds of millions of Facebook records – including account names, personal data, and more – have been found in two separate publicly-exposed app datasets.

The first publicly-exposed dataset originates from a Mexico-based media company, Cultura Colectiva, and contains over 540 million records including comments, likes, reactions, account names and more. The second, a publicly-exposed database backup from a Facebook-integrated app titled At the Pool, exposed plaintext app passwords for 22,000 users and other data. Both exposed databases have been secured, researchers said.

“Facebook’s policies prohibit storing Facebook information in a public database,” a Facebook spokesperson told Threatpost. “Once alerted to the issue, we worked with Amazon to take down the databases. We are committed to working with the developers on our platform to protect people’s data.”

The Facebook spokesperson told Threatpost that the incident is currently under investigation, and at this point it is unclear how long the records were up and whether the data was misused.

The scope of the publicly-exposed datasets is similar to that of the Facebook and Cambridge Analytica incident, which came to light in March 2018. In that case, however, data was harvested by app developers, as opposed to being accidentally exposed.

Both exposed datasets were from app developers on Facebook who collected data from people who used their apps through the platform.

“These two situations speak to the inherent problem of mass information collection: the data doesn’t naturally go away, and a derelict storage location may or may not be given the attention it requires,” Upguard researchers who discovered the exposed datasets said in a Wednesday post.

Cultura Colectiva, a media company based in Mexico, collected data on responses to its Facebook posts, enabling it to tune an algorithm for predicting which future content will generate the most traffic. The Cultura Colectiva dataset contains 146 gigabytes of data detailing comments, likes, reactions, account names, FB IDs and more.

At the Pool, meanwhile, launched in 2011 as an app integrated into Facebook’s platform that introduced users to potential new friends.


At the Pool dataset

In the case of the exposed At the Pool database backup, researchers found that plaintext passwords for 22,000 users were exposed on the public internet via an Amazon S3 bucket.

The database also exposed data such as user IDs, account names and users’ “friends” on Facebook, as well as their likes, interests, photos and more.

“The passwords are presumably for the ‘At the Pool’ app rather than for the user’s Facebook account, but would put users at risk who have reused the same password across accounts,” researchers said.

While the “At the Pool” webpage ceased operation in 2014, “this should offer little consolation to the app’s end users whose names, passwords, email addresses, Facebook IDs, and other details were openly exposed for an unknown period of time,” researchers said.

Researchers notified Facebook about the Cultura Colectiva data on Jan. 10, and again on Jan. 14, but received no response, they said. Because the data was stored in Amazon’s S3 cloud storage, researchers then notified Amazon Web Services on Jan. 28. AWS acknowledged the report but took no action, researchers said.

“It was not until the morning of April 3rd, 2019, after Facebook was contacted by Bloomberg for comment, that the database backup, inside an AWS S3 storage bucket titled ‘cc-datalake,’ was finally secured,” researchers said.

Meanwhile, the data stemming from “At the Pool” was taken offline just as researchers were looking into its origin, and before they sent a formal notification email to Facebook, researchers said.

The incident comes a mere couple of weeks after the discovery, earlier in March, that hundreds of millions of Facebook user passwords had been stored in plain text for years.

It also comes almost a year after the Cambridge Analytica incident and several other Facebook data security problems over the past year (such as sketchy data sharing partnerships and other privacy violations). These incidents show that Facebook is not in control of the sprawling amount of data around its platform, Upguard researchers said.

“As Facebook faces scrutiny over its data stewardship practices, they have made efforts to reduce third party access,” they said. “But as these exposures show, the data genie cannot be put back in the bottle. Data about Facebook users has been spread far beyond the bounds of what Facebook can control today. Combine that plenitude of personal data with storage technologies that are often misconfigured for public access, and the result is a long tail of data about Facebook users that continues to leak.”

To prevent similar incidents from happening in the future, Facebook needs to reach out to its various partners as well as app developers to secure any customer data, Mukul Kumar, chief information security officer and VP of cyber practice at Cavirin, told Threatpost.

“Facebook’s biggest issue now is probably not data on its own servers, but that of partners,” Kumar said. “Separately, users must assume that any data posted to Facebook and others has been compromised, and take necessary actions to avoid identity theft.”

This article was updated on April 3 at 3:10 pm to reflect Facebook’s statements and third-party researcher comments on the incident. 
