Data-Enriched Profiles on 1.2B People Exposed in Gigantic Leak

Although the data was legitimately scraped by legally operating firms, the security and privacy implications are numerous.

An open Elasticsearch server has exposed the rich profiles of more than 1.2 billion people to the open internet.

First found on October 16 by researchers Bob Diachenko and Vinny Troia, the database contains more than 4 terabytes of data. It consists of scraped information from social media sources like Facebook and LinkedIn, combined with names, personal and work email addresses, phone numbers, Twitter and GitHub URLs, and other data commonly available from data brokers – i.e., companies that specialize in supporting targeted advertising, marketing and messaging services.

Taken together, the profiles provide a 360-degree view of individuals, including their employment and education histories. All of the information was unprotected, with no login needed to access it.

“It is a comprehensive dataset collected from B2B [business-to-business] lead-generation companies’ lists,” Diachenko told Threatpost via Twitter.

If accessed by cybercriminals, the data, which includes scores of related accounts tied to each individual, could be used for highly effective, targeted phishing attacks, business email compromises and identity theft, among other things.

“Information like this is extremely useful to criminals as a starting point in hacking a number of related accounts and also lends itself the potential for increased credential stuffing attacks,” Carl Wearn, head of e-crime at Mimecast, said via email. “This information obviously also provides a fantastic treasure trove of information for the means of industrial, political and state-related espionage and there are multiple malicious uses for the data leaked from this breach.”

For affected consumers, remediation is no picnic, either.

“Data breaches that expose information such as phone numbers to personal accounts like email or social accounts are just as serious as ones that expose payment information,” Zack Allen, director of threat operations at ZeroFOX, told Threatpost. “Luckily for payment information, you can change your credit card, or your password to your accounts. But what can victims of this breach do when their phone number and Facebook profile is leaked? Changing your phone number can cost money with your carrier, you also have to update all of your contacts with your new phone number, plus all of your two-factor accounts.”

Diachenko and Troia’s investigation uncovered that the data sets came from two separate lead-generation companies, whose business it is to assemble highly detailed profiles of individuals: People Data Labs (PDL) and OxyData[.]io.

“The majority of the data spanned four separate data indexes, labeled ‘PDL’ and ‘OXY,’ with information on roughly 1 billion people per index,” the researchers wrote in a writeup on Friday. “Each user record within the databases was labeled with a ‘source’ field that matched either PDL or Oxy, respectively.”

When the researchers notified both companies, each said the server in question did not belong to it. However, the data certainly appeared to be theirs.

“In order to test whether or not the data belonged to PDL, we created a free account on their website which provides users with 1,000 free people lookups per month,” the researchers explained. “The data discovered on the open Elasticsearch server was almost a complete match to the data being returned by the People Data Labs API. To confirm, we randomly tested 50 other users and the results were always consistent.”

OxyData meanwhile sent Diachenko a copy of his profile, and the data fields also matched.

The researchers said they were unsure how the data came to be collected in the now-closed database. Could the owner be a customer of both PDL and OxyData, they wondered? Or had the data been stolen and placed in the storage bucket by hackers? The only clues as to the owner of the server were its IP address and the fact that it was hosted with Google Cloud.

While the incident is not a data breach per se (but rather a story of yet another misconfigured server), it brings up two different concerns. First, what liability do the data originators (PDL and OxyData) have to the people whose profiles were exposed? And second, even though the information is aggregated from allegedly public sources, what does this kind of “data enrichment” mean from a privacy perspective?

To the first concern, Kelly White, CEO at RiskRecon, believes that the lead-generation companies are on the hook for the exposure.

“Data…is easily and perfectly replicable,” she said via email. “Every location where the asset exists must be known and protected. This requires that purveyors of sensitive data know their customers well and for what purposes they will use the data. Regulators are increasingly holding the original aggregators of sensitive data responsible for the protection of sensitive information, regardless of where it is stored or to whom they share it with. As such, while the originator of this data may not have been breached, they will likely suffer blowback.”

Diachenko took a similar view: “One could argue that because PDL’s data was mis-used, it is up to them to notify their customers.”

To the second concern, the privacy implications around rich personal profiles continue to be a source of discussion. “Collected information on a single person can include information such as household sizes, finances and income, political and religious preferences, and even a person’s preferred social activities,” noted Diachenko and Troia, in their posting.

Worryingly, some of that information can come from sources that are decidedly not public. For instance, one of the phone numbers returned for Diachenko’s profile was an old landline that came as part of an AT&T TV bundle. “The landline was never used and never given to anyone – I never actually owned a phone, yet somehow this information appears in my profile,” he said.

The most famous example of the misuse of such profiling is the Cambridge Analytica scandal, in which Facebook allowed a third-party application to hand over the data of up to 50 million platform users to the company. That data was then combined with other data to create highly detailed profiles that the Trump campaign used to micro-target population segments with 2016 election messaging.

This latest revelation of the breadth of such data-enrichment underscores that even after Cambridge Analytica, privacy practices have not moved forward, Diachenko noted.

“Due to the sheer amount of personal information included, combined with the complexities identifying the data owner, this has the potential raise questions on the effectiveness of our current privacy and breach notification laws,” he said.

Mimecast’s Wearn agreed: “This particular breach highlights the trade in personal details which takes place and the inherent risks to this normalized and relatively uncontrolled practice,” he said. “Due to its scale, it will undoubtably add to calls for better regulation and security in relation to the storage of personal data.”

  • Marie on

    "While the incident is not a data breach per se (but rather a story of yet another misconfigured server)" - what about this incident makes you conclude its not a breach? It was perhaps not a malicious data breach caused by hackers, but it was a breach of confidentiality which in my book (and in the eyes of the law is very definitely a breach. Just because its cause was sloppy internal security information was still breached.
  • Anonymous on

    Obviously these companies are not protecting our information due to data breaches they should be taken to court and sued and maybe other companies will take every measure to protect our information. This whole Data Breach ordeal has been getting out of hand and its time it stops
  • Mo on

    My private email address was exposed with this data breach. My email was spam-free for over a decade, and now it's getting swamped. Quite frankly, as a result of the exposure, I will need to abandon this email address and start up a new one. As a consumer, considering my private information was exposed by a company with whom I never explicitly provided consent to collect/collate/aggregate/store/release it, what options do I legally have?
  • Michael on

    Instead of writing scare stories, please provide realistic information on the actual threat to consumers like me. My conclusion from this article is that I would be better off committing suicide because I'm under dire threat of identity theft.
  • Thomas Larsen on

    Why are we not suing the companies for these breaches?
  • MeMe on

    Every single citizen needs to receive a copy of what is in this database relating to them. NOW. I mean now.
  • Gabriel Fair on

    We really need a social awakening about these kinds of data reconciliation companies. They are the digital equivalent of a peeping tom. Just because I might have forgotten to close the blinds to my room doesn't mean I have given someone permission to film me undressing. Likewise, some of my information might be available on various websites, but that doesn't mean I have given permission for others to take it.
  • Jackson Kennard on

    My email was also part of this breach. What I do not understand is 1 Why did they have my information and 2 why was this information not password protected? I like your idea of individuals having legal recourse.
  • David Florey on

    I believe a class action is required for such negligence - it's a breach whichever way you look at it - regardless of the breach being a result of a hack or sloppy internal practices. There are laws in some countries that governments have the power to exercise (even when the incident occurs offshore) when their citizens are affected by such breaches.
  • Dave on

    Frankly, I believe the data breach occurred when these companies recorded anything about me: I do not know them, do not know of them, have not given them my consent to maintain any data about me. If my privacy has been invaded, it was by PDL and OxyData.
  • Trevor L Ray on

    @Mo: My condolences about that deluge of spam; it can be a pain. I should note that abandoning your email address is okay... so long as you DON'T DELETE YOUR EMAIL ACCOUNT. If you delete your email account, it then opens up your address for someone else to claim, and potentially impersonate you. If you do abandon your email, it's best to just forward email from chosen senders to a new account, and then stop using that spammed email address as your point of contact. TL;DR: Don't delete your email account - just don't use it anymore.
  • Trevor L Ray on

    @MeMe: Perfectly plausible to do, but would require a couple things. One would require full access to the database in question (this would likely be an in-house operation by the data aggregators themselves). If they aren't willing to execute a breach notification for all involved individuals, they would have to hire a third party, resulting in yet another entity gaining access to said data. The executor of the breach notification would have to be a fully trusted individual or business that specializes in handling sensitive data responsibly. Secondly, for a comprehensive contact of all individuals involved, it would require being able to contact those parties via the contact info in the database. This is simple enough for individuals with email addresses or mobile phone numbers listed in the database. However, for those without an efficient electronic contact in the database (e.g.: only mailing address or landline phone), it becomes monumentally difficult to disclose the breach. This would mean resorting to a robocalling and mass-mailing campaign of gargantuan scale. (The cost of doing so would probably mean this would never happen voluntarily on the part of the data aggregators, but only as a result of class-action litigation brought upon them by involved parties.) TL;DR: Unfortunately, a full-scale notification of breached information to all parties is unlikely unless a class-action lawsuit is brought to court.
  • Trevor L Ray on

    @Michael: Heh, you're not far off the mark that the level of info is disturbingly detailed, but unless you've got someone with both access to that 4TB database and a specific grudge against you, you've got little to worry about. You might have more spam in email and mobile SMS/voicemail, but other than that, I doubt there will be much targeted attacks against anyone except people that are known targets of attackers in possession of the database. Pursuing an en-masse attack against the entire database's list would definitely raise the attention of NSA/FCC/etc intelligence agencies that monitor the global internet traffic for such behavior. TL;DR: Not much to worry about here other than a potential increase in spam.
  • Anonymous on

    Living in Europe, can I use the GDPR to ask what personal data about me was leaked?
  • SYSCOM on

    Data Enrichment Exposure From PDL Customer: Data Aggregator Breach Overview. On October 16, 2019, Data Enrichment Exposure From PDL Customer was breached. Once the breach was discovered and verified, it was added to our database on November 22, 2019. What data was compromised: phone numbers; email addresses; additional information, including employers, geographic locations, job titles, names and social media profiles. Breach data provided by Have I Been Pwned.
  • Oliver on

    We deserve better security and privacy from our technology providers. Exposed data like this leak have real financial consequences for individuals, companies, and countries. Ransomware attacks more than doubled in Q1 2019, while identity theft and fraud cost US individuals $1.5 Billion last year. SeMI.technologies would prefer to beat Elasticsearch through better search results rather than security lapses; however, if they continue taking a hands-off approach, then we are happy to beat them there too.
  • Robert Moore on

    What makes your 'private' email address private?
  • Zelda on

    I got a notice from my employer's ID Monitoring company, telling me about this breach and suggesting that changing my email password would be of some value. That makes zero sense to me. How is that going to prevent ID theft with all the other info about me already exposed?
  • Alan Mc on

    Suddenly the entire Data Security Industry became irrelevant when everyone else realised there was nothing left to steal. Well done, everyone: a blunder this big took years of dedication and persistence.
  • Kerry McCombe on

    Hi everyone. I'm with you all on a class action against. My email was completely hacked, contacts targeted with malware emails from someone pretending to be ME! Its been a F-ing nightmare. My info is within this breach. Anyone know the actual steps to getting a class action started?
  • mayela on

    I agree unfortunately I just found out I've been breached Twice and can't even do anything about it but wait and see what I get slammed with this is not fair and certainly not right for the victims
  • Anonymous on

    My wife accused me of hacking her phone around the same time for some reason my name came up on her phone as co device administrator, I wonder if this is what caused it and it let her get into my Google account without correct password
