A massive archive of 1.8 billion publicly accessible social-media posts was found in Amazon S3 storage buckets belonging to a Pentagon contractor. The data was collected by the third-party contractor on behalf of United States Central Command and United States Pacific Command.
Researchers at UpGuard found the data, and United States Central Command (CENTCOM) confirmed Monday that it was collected on behalf of the department.
“The repositories appear to contain billions of public internet posts and news commentary scraped from the writings of many individuals from a broad array of countries, including the United States,” wrote UpGuard’s Cyber Risk Team, citing research conducted by Chris Vickery, director of cyber risk research for the firm.
The data was discovered in Amazon Web Services (AWS) S3 cloud storage buckets. It consisted mostly of benign social-media posts, many of them written by people in the United States, and researchers said the collection raises privacy and civil-liberties issues. Content was harvested from the comment sections of news sites, web forums, and social media platforms such as Facebook, according to researchers.
“Among those are many apparently benign public internet and social media posts by Americans, collected in an apparent Pentagon intelligence-gathering operation, raising serious questions of privacy and civil liberties,” UpGuard wrote. “It remains unclear why and for what reasons the data was accumulated, presenting the overwhelming likelihood that the majority of posts captured originate from law-abiding civilians across the world.”
In a prepared statement, CENTCOM spokesperson Maj. Earl Brown said the information collected was “not sensitive” and was not collected or processed for any intelligence purposes. Brown’s statement reads:
“All of the information is readily available public information related to our activities and obtained through commercial off-the-shelf programs in accordance with U.S. Code and Department of Defense policy in a consistent manner.
U.S. Central Command has used commercial off-the-shelf and web-based programs to support public information gathering, measurement and engagement activities of our online programs on public sites. The information is widely available to anyone who conducts similar online activities. The data is raw data that was provided to us by a contractor.
Last month, a researcher informed us that he had accessed data, secured in a DOD-compliant, web-based cloud. Once alerted to the unauthorized access, CENTCOM implemented additional security measures to prevent unauthorized access.”
The leak is just the latest in a long string of incidents in which data has been exposed to the public internet via misconfigured servers. As of September 2017, IBM X-Force said 1.3 billion records tied to 24 incidents had been exposed. Accenture, Verizon, Dow Jones and Deep Root Analytics are just a few of the firms that have exposed millions of private records and sensitive enterprise data on cloud backends this year.
Vickery discovered the insecure storage buckets on Sept. 2, 2017. According to researchers, all three of the AWS S3 cloud storage buckets were configured to allow any AWS global authenticated user to browse and download the contents.
“AWS accounts of this type can be acquired with a free sign-up. The buckets’ AWS subdomain names – ‘centcom-backup,’ ‘centcom-archive,’ and ‘pacom-archive’ – provide an immediate indication of the data repositories’ significance,” researchers wrote.
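The kind of setting researchers describe can be illustrated with a short Python sketch that scans a bucket's access control list (ACL) for grants to S3's two predefined global groups. The actual settings of the exposed buckets have not been published, so the sample ACL below is hypothetical, as are the function and variable names. The dictionary shape is assumed to mirror what boto3's `get_bucket_acl` call returns.

```python
# URIs of the two predefined S3 groups that extend access beyond named accounts.
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

RISKY_GROUPS = {
    AUTHENTICATED_USERS: "any AWS account holder",
    ALL_USERS: "anyone on the internet",
}


def find_risky_grants(acl):
    """Return (audience, permission) pairs for grants made to global groups."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in RISKY_GROUPS:
            findings.append((RISKY_GROUPS[uri], grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only; not the real configuration.
sample_acl = {
    "Owner": {"ID": "example-owner-id"},
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
            "Permission": "READ",
        },
    ],
}

for audience, permission in find_risky_grants(sample_acl):
    print(f"{permission} is granted to {audience}")
```

To audit a live bucket, the same function could be given the result of `boto3.client("s3").get_bucket_acl(Bucket=name)`, assuming boto3 is installed and the caller has permission to read the ACL. Bucket policies can also open access and are not covered by this check.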
UpGuard identified the Pentagon contractor responsible for the exposed data as a now-defunct private-sector government contractor named VendorX. It said an analysis of the AWS settings for one of the buckets, “centcom-backup,” indicated that the management software was operated by VendorX employees.
“While public information about this firm is scant, an internet search reveals multiple individuals who worked for VendorX describing work building Outpost for CENTCOM and the Defense Department,” researchers wrote.
“Taken together, this disparate collection of data appears to constitute an ingestion engine for the bulk collection of internet posts – organizing a mass quantity of data into a searchable form,” researchers wrote. “The former employee’s reference to ‘high-risk youth in unstable regions of the world’ is further corroborated by an examination of another folder within ‘centcom-backup.’”
Researchers said aside from privacy and surveillance concerns sparked by the publicly accessible data troves, they were shocked at how “critical information of a highly sensitive nature cannot be secured by the government—or by third-party vendors entrusted with the information.”
“A simple permission settings change would have meant the difference between these data repositories being revealed to the wider internet, or remaining secured,” UpGuard wrote.