UPDATE – As companies flock to cloud services such as Amazon Simple Storage Service (S3) to store and serve static content cheaply, some inevitably make simple configuration mistakes, and a savvy attacker can cash in.
Researchers at Rapid7 today released data from a project examining the availability and security of files and other objects stored on Amazon S3. Businesses, often small ones, can store anything from backups and log files to static website images and documents inside logical containers known as buckets. By default, buckets are private, meaning that only authorized users may list or download the objects stored in them. Access can be restricted separately on the bucket and on each object in it. Public buckets, on the other hand, allow any user to list their contents. A company could also store private files inside a public bucket.
As it turns out, most users keep the default private setting. But Rapid7 security researcher Will Vandevanter, drawing on HD Moore's Critical.IO Project, the Bing Search API and a list of Fortune 1000 company names, found 12,328 buckets belonging to enterprises; 1,951 of those had been switched to public, exposing more than 126 billion files.
Rapid7 and Metasploit engineering manager Tod Beardsley told Threatpost that the switch from private to public can likely be traced to third parties.
“Companies will set something up on S3 where a contractor or another third party gets access and it doesn't get changed back [to private],” Beardsley said. “The problem is most likely these third parties. Someone has to go out of their way to change it.”
Vandevanter examined a random sample of 40,000 publicly visible files and found reams of data an attacker could use to break into a corporate network or sell on the underground.
“It should be emphasized that a public bucket is not a risk created by Amazon but rather a misconfiguration caused by the owner of the bucket,” he said. “And although a file might be listed in a bucket it does not necessarily mean that it can be downloaded. Buckets and objects have their own access control lists (ACLs).”
Among the 40,000 files, researchers found more than 28,000 PHP source files, some with database credentials and API keys, as well as thousands of CSV files storing personal contact information. Also in the files were sales records, click-through rates, video game source code, text documents—many marked personal or confidential—and more than 20,000 images.
“Much of the data could be used to stage a network attack, compromise users’ accounts, or to sell on the black market. Although more subtle, one of the other concerns was the number of publicly available log files,” Vandevanter said.
In addition to the Critical.IO Project data, the researchers used a free tool released almost two years ago by Robin Wood, a longtime open source developer, penetration tester and Metasploit contributor. Wood's Bucket Finder takes a wordlist of names likely to be used for buckets, checks whether each bucket exists and is public with directory listing enabled, and then checks whether the files it lists are themselves public.
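The classification step behind that kind of tool can be sketched in a few lines. This Python example is not Bucket Finder's code; it assumes the usual S3 responses to an anonymous GET on a bucket URL (404 when no bucket has that name, 403 when the bucket exists but listing is denied, 200 with a `ListBucketResult` XML document when listing is public). The function names and the sample bucket are hypothetical, and the HTTP request itself is left out so the logic can be tried on canned responses.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def classify_bucket(status: int) -> str:
    """Map the HTTP status of an anonymous bucket GET to a label.

    Assumes typical S3 behaviour; anything else (for example a redirect
    to another region) is reported as 'unknown' rather than guessed at.
    """
    if status == 404:
        return "missing"   # no bucket by that name
    if status == 403:
        return "private"   # bucket exists, listing denied
    if status == 200:
        return "public"    # anonymous listing allowed
    return "unknown"


def listed_keys(xml_body: str) -> list[str]:
    """Extract object keys from a ListBucketResult document."""
    root = ET.fromstring(xml_body)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


# Hypothetical listing returned by a public bucket.
sample = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>logs/access.log</Key></Contents>
  <Contents><Key>img/logo.png</Key></Contents>
</ListBucketResult>"""

print(classify_bucket(200))   # public
print(listed_keys(sample))    # ['logs/access.log', 'img/logo.png']
```

A key showing up in the listing is only half the question: because each object carries its own ACL, a follow-up request for an individual key can still be refused, which is why the file-level check is a separate step.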
“The biggest eye-opener was the percentage of public buckets that were found,” Beardsley said. “When you Google Dork something like this, you expect two pages of hits, 60 or so. To see thousands was surprising.”
Beardsley speculated that some of the businesses that left their buckets open were relying on security by obscurity, not realizing that bucket URLs follow a predictable pattern (s3[.]amazonaws[.]com/[bucket name]) and are easy to guess.
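To illustrate why predictable URLs undercut obscurity, here is a small sketch that turns a company name into candidate bucket URLs. The suffix list is a hypothetical example, not the wordlist Rapid7 or Wood used; the code emits both the path-style form quoted above and the virtual-hosted form (`<bucket>.s3.amazonaws.com`) that S3 also accepts.

```python
# Hypothetical suffixes for illustration; real wordlists are far larger.
SUFFIXES = ["", "-backup", "-logs", "-static", "-media"]


def candidate_names(company: str) -> list[str]:
    """Derive plausible bucket names from a company name."""
    base = company.lower().replace(" ", "-")
    return [base + suffix for suffix in SUFFIXES]


def candidate_urls(name: str) -> list[str]:
    """Path-style and virtual-hosted-style URLs for one bucket name."""
    return [
        f"https://s3.amazonaws.com/{name}",
        f"https://{name}.s3.amazonaws.com",
    ]


for name in candidate_names("Example Corp"):
    for url in candidate_urls(name):
        print(url)
```

With five suffixes and two URL forms, one company name already yields ten URLs to probe, so guessing costs an attacker almost nothing.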
In 2011, Wood did a similar analysis on a smaller sample of 2,268 names, finding 131 public buckets and 9,683 public files within those—most of them image files.
Rapid7 also said it is possible to use Google cache searches and the Wayback Machine to find private buckets that were once public.
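As a rough illustration of the archive angle, the sketch below builds a query for the Internet Archive's CDX search endpoint and parses its JSON output, in which the first row names the columns. The endpoint URL and its `url`, `matchType` and `output` parameters reflect the public Wayback CDX server as we understand it and should be checked against its documentation; the bucket name is hypothetical. An archived capture is only a historical snapshot and says nothing about whether the bucket is still public.

```python
import json
from urllib.parse import urlencode

# Assumed public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(bucket: str) -> str:
    """Build a query for archived captures of a bucket's host."""
    params = {
        "url": f"{bucket}.s3.amazonaws.com",
        "matchType": "host",   # every captured path on that host
        "output": "json",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body: str) -> list[dict]:
    """Turn CDX JSON (header row followed by data rows) into dicts."""
    if not body.strip():
        return []              # no captures: the server may send nothing
    rows = json.loads(body)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


print(cdx_query_url("example-bucket"))
```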
Rapid7 recommends that companies review the buckets they own, examining what their contents expose and how the buckets and the files in them are protected. The company also recommends following Amazon's best-practice guides for data protection.
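For that kind of self-audit, one starting point is to read the ACL on each bucket and object and flag grants to the global "all users" and "authenticated users" groups. The sketch below assumes the `boto3` SDK, whose S3 client provides `get_bucket_acl`, `get_object_acl` and a `list_objects_v2` paginator; `audit_bucket` and `public_grants` are hypothetical helper names. It inspects ACLs only, so public access granted through a bucket policy would not show up here.

```python
# Standard S3 group URIs that make a grant effectively public.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers": "everyone",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "any AWS user",
}


def public_grants(acl: dict) -> list[tuple[str, str]]:
    """Return (audience, permission) pairs that open an ACL to the public."""
    found = []
    for grant in acl.get("Grants", []):
        uri = grant.get("Grantee", {}).get("URI", "")
        audience = PUBLIC_GROUPS.get(uri)
        if audience:
            found.append((audience, grant["Permission"]))
    return found


def audit_bucket(bucket: str) -> None:
    """Print public ACL grants on a bucket and on each of its objects."""
    import boto3  # imported here so public_grants() works without the SDK

    s3 = boto3.client("s3")
    for audience, perm in public_grants(s3.get_bucket_acl(Bucket=bucket)):
        print(f"bucket {bucket}: {perm} for {audience}")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            acl = s3.get_object_acl(Bucket=bucket, Key=obj["Key"])
            for audience, perm in public_grants(acl):
                print(f"object {obj['Key']}: {perm} for {audience}")
```

Run with credentials for the owning account, `audit_bucket("example-bucket")` (a placeholder name) prints one line per public grant, covering both levels of protection, since a private bucket can still hold individually public objects and vice versa.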
This article was updated to include comments from Rapid7 and to clarify throughout.