A scan of more than 230 million web domains worldwide has uncovered 390,000 web pages with open .git directories – a worrying state of affairs that can expose a range of sensitive information.
Researcher Vladimír Smitka at Lynt Services performed the scan, starting in his native Czech Republic before expanding the exercise to a global view. Using 18 virtual and four physical servers along with custom scripts for crawling the sites, he ran a four-week scan which revealed that a range of sites, from large (top Alexa pages) to smaller fare, were ripe for data exfiltration.
The issue is a lack of proper configuration. Many web developers use the open-source Git version-control tool to build their pages, and although the project's standard .git repository folder should not sit in a publicly accessible part of a site, it often does. Worse, developers often inadvertently leave sensitive information in it, Smitka explained.
“This is a nasty problem because it is possible to get current and past files with a lot of information about the website’s structure, and sometimes you can get very sensitive data such as database passwords, API keys, development IDE settings, and so on,” he said, in a recent posting on the issue. “This data shouldn’t be stored in the repository, but…I have found many developers that do not follow these best practices.”
In some cases, a request to the folder returns a full directory listing, which makes the information in the repository much easier to download and could mean that it is even indexed by Google.
“The Git repository has a well-known structure, so you can simply download individual files and parse the references to the individual objects/packs in the repository,” Smitka said. “Then you can download them via the direct request.” He added that the readily available GitTools package would allow a bad actor to automate this.
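The "well-known structure" Smitka refers to can be illustrated with a short Python sketch. It is not his tooling or the GitTools code; the function names are hypothetical and chosen for illustration. It relies only on two documented conventions of Git's on-disk layout: the HEAD file holds either a symbolic reference (such as "ref: refs/heads/master") or a bare 40-character SHA-1, and a loose object lives under objects/ in a folder named after the first two hex characters of its hash.

```python
import re

# A full SHA-1 object name as Git writes it: 40 lowercase hex characters.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def parse_head(head_text):
    """Classify the contents of a .git/HEAD file.

    Returns ("ref", "refs/heads/<branch>") for a symbolic HEAD, or
    ("sha", "<hash>") for a detached HEAD.
    """
    text = head_text.strip()
    if text.startswith("ref:"):
        return ("ref", text[len("ref:"):].strip())
    if SHA1_RE.fullmatch(text):
        return ("sha", text)
    raise ValueError("not a recognizable HEAD file")


def loose_object_path(sha):
    """Map a SHA-1 to the path of its loose object, relative to .git/."""
    if not SHA1_RE.fullmatch(sha):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    # Git uses the first two hex characters as a folder name.
    return "objects/{}/{}".format(sha[:2], sha[2:])
```

For example, parse_head("ref: refs/heads/master\n") points at the file refs/heads/master, which in turn contains the hash of the latest commit. Objects that Git has already packed are stored in pack files rather than as loose objects, so this mapping covers only part of a real repository.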
“This is a story of complacency,” said Tim Woods, vice president of technology alliances at FireMon, in an email to Threatpost. “For years, we have seen this story play out in many other avenues as well. We’ve seen it in the early days of buffer overflows, database SQL injections or even today when I witness overly permissive firewall policies with little to no supporting documentations. Bad actors will always look to exploit those paths they identify as the least resistant and seek to leverage human complacency to their advantage.”
However, Smitka added that the issue can be easily overlooked, because a simple configuration check can be misleading.
“You can easily verify these rules by trying to open the <web-site>/.git/HEAD — if setup correctly it shouldn’t be working,” Smitka said. “If you only visit <web-site>/.git/ directly you will get an HTTP 403 error in most cases. It seems as though the access is denied, but it is only a false sense of security. Actually, the 403 error is caused by the missing index.html or index.php and disabled autoindex functionality…[and] access to the files is still possible.”
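Site owners who want to run the check Smitka describes against their own domains could script it along these lines. This is a minimal sketch rather than his scanner: the function names and the 256-byte read limit are assumptions, and it uses only Python's standard urllib. It requests /.git/HEAD itself instead of the /.git/ folder, since a 403 on the folder says nothing about the files inside it.

```python
import re
import urllib.error
import urllib.request

# A genuine HEAD file is a symbolic ref or a bare 40-character SHA-1.
HEAD_BODY_RE = re.compile(r"(ref: refs/\S+|[0-9a-f]{40})\s*")


def looks_like_git_head(status, body):
    """True if an HTTP response looks like a real, readable .git/HEAD file."""
    return status == 200 and HEAD_BODY_RE.fullmatch(body) is not None


def git_head_exposed(base_url, timeout=10):
    """Request <base_url>/.git/HEAD and report whether it is readable.

    Network failures (DNS, timeouts) are left to propagate so that an
    unreachable site is not mistaken for a safe one.
    """
    url = base_url.rstrip("/") + "/.git/HEAD"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(256).decode("utf-8", errors="replace")
            return looks_like_git_head(resp.status, body)
    except urllib.error.HTTPError:
        # 403, 404 and similar on the file itself: the file is not served.
        return False
```

Matching on the body matters because many sites answer every unknown URL with a 200 status and a custom error page, which would otherwise register as a false positive. It should only be pointed at sites the operator is responsible for, for example git_head_exposed("https://www.example.com").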
The researcher also took the above-and-beyond step of notifying the affected developers; he wrote a script to glean email contacts from the /.git/logs/HEAD file, and was able to retrieve 290,000 valid emails. After de-duping (some emails were associated with thousands of sites) and excluding the “machine” addresses of the servers themselves, he ended up with a list of 90,000 unique emails.
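Smitka has not published the script in this article, but the general approach can be sketched in Python. Each line of a Git reflog such as logs/HEAD records the old and new commit hashes, the committer's name and email address in angle brackets, a timestamp, a time zone and, after a tab, a message. The filter for "machine" addresses below is a hypothetical heuristic (addresses whose domain has no dot or ends in a local-only suffix), not his actual rule.

```python
import re

# One reflog line: <old sha> <new sha> Name <email> <timestamp> <tz>\t<message>
REFLOG_RE = re.compile(
    r"^[0-9a-f]{40} [0-9a-f]{40} (?P<name>.*?) <(?P<email>[^>]*)> "
    r"\d+ [+-]\d{4}(?:\t.*)?$"
)


def looks_like_machine_address(email):
    """Hypothetical heuristic for server-generated addresses like root@web01."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return True
    domain = domain.lower()
    return "." not in domain or domain.endswith((".local", ".localdomain", ".lan"))


def unique_contact_emails(reflog_text):
    """Return de-duplicated, lower-cased emails in order of first appearance."""
    seen = set()
    result = []
    for line in reflog_text.splitlines():
        match = REFLOG_RE.match(line)
        if match is None:
            continue  # skip anything that is not a well-formed reflog line
        email = match.group("email").strip().lower()
        if email in seen or looks_like_machine_address(email):
            continue
        seen.add(email)
        result.append(email)
    return result
```

Lower-casing before comparison is a simplification that treats Jane@Example.com and jane@example.com as one contact. De-duplicating across sites, as Smitka did, would mean sharing one seen set across every log processed.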
Then, he began the onerous task of manually emailing the contacts. “I prepared the landing page smitka.me with detailed information and mitigation, and referred people to it in the emails,” he said.
Smitka said around 18,000 emails weren’t delivered, and that some of the emails were classified as spam. Still, “after sending the emails, I exchanged about 300 additional messages with affected parties to clarify the issue. I have received almost 2,000 thank-you emails, 30 false positives, two scammer/spammer accusations, and one threat to call the Canadian police.”
As a side note, to gain a clearer picture of the affected footprint, Smitka also examined the technologies being used on the sites.
Most of the affected pages use the PHP programming language, although Python developers take the lead when the numbers are normalized by language market share. Meanwhile, among the web servers and operating systems used on the affected sites, Ubuntu is the leading Linux distro, he said, while CentOS is in third place. Tengine (a Chinese fork of Nginx) is also well-represented.
Additionally, the WooCommerce WP plugin was found on the majority of the sites, along with the CodeIgniter application framework. And WordPress dominates in terms of content management systems (CMS) behind the affected sites, he added.
“Only about 12 percent of these WordPress sites use the latest minor versions with the latest security patches,” he said. “It is because when WordPress finds the .git (or other VCS) directory, it disables automatic core updates. It works the same with Drupal and Joomla – the majority of these sites were the older versions.”
Smitka said that he plans to periodically check the state of open .git repositories on the web – with the hope that he finds fewer and fewer of them over time.
“I would like to recommend to everyone that you watch what you upload to your website more carefully; it’s not just about system versions but also various temporary test scripts,” he said. “It is also good to remember that things are changing – server configurations and team members, and what doesn’t seem like a problem today may be a problem tomorrow.”
FireMon’s Woods added, “Secure coding practices is an area that must be taken more seriously. In today’s shared app-coding world and the growing IoT landscape, if we don’t put security at the forefront of our efforts then we can only expect to see more of the same. Security visibility is paramount across the hybrid enterprise today.”