Gitrob Combs GitHub Repositories for Secret Company Data

Gitrob, an open source intelligence tool, helps security analysts search the repositories of GitHub organizations for files not meant for public consumption.

Free online code repositories such as GitHub provide a valuable collaboration service for enterprise developers. But they are also a trove of potentially sensitive company and project information that is likely to attract the attention of hackers.

An application security specialist from Berlin has developed a tool he hopes will keep companies a step ahead. Gitrob is an open source intelligence command-line tool that mines GitHub for files belonging to an organization and checks them against predetermined patterns, looking for potentially sensitive information that isn’t meant for public consumption.

Its developer, Michael Henriksen, who does application security and code auditing for SoundCloud, says Gitrob starts by using GitHub’s public API to query a GitHub organization’s list of public members.

“When the list of members is obtained, it queries GitHub again for each member, which returns a list of their public repositories,” Henriksen told Threatpost. “The contents of the repositories are never downloaded to the machine, it simply uses GitHub’s API again to obtain a list of file names. When clicking on a file in the web interface to see its contents, it is fetched from GitHub’s servers.”
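The enumeration Henriksen describes (organization members, then each member’s public repositories, then file names only) can be sketched against GitHub’s REST API. This is a minimal Python illustration, not Gitrob’s own code: it assumes the public members, user repositories and recursive Git trees endpoints, and it ignores pagination, authentication, rate limiting and empty repositories. The function names are hypothetical, and the HTTP fetcher is passed in so the logic can run against canned responses.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator

API = "https://api.github.com"


def fetch_json(url: str) -> Any:
    """Fetch a GitHub REST API URL (unauthenticated) and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_org_file_paths(
    org: str, fetch: Callable[[str], Any] = fetch_json
) -> Iterator[tuple[str, str, str]]:
    """Yield (member, repo, path) for each file in each public member's public repos.

    Only file names are requested; file contents are never downloaded.
    Pagination and rate limits are ignored to keep the sketch short.
    """
    for member in fetch(f"{API}/orgs/{org}/public_members"):
        login = member["login"]
        for repo in fetch(f"{API}/users/{login}/repos"):
            branch = repo["default_branch"]
            # The trees endpoint accepts a branch name and can list the whole tree.
            tree = fetch(f"{API}/repos/{login}/{repo['name']}/git/trees/{branch}?recursive=1")
            for entry in tree["tree"]:
                if entry["type"] == "blob":  # keep files, skip directories
                    yield login, repo["name"], entry["path"]
```

Swapping in a different `fetch` callable, for example one that adds an access token header, would let the same traversal run authenticated with a higher rate limit.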

Henriksen said he built Gitrob around observers, which act as plug-ins and flag files matching certain patterns. Organization members, repositories and files are saved to a PostgreSQL database, after which a Sinatra web server is started locally to serve a web app that presents the data; the analysis itself must be conducted manually.

“All the files are sent through these observers, one by one, and the observers can then decorate or make changes to the file’s database record, before it is saved to the database,” Henriksen said. “Right now, Gitrob actually only contains one observer which will flag files that match patterns of interesting files, but the design makes it easy to introduce new logic to look for other things. The patterns are built into the tool itself.”
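The observer design described above can be illustrated with a short Python sketch: each file record passes through every observer in turn, and an observer may decorate the record before it is “saved”. The `FileRecord` type, the function names and the entries in `SENSITIVE_PATTERNS` are hypothetical examples chosen for illustration; they are not Gitrob’s built-in pattern list, and the list append merely stands in for a database write.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class FileRecord:
    repo: str
    path: str
    flags: list[str] = field(default_factory=list)  # decorations added by observers


# Hypothetical example patterns; Gitrob's real built-in list differs.
SENSITIVE_PATTERNS = {
    "private SSH key": re.compile(r"(^|/)id_(rsa|dsa|ecdsa|ed25519)$"),
    "PEM certificate or key": re.compile(r"\.pem$"),
    "environment file": re.compile(r"(^|/)\.env$"),
    "shell history": re.compile(r"(^|/)\.(bash|zsh)_history$"),
}

Observer = Callable[[FileRecord], None]


def sensitive_file_observer(record: FileRecord) -> None:
    """Flag the record once for every pattern its path matches."""
    for description, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(record.path):
            record.flags.append(description)


def process(records: Iterable[FileRecord], observers: list[Observer]) -> list[FileRecord]:
    """Send each file through every observer, one by one, before saving it."""
    saved = []
    for record in records:
        for observe in observers:
            observe(record)
        saved.append(record)  # stand-in for writing the record to the database
    return saved
```

Adding a new check would only mean writing another function with the same signature and appending it to the observer list, which is the extensibility Henriksen describes.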

Security analysts inside an enterprise should feel at home using Gitrob, Henriksen said, but he cautioned that the tool only flags a default set of potentially sensitive items; an analyst still has to comb through them manually to determine whether those files should be public.

“A security team in an organization can use Gitrob to periodically scan their repositories for sensitive files that might be checked in,” Henriksen said. “The current version is not really suitable to run in an automated fashion, so it would have to be run manually, but I am planning to change that in the future so that it can be run automatically and report to somewhere when new things are found.”

Henriksen said he tested Gitrob against a number of GitHub repositories belonging to companies of different sizes. Using Gitrob, he found a variety of information, from username-password combinations, email addresses and internal system mappings to other details that could be used in phishing campaigns or other social engineering attacks. Henriksen said he notified the affected organizations, and most, he said, were appreciative.

“I am not aware of any tool that specifically targets GitHub organizations like Gitrob does,” Henriksen said. “People have been finding sensitive files with GitHub’s search functionality for a while (kind of like Google dorks for GitHub), but I think Gitrob is the first tool that makes the task of finding sensitive files within an organization very easy.”

Installation instructions and requirements can be found on his GitHub page.

Image courtesy othree.
