When a new piece of malware surfaces, it’s typically analyzed eight ways from Sunday by a long list of antimalware and other security companies, government agencies, CERTs and other organizations who try to break it down and classify its capabilities. There’s a lot of duplicated effort there, and a group of researchers is building a new tool called CrowdSource that is designed to take advantage of the existing analysis capabilities in the community and perform automated malware analysis to provide rich reports on each new sample.

The tool, which the researchers will discuss at Black Hat on Thursday, is the work of a group from Invincea that was looking for a way to advance the art of automated malware analysis. But this isn’t just another analysis engine that compares a new sample against known-bad behaviors. CrowdSource uses a machine-learning approach, training its detection engine on millions of technical documents found on the Web. The authors have applied the engine to about 15,000 samples so far and say it can easily scale to millions.
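
The researchers haven’t published CrowdSource’s model details here. As a rough illustration of the general idea (learn which terms in technical text go with which capability, then score a sample’s contents against those terms), the Python sketch below uses a plain TF-IDF keyword scorer over printable strings pulled from a binary. Everything in it is an assumption: the function names (`tokenize`, `extract_strings`, `train`, `predict`), the 0.1 threshold and the three training snippets are hypothetical stand-ins, not Invincea’s code or data, and a real system would rely on far richer features than raw strings.

```python
# Hypothetical toy capability scorer; NOT CrowdSource's actual model.
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return lowercased runs of printable ASCII, similar to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii").lower() for m in re.findall(pattern, data)]


def train(docs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Learn a TF-IDF-style weight for every (capability label, token) pair.

    docs holds (document text, capability label) pairs. A token that appears
    under every label gets an IDF of zero and so carries no signal.
    """
    per_label = defaultdict(Counter)
    for text, label in docs:
        per_label[label].update(tokenize(text))
    n_labels = len(per_label)
    label_freq = Counter()
    for counts in per_label.values():
        label_freq.update(set(counts))  # how many labels each token appears under
    weights = {}
    for label, counts in per_label.items():
        total = sum(counts.values())
        weights[label] = {
            tok: (n / total) * math.log(n_labels / label_freq[tok])
            for tok, n in counts.items()
        }
    return weights


def predict(weights: dict[str, dict[str, float]], data: bytes,
            threshold: float = 0.1) -> list[str]:
    """Return every capability label whose score on the sample meets the threshold."""
    present = {tok for s in extract_strings(data) for tok in tokenize(s)}
    scores = {label: sum(w.get(tok, 0.0) for tok in present)
              for label, w in weights.items()}
    return sorted(label for label, s in scores.items() if s >= threshold)


# Made-up snippets standing in for crawled technical documents.
training_docs = [
    ("keylogger hooks keyboard with SetWindowsHookEx GetAsyncKeyState", "logs keystrokes"),
    ("screenshot via BitBlt and GetDC capture screen", "takes screenshots"),
    ("irc bot joins channel with PRIVMSG NICK", "communicates via irc protocol"),
]
weights = train(training_docs)
sample = b"\x00\x01GetAsyncKeyState\x00\xffBitBlt\x00ab\x00"
print(predict(weights, sample))  # ['logs keystrokes', 'takes screenshots']
```

Because each label is thresholded independently, one sample can be tagged with several capabilities at once, which suits traits such as keylogging and screen capture that often show up together.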

“We sort of see a hole in automated malware analysis. VirusTotal and ThreatExpert let you upload suspicious files, but the issue is you don’t get a very rich report at the end,” said Joshua Saxe, a lead research engineer at Invincea Labs. “Hopefully this can be a tool that reverse engineers can use quickly. It can be a first pass to complement existing triage systems.”

Saxe and his research partners, Kristina Blokhin, Rafael Turner, Nathan Goldschmidt and Jose Nazario, plan to release CrowdSource as an open source tool. Their research was funded by DARPA’s Cyber Fast Track program, a Pentagon initiative designed to fund innovative security research. The program has also funded research on car hacking, near-field communications and many other emerging fields. The work is part of a larger DARPA program on malware genetics.

The malware capabilities CrowdSource can detect include:

  • detects debugger-based reversing
  • encrypts / decrypts data
  • provides remote desktop capability
  • steals or modifies cookies
  • mines or steals bitcoins
  • communicates over SMTP
  • has GUI functionality
  • communicates with a database
  • communicates via IRC protocol
  • logs keystrokes
  • takes screenshots
  • communicates via XMPP
  • communicates via SOCKS protocol
  • accesses webcam
  • downloads files
  • uploads files
  • communicates via FTP

Beyond recognizing specific capabilities in malware and malicious files, Saxe said CrowdSource could also support large-scale malware analysis, easing the burden on the small number of trained analysts who do this work.

“The other thing we want to do is demographics of malware on a large scale,” he said. “Perhaps we could use it to survey the malware landscape to say that there’s been a shift away from using remote desktop bugs or whatever. That currently isn’t possible because there aren’t enough expert analysts.”

Saxe said that the team ultimately hopes to release CrowdSource as a command-line tool as well as a Web-based version.
