Research is expected to be unveiled today that challenges the industry’s current reliance on dynamic malware analysis as the best means of early detection of infections.
Instead, researchers from the Georgia Institute of Technology, the IMDEA Software Institute and EURECOM posit that a better approach would be to analyze network traffic to suspicious domains, which could cut detection times by weeks or even months.
Their paper, “A Lustrum of Malware Network Communication Evolution and Insights,” is scheduled to be presented today at the IEEE Symposium on Security and Privacy in San Jose, Calif.
The researchers’ conclusions are based on a study of five years’ worth of network traffic from a large U.S. internet service provider, comprising more than five billion network events. The group had more than 26 million malware samples at its disposal, and studied DNS requests made by malware and potentially unwanted programs (PUPs), as well as the timing around the registration of expired domains.
The researchers concluded that attackers, including spammers and adware purveyors dabbling in PUPs, re-use infrastructure over and over, and that this re-use provides a better early-detection signal than an exclusive study of malware and PUP domains. They found that more than 300,000 malware samples were active for at least two weeks before they were submitted to a feed such as VirusTotal or picked up and analyzed in a vendor feed.
“When we looked at when malware samples actually showed up in malware feeds where they [were] dynamically analyzed and network signal was extracted from them, we noticed that network signal was extracted in the feed often weeks or months after we saw the first resolutions for that domain in real network traffic from a major ISP in the U.S.,” said Chaz Lever of Georgia Tech, one of the report’s coauthors along with Platon Kotzias, Davide Balzarotti, Juan Caballero and Manos Antonakakis.
Lever said that traffic could be command and control, reporting or any other type of beaconing reaching back out to the infrastructure used by the attackers.
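The lag Lever describes can be illustrated with a minimal Python sketch. It assumes two hypothetical lookup tables, one giving the first day a domain resolved in network traffic and one giving the first day it appeared in a malware feed; the function name, domain names and dates are invented for illustration and do not come from the paper.

```python
from datetime import date

def detection_lag_days(first_resolved, first_in_feed):
    """Days between a domain's first observed resolution and its first feed appearance."""
    lags = {}
    for domain, feed_day in first_in_feed.items():
        seen_day = first_resolved.get(domain)
        if seen_day is None:
            continue  # domain never observed in the network data
        lags[domain] = (feed_day - seen_day).days
    return lags

# Made-up observations, for illustration only
first_resolved = {
    "c2.example": date(2015, 1, 5),
    "ads.example": date(2015, 3, 1),
}
first_in_feed = {
    "c2.example": date(2015, 2, 16),
    "ads.example": date(2015, 3, 3),
    "unseen.example": date(2015, 4, 1),
}

lags = detection_lag_days(first_resolved, first_in_feed)

# Domains active on the network for at least two weeks before any feed saw them
late = {domain: days for domain, days in lags.items() if days >= 14}
print(late)  # {'c2.example': 42}
```

The 14-day cutoff mirrors the two-week figure cited above; a real analysis would have to choose its own threshold and data sources.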
That infrastructure was the vital area for the researchers in reaching their conclusions. In their five-year sample of ISP data, the researchers saw a lot of re-use of infrastructure, ranging from shared web hosts and bulletproof hosting providers to content delivery networks, where malicious traffic could hide in plain sight before it’s flagged as malicious.
“As someone protecting a network, you see exactly what infrastructure is being reached out to by those domains in that (vendor) feed. If I just rely on waiting for the domains that I see from malware to come in, that’s bad,” Lever said. “What I do see is that even though the domains frequently change, the infrastructure often seems to be reused. So if I’m seeing similar infrastructure from these feeds, I can go back and maybe not use domains, but look at the infrastructure and see what else is reaching out to that infrastructure. If you see more stuff reaching out to that infrastructure, maybe that’s a flag even if it’s something not from a specific feed.”
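The pivot Lever describes, from feed domains to the infrastructure behind them and then to other domains using that infrastructure, might look something like the following Python sketch. It is an illustration under stated assumptions rather than the researchers’ method: the function name is hypothetical, DNS observations are assumed to be available as simple (domain, IP address) pairs, and the domains and addresses are reserved documentation values.

```python
from collections import defaultdict

def pivot_on_infrastructure(feed_domains, dns_records):
    """Find domains outside the feed that resolve to IPs used by feed domains."""
    feed_domains = set(feed_domains)
    # Step 1: collect the IPs that known-bad feed domains resolved to
    flagged_ips = {ip for domain, ip in dns_records if domain in feed_domains}
    # Step 2: collect every other domain seen resolving to those same IPs
    related = defaultdict(set)
    for domain, ip in dns_records:
        if ip in flagged_ips and domain not in feed_domains:
            related[ip].add(domain)
    return dict(related)

# Made-up passive DNS observations: (queried domain, resolved IP)
dns_records = [
    ("bad.example", "192.0.2.10"),
    ("new.example", "192.0.2.10"),
    ("benign.example", "198.51.100.5"),
]

candidates = pivot_on_infrastructure(["bad.example"], dns_records)
print(candidates)  # {'192.0.2.10': {'new.example'}}
```

Because shared web hosts and content delivery networks serve many unrelated customers, a match here is only a lead worth investigating, not proof of abuse.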
The researchers saw “large pockets of abuse” beaconing out to the same infrastructure over and over, Lever said, throughout the sample of network data.
They also contend that regardless of the means of infection, most malware communicates with an attacker-controlled server, for example to receive instructions or to send exfiltrated data.
“The choke point is the network traffic, and that’s where this battle should be fought,” Antonakakis said.
The paper also includes an extensive classification of malware samples and lists of the most frequent offenders. For example, MyDoom, which is more than a decade old, still tops the list of spam families by a wide margin (82,000 samples and six million MX lookups). Others such as zbot, Kelihos, and Upatre are in the top 10 ranked by MX lookups. Most of the samples, however, were classified not as malware but as potentially unwanted programs, which are much more likely to re-use infrastructure, Lever said.
Dynamic DNS was another haven for abuse, Lever said, adding that 50 percent of the samples they examined had used dynamic DNS.
“If you’re looking for a place to identify abuse, look for dynamic DNS in your network,” Lever said, adding that they saw almost nine million samples doing dynamic DNS lookups. “It’s a very popular communication method for them.”
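As a rough illustration of that advice, the Python sketch below flags DNS queries that fall under a dynamic DNS provider’s zone. The zone list is a hypothetical placeholder built from reserved example names; a defender would need to supply a real, maintained list of provider zones, and the function name is likewise an assumption.

```python
# Hypothetical dynamic DNS provider zones (placeholders, not real providers)
DDNS_ZONES = {"ddns.example", "dyn.example"}

def is_dynamic_dns(qname, ddns_zones=DDNS_ZONES):
    """Return True if the queried name is a dynamic DNS zone or a host under one."""
    name = qname.rstrip(".").lower()
    # Match on label boundaries so "notddns.example" is not mistaken for "ddns.example"
    return any(name == zone or name.endswith("." + zone) for zone in ddns_zones)

# Made-up query log
queries = ["host1.ddns.example.", "www.example.org", "notddns.example", "DYN.example"]
hits = [q for q in queries if is_dynamic_dns(q)]
print(hits)  # ['host1.ddns.example.', 'DYN.example']
```

A hit only shows that a host is using dynamic DNS, which has legitimate uses too; it marks a place to look rather than a verdict.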
Lever stressed that malware feeds are excellent at detecting known threats, but a look at the network signal goes a long way toward reducing the lag between when a sample first communicates with an attacker and when it’s detected.