New Analytics Tool Defines Language Used By Malicious Domains

OpenDNS went public with a new analytics tool that can be used to detect malicious domains used in APT and cybercrime campaigns.

OpenDNS has gone public with a new tool that uses a blend of analytics principles found outside information security to create a threat model for detecting domains used in criminal and state-sponsored hacking campaigns.

NLPRank is not ready for production, said OpenDNS director of security research Andrew Hay, but the threat model has been proven out and false positives kept in check to the point where Hay and NLPRank’s developer Jeremiah O’Connor were satisfied that it could be shared publicly.

What separates NLPRank from other analytics software that searches, for example, for typo-squatting domains used in phishing attacks is that the OpenDNS tool also relies on natural language processing, ASN mappings, WHOIS domain registration information, and HTML tag analysis to separate legitimate domains from malicious ones. The data comes from OpenDNS’ massive storehouses of DNS traffic (70 billion DNS queries daily), as well as from other sources, such as data provided by researchers investigating APT campaigns.

The spark for NLPRank’s development was a repeating pattern of evidence from a number of phishing attacks used to gain a foothold for APT groups. Certain themes such as fraudulent social media accounts or password reset requests purporting to be from popular services such as Facebook or PayPal were used to add urgency for the potential victim, enticing them to follow the link to trouble.

“Using this malicious language and applying analysis to the domains, we can start picking them off prior to a campaign launching,” Hay said.

O’Connor shared details in a blog post on the science behind the analytics, including algorithms used in bioinformatics and data mining, and natural language processing techniques that allow him to build a dictionary of malicious language used in these campaigns. That dictionary helps the tool predict malicious domain activity.

“NLPRank is designed to detect these fraudulent branded domains that often serve as C2 domains for targeted attacks,” O’Connor wrote, adding that the tool uses a minimum edit-distance algorithm, the kind found in spell-checkers and other applications, to narrow down the words used in typo-squatting domains and legitimate domains.

“The intuition behind using this algorithm is that essentially we’re trying to define a language used by malicious domains vs. a language of benign domains in DNS traffic,” O’Connor said.
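To make the edit-distance idea concrete, here is a minimal Python sketch of a minimum edit-distance (Levenshtein) check against a short list of brand names. It is an illustration only, not OpenDNS’ implementation: the `BRANDS` watch list, the `closest_brand` helper, and the `max_distance` threshold of 2 are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical watch list; a real system would use a much larger dictionary.
BRANDS = ["facebook", "paypal"]


def closest_brand(label, brands=BRANDS, max_distance=2):
    """Return (brand, distance) when label is a near miss of a brand name,
    or None when it is an exact match or too far from every brand."""
    brand, dist = min(((b, edit_distance(label, b)) for b in brands),
                      key=lambda pair: pair[1])
    return (brand, dist) if 0 < dist <= max_distance else None


print(edit_distance("kitten", "sitting"))  # 3
print(closest_brand("paypa1"))
print(closest_brand("example"))
```

A label such as `paypa1` sits one substitution away from `paypal`, so it would be flagged as a look-alike, while an unrelated label falls outside the threshold. Edit distance alone produces false positives, which is consistent with the article’s point that NLPRank combines it with other signals.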

Hay added that the domains used in the recently unveiled Carbanak APT bank heist, with losses anywhere between $300 million and $1 billion, were identified as malicious by NLPRank prior to the campaign going public during the recent Security Analyst Summit. Data from Carbanak, DarkHotel and other APT groups uncovered by Kaspersky Lab are among the data sets used to put NLPRank through its paces.

“This has been incredibly successful in looking at phishing kits that, at face value, are identical to the parent company’s site,” Hay said, stressing that the tool looks at various low-level code, JavaScript hosted on the site, redirects and more in its analysis. “The model picks them off and starts analyzing the data, making sure it’s associated with the parent company, that it was registered by someone associated with the parent domain through the WHOIS information, looking at how embedded HTML may be different versus the parent company and determining how much it deviates from the parent site.”
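One simple way to picture the “how much it deviates from the parent site” check Hay describes is to compare the mix of HTML tags on two pages. The Python sketch below is a rough illustration under stated assumptions, not the NLPRank model: the `TagCounter` class, the `tag_profile` helper, and the choice of cosine similarity as the deviation measure are all hypothetical and chosen for brevity.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_profile(html: str) -> Counter:
    """Return a tag -> count profile for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


def cosine_similarity(a: Counter, b: Counter) -> float:
    """1.0 means identical tag proportions; 0.0 means no tags in common."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy pages standing in for a parent site and a suspected phishing kit.
parent = "<html><body><form><input><input></form></body></html>"
suspect = "<html><body><form><input><input></form><script></script></body></html>"
print(cosine_similarity(tag_profile(parent), tag_profile(suspect)))
```

A score close to 1.0 with a different registrant in WHOIS would be the interesting case in this picture: a page that looks like the parent site but is not owned by it. A production model would presumably weigh many more features, such as scripts and redirects, as the quote indicates.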

Eventually the tool will be folded into OpenDNS offerings, but Hay said more analysis capabilities, such as expanded HTML and embedded script analysis, need to be added to further keep false positives at bay.

“The false positive rate is low, but it’s not at the point where we are comfortable putting it into production or turning on automated blocking,” Hay said. “We want additional inputs to the model, but so far it’s looking great.”
