Researchers around the world put their heads together and identified the ‘mystery code’ in the Duqu Trojan horse program, Kaspersky Lab announced on Monday. Weeks after announcing that they had discovered computer code of unknown provenance in the innards of the Duqu Trojan, Kaspersky researchers said the mystery code was written in the common C programming language and compiled with Microsoft Visual Studio 2008. However, the Duqu authors modified the underlying C code with a customized extension for combining object-oriented programming with the older C language, a variant sometimes termed “OO C.”
The mystery began in early March, after Kaspersky researchers went public with their struggle to identify how a key component of the Duqu Trojan was created. After isolating and analyzing the Trojan, researchers found that segments of the key module handling Duqu’s command-and-control functions were written in an unknown programming language. Kaspersky Lab expert Igor Soumenkov described it as object-oriented, but showing no traces of C++ or of any other high-level programming or scripting language, such as Objective C, Java or Python.
The researchers’ admission set off a storm of speculation from malware experts and programmers around the globe. More than 200 comments on the researchers’ website, securelist.com, offered suggestions, and scores more came via e-mail. The debate also spilled into online forums such as Slashdot and Reddit. Contributors fingered everything from LISP and Delphi to Google Go and Forth as the mystery coding language.
A week later, researchers announced that, with crowdsourced help, they had cracked the code. They identified the mystery code as C source code compiled with Microsoft Visual Studio 2008, using special options for optimizing code size and inline expansion. The code was also written with a customized extension for combining object-oriented programming with C, generally referred to as “OO C.”
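For readers unfamiliar with the idea, “object-oriented C” usually means building classes by hand: a struct holds an object’s data plus a pointer to a table of function pointers, and “method” calls go through that table. The sketch below is a minimal, hypothetical illustration of that general pattern only. The names (Shape, Rect, Triangle, shape_area and so on) are invented for this example, and it does not reproduce SOO’s actual macros or any code recovered from Duqu. Because the programmer, not a C++ compiler, chooses the object layout and how methods are called, the resulting machine code need not carry the usual C++ fingerprints.

```c
#include <assert.h>

/* Hypothetical example of hand-rolled OO in C; not SOO's API, not Duqu code. */
typedef struct Shape Shape;

/* The "vtable": one function pointer per virtual method. */
typedef struct ShapeVtbl {
    double (*area)(const Shape *self);
    const char *(*name)(const Shape *self);
} ShapeVtbl;

/* Base "class": every object starts with a pointer to its method table. */
struct Shape {
    const ShapeVtbl *vtbl;
};

/* Derived "class": base is the first member, so a Rect* converts to a Shape*. */
typedef struct {
    Shape base;
    double w, h;
} Rect;

static double rect_area(const Shape *self) {
    const Rect *r = (const Rect *)self; /* downcast via first-member rule */
    return r->w * r->h;
}

static const char *rect_name(const Shape *self) {
    (void)self;
    return "rect";
}

static const ShapeVtbl RECT_VTBL = { rect_area, rect_name };

/* "Constructor": wires the object to its method table. */
void rect_init(Rect *r, double w, double h) {
    assert(w >= 0.0 && h >= 0.0);
    r->base.vtbl = &RECT_VTBL;
    r->w = w;
    r->h = h;
}

/* A second derived "class" sharing the same interface. */
typedef struct {
    Shape base;
    double b, h;
} Triangle;

static double triangle_area(const Shape *self) {
    const Triangle *t = (const Triangle *)self;
    return 0.5 * t->b * t->h;
}

static const char *triangle_name(const Shape *self) {
    (void)self;
    return "triangle";
}

static const ShapeVtbl TRIANGLE_VTBL = { triangle_area, triangle_name };

void triangle_init(Triangle *t, double b, double h) {
    assert(b >= 0.0 && h >= 0.0);
    t->base.vtbl = &TRIANGLE_VTBL;
    t->b = b;
    t->h = h;
}

/* Dynamic dispatch: an indirect call through the object's function table. */
double shape_area(const Shape *s) { return s->vtbl->area(s); }
const char *shape_name(const Shape *s) { return s->vtbl->name(s); }
```

A caller initializes a Rect or Triangle and passes a pointer to its base member to shape_area(), which selects the right function at run time; a 3-by-4 Rect yields 12.0 and a Triangle with base 4 and height 5 yields 10.0.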
Why was it so difficult to identify code written in C, a ubiquitous programming language? Kaspersky Lab expert Vitaly Kamluk said that applications compiled from languages like C and C++ can’t be decompiled into the original source code. “It’s a one-way transformation,” Kamluk told reporters in a conference call on Monday. That leaves researchers to guess the original language and compiler by analyzing the compiled program in its disassembled form, known as assembler code. Typically, that isn’t difficult, Kamluk said: most software engineers use one of a handful of popular languages and compilers, and each leaves clues in the assembler code that can be read like a fingerprint. The Duqu code, however, didn’t have any of these telltale signatures.
“We’ve seen a lot of malware in all possible languages. Duqu didn’t look like C++ code. It was clearly something custom,” Kamluk said.
While OO C was one option, and in fact one of the short list of suggestions provided to the company, it was difficult to find conclusive evidence that the assembler code had been produced from it. That proof came by way of a submission to the news aggregation site Reddit. A post in the ReverseEngineering subreddit recognized snippets of the Duqu code that were very similar to code produced by SOO (Simple Object Orientation for C), an object-oriented framework for the C language. While SOO is one of a few such frameworks, the output of the others doesn’t match the Duqu code nearly as closely, according to a post by Igor Soumenkov on Kaspersky’s Securelist blog.
Kamluk said the use of OO C rather than the more common C++ language suggested that the developers who wrote Duqu were likely older developers who came of age writing programs in C. “It’s what they were familiar with,” he said.
Soumenkov suggested that other reasons may have played a role as well. Professional developers of a certain generation “don’t trust C++ compilers,” he wrote. “C was a direct evolutionary step over assembler and quickly became a standard. When C++ was published, many old school programmers preferred to stay away from it because of distrust in memory allocation and other obscure language features which cause indirect execution of code.” C is also highly portable to other platforms, a plus for a malicious Trojan.
While much about Duqu remains unknown, including who created it, its exact purpose and its country of origin, Kaspersky Lab researchers say that those behind it were running a “highly sophisticated” operation, typical of what is found in “complex ‘civil’ software projects, rather than contemporary malware.”
The analysis moves the world’s understanding of Duqu forward and partly undercuts earlier analyses that linked the malware to the Stuxnet worm. While the two families were superficially similar and both complex, subsequent analysis of the Duqu code makes it clear that Stuxnet and Duqu were more different than alike.
Kamluk said that he hopes the research helps the community move forward from understanding the Trojan to, possibly, identifying who is behind it and bringing those individuals to justice.