Metadata Analysis Draws its Own Conclusions on WannaCry Authors

Author: Michael Mimoso

June 15, 2017 10:34 am

Researchers at Telefonica’s cybersecurity unit ElevenPaths conducted an analysis of WannaCry metadata.

The most intriguing mystery that remains about WannaCry is the identity of the attacker. The theory with the best legs is that North Korea’s Lazarus APT is the entity behind the worldwide ransomware outbreak, given the discovery of code that the malware shares with older Lazarus attacks.

That, however, doesn’t definitively dispel other contentions that point the finger at cybercriminals or other nation-state operatives. For example, a linguistics analysis of the 28 ransom notes embedded in the malware moved away from the Lazarus theory and concluded that the author was a native Chinese or English speaker. It also stated that the Korean version of the ransom note was among the most poorly written, or translated. To take that conclusion as gospel, however, seems premature as well given that native speakers could easily write in broken versions of their language.

Researchers at ElevenPaths, the cybersecurity unit of Telefonica, Spain’s largest telecommunications provider and one of WannaCry’s victims, weren’t satisfied. They attacked the attribution question from a metadata perspective, using that metadata to dissect the author’s actions in the days leading up to the May 12 attack.

Sergio de los Santos, the labs leader at ElevenPaths, said the author overlooked how much the metadata buried in the malware files could give away about their identity. For example, the author and operator fields in the RTF files used for the ransom notes are set to Messi, a reference to footballer Lionel Messi. This reference was also found in the first version of WannaCry that surfaced in March. The author also types in Korean, which was set as the default language in the EMEA version of Word used to create the RTF files. This lends weight to the theory that the broken Korean may have been intentional, de los Santos said.

“The broken Korean, for me, means he was trying to hide himself. That’s a hint he was from Korea,” de los Santos said. “He forgot to change his [default language in] Word. If you’re trying to hide yourself from being English and write in broken English, anyone could do that. But I would forget to change my default language. He made the mistake of forgetting to change the default language. He tried to force a broken Korean.”
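To illustrate the kind of fields involved, here is a minimal Python sketch that pulls the author, operator and default-language values out of RTF text. It is not the researchers' tooling. The sample string is a hypothetical fragment written for this example, not content from the malware, and the function name and the small language table are assumptions. The control words `\author`, `\operator`, `\deflang` and `\deflangfe` are standard RTF, and 1042 is the Windows language ID for Korean.

```python
import re

# Hypothetical RTF fragment for illustration only; not taken from WannaCry.
SAMPLE_RTF = (
    r"{\rtf1\ansi\deflang1033\deflangfe1042"
    r"{\info{\author Messi}{\operator Messi}}\pard Example text}"
)

# Deliberately tiny lookup table of Windows language IDs (LCIDs).
LCID_NAMES = {1033: "English (United States)", 1042: "Korean"}


def extract_rtf_metadata(rtf_text):
    """Return author/operator strings and default-language IDs found in RTF text."""
    meta = {}
    # {\author NAME} and {\operator NAME} live inside the {\info ...} group.
    for field in ("author", "operator"):
        match = re.search(r"\{\\" + field + r"\s+([^{}]*)\}", rtf_text)
        if match:
            meta[field] = match.group(1).strip()
    # \deflangN is the default language; \deflangfeN is the East Asian default.
    for word in ("deflang", "deflangfe"):
        match = re.search(r"\\" + word + r"(\d+)", rtf_text)
        if match:
            meta[word] = int(match.group(1))
    return meta


meta = extract_rtf_metadata(SAMPLE_RTF)
print(meta["author"], meta["operator"])    # Messi Messi
print(LCID_NAMES.get(meta["deflangfe"]))   # Korean
```

A regular expression is enough for a quick look at well-formed files; a real forensic tool would need a proper RTF parser to cope with escaped characters and nested groups.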

De los Santos’ team was also able to deduce a timeline of the days leading up to the first detections of WannaCry on May 12 based on metadata, which they extracted with an in-house tool called Metashield Clean-up.

The scheme started April 27, it appears, when the author created a bitmap image for the desktop background and the r.wnry readme file. On May 9, a .zip file was created that included tools to connect to Tor for ransom payments. That .zip was dropped onto the attacker’s filesystem May 9 at 16:57, the attacker’s local time. One day later, another file containing .onion domains and a Bitcoin wallet was created. On May 11, edits were made to the bitmap, readme, and .onion domains. The files were added to a password-protected .zip file on the same day. On May 12, at 2:22 a.m. local to the attacker, the executables were added to the .zip file and the payload was ready.

“This is the ‘magic’ hour as it is established as the time of infection in every affected system,” ElevenPaths researchers wrote in a report on the research.

The researchers were able to glean from the metadata how long the attacker spent editing each individual language file, and that the default language was always Korean. The use of .zip files was another giveaway about the attacker’s location because they store the last access time and create dates obtained from the attacker’s disk, as well as his local timezone.
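The way .zip entries carry the creator's clock time can be seen with Python's standard `zipfile` module. The sketch below is an illustration under stated assumptions, not the researchers' method: it builds a small in-memory archive with a hypothetical entry name and a timestamp chosen to match the 2:22 a.m. time reported above, then reads it back. It only reads the basic last-modified field, which is recorded as local wall-clock time; access and creation times, where an archiver records them, sit in optional extra fields that this sketch does not parse.

```python
import io
import zipfile
from datetime import datetime


def build_sample_zip():
    """Create an in-memory .zip with one entry (hypothetical name and contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # date_time is stored as given: the creator's local clock, no UTC conversion.
        info = zipfile.ZipInfo("example.bin", date_time=(2017, 5, 12, 2, 22, 0))
        archive.writestr(info, b"placeholder")
    return buffer.getvalue()


def list_zip_timestamps(zip_bytes):
    """Return (filename, last-modified datetime) pairs for every entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [(entry.filename, datetime(*entry.date_time))
                for entry in archive.infolist()]


for name, stamp in list_zip_timestamps(build_sample_zip()):
    print(name, stamp.isoformat())   # example.bin 2017-05-12T02:22:00
```

Because the stored value is a local time, comparing it against events with known UTC times is what allows an analyst to reason about the creator's timezone.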

Going off the final .zip create date and time of May 12 at 2:22 a.m. and the first detections of WannaCry in Taiwan and throughout Asia, the researchers speculate that the attacker was in the UTC +9 timezone, and certainly between UTC +2 and +12.
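The arithmetic behind a timezone estimate like this can be sketched in a few lines of Python. This is only an illustration of the conversion, not a reconstruction of the researchers' reasoning: it takes the reported local time of 2:22 a.m. on May 12 and shows which UTC instant it would correspond to under each candidate offset from UTC +2 to UTC +12. The function name is an assumption, and no detection timestamps are included because the article does not give them.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local, utc_offset_hours):
    """Interpret a naive local time under a fixed UTC offset and convert it to UTC."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=zone).astimezone(timezone.utc)


# Local time recorded in the final .zip, as reported above.
zip_local_time = datetime(2017, 5, 12, 2, 22)

# Candidate offsets spanning the range the researchers consider possible.
for offset in range(2, 13):
    print(f"UTC+{offset}: {to_utc(zip_local_time, offset).isoformat()}")

# Under UTC+9 the payload would have been finalized at 17:22 UTC on May 11.
```

An analyst would then discard any offset whose implied UTC time falls after the first observed infections, since the payload has to exist before it can spread.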

“He should have used .txt files which stores no metadata at all,” de los Santos said, adding that .txt files cannot carry the colors and different point sizes that were prevalent throughout the ransom note. A rich text format file could, however. “You can write rich text format with lots of programs, but with Word, we have discovered that the default language is kept in the file and the name of the user is kept in the file. That was a mistake, and I think he didn’t notice.”