Cyber attacks against government agencies, infrastructure providers and other high-profile targets are regularly in the news, stirring talk of digital warfare and international sanctions. The forensic investigations that follow these hacks can reveal the method and magnitude of an attack. Pinpointing the culprit, however, is frustratingly difficult and often yields little more than vague accusations that the guilty parties might be working for a particular foreign government or cyber gang.

Case in point: the recent cyber attacks that shut off power to 80,000 Ukrainians and infiltrated computers at the country’s largest airport. Some Ukrainian officials were quick to point the finger at the Kremlin, both because of the two countries’ ongoing conflict and because the attacks apparently came from computers in Russia. Others, however, caution that Internet addresses can be spoofed and that, even though investigators have recovered some of the “BlackEnergy” malicious software (malware) responsible, they are unable to determine exactly who wrote it.

Circumstantial evidence

“Attribution is a curious beast,” says Morgan Marquis-Boire, a senior researcher at the University of Toronto’s Citizen Lab and former member of Google’s security team. “There are a variety of techniques that you can use to make educated assertions about the nature of an attack.” These include examining the sophistication of the tools involved, the techniques, the type of data stolen and where it was sent. “I call this strong circumstantial, and this is how a lot of the attribution is done in public malware reports.”

Stronger attribution is possible but requires the right type of data. In a few cases that information has come from the trove of documents leaked in 2013 by Edward Snowden, the self-exiled former U.S. National Security Agency (NSA) contractor. One set of documents revealed to European Union officials that the NSA had tapped E.U. computer networks. Other documents showed that a British surveillance agency, Government Communications Headquarters (GCHQ), was spying on Belgacom, Belgium’s largest communications provider, which is partly state-owned. In both cases cyber investigators later identified Regin, a complex piece of malware, as the spyware used to steal secret information from the targeted networks. Neither U.S. nor British intelligence has claimed authorship of Regin, however, so even with Snowden’s help most researchers will not go out on a limb to say with complete confidence that the NSA spied on E.U. officials or that GCHQ hacked Belgacom.

Most of the time highly confidential leaked documents are not available, and investigators must deal head-on with malware specifically written to avoid detection and cover the author’s tracks. A cyber attack or data theft investigation is in many ways analogous to the work depicted in Law & Order, CSI: Crime Scene Investigation or any other fictional police procedural. Cybersecurity forensic investigators frequently begin by analyzing infected computers (the body) and the malware (the murder weapon) that took them down. They can learn a lot from even a portion of a malware program by studying the code used, how it was written and how it communicated with the person or group behind it.

Malware written with a lot of customized code suggests a skilled, well-equipped programmer who knows the targeted computers and network very well. More generic or open-source code, on the other hand, might be less effective, but it also lacks distinguishing characteristics that could be traced back to a particular programmer or organization. Cyber attackers might also opt for simpler tools so as not to tip their hand about the full extent of their capabilities, Marquis-Boire says.

Digital fingerprinting

Marquis-Boire and other cybersecurity researchers are developing new ways to build malware profiles that serve as digital fingerprints, capturing a particular program’s formatting style, how it allocates memory, the ways it tries to avoid detection and other attributes. Investigators can also learn a lot from the way a programmer names certain features in a program or configures the malware to transmit purloined data, Marquis-Boire says. Forensic malware examinations alone will not expose who is behind a particular cyber attack, but they are a key piece of any investigation, he adds.
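To make the idea of a malware “fingerprint” a little more concrete, here is a toy Python sketch. It assumes each sample has already been boiled down to a set of observed traits (the trait strings and sample names are hypothetical, not drawn from any real investigation) and scores how much two profiles overlap using Jaccard similarity, one simple way to compare such sets. Real forensic tooling is far more elaborate.

```python
# Toy comparison of malware "profiles" represented as sets of observed traits.
# All trait strings and sample names below are hypothetical illustrations.

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared traits divided by all traits (0.0 = nothing shared, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_match(unknown: set[str], known: dict[str, set[str]]) -> tuple[str, float]:
    """Return the known sample whose profile overlaps most with the unknown one."""
    best = max(known, key=lambda name: jaccard(unknown, known[name]))
    return best, jaccard(unknown, known[best])


known_samples = {
    "sample_A": {"xor_string_encoding", "debugger_check", "http_exfil", "func_name:grab_docs"},
    "sample_B": {"upx_packed", "smtp_exfil", "func_name:keylog"},
}
unknown = {"xor_string_encoding", "debugger_check", "http_exfil", "func_name:grab_files"}

name, score = closest_match(unknown, known_samples)
print(name, round(score, 2))
```

A high score only says two samples share habits such as naming choices or exfiltration methods. On its own it is the kind of circumstantial evidence Marquis-Boire describes, not proof of who wrote the code.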

In one instance Marquis-Boire and University of Toronto colleague Bill Marczak analyzed e-mails received by Bahraini activists and found a piece of spyware intended to steal information from their computers. Further study of the spyware revealed similarities with the FinFisher surveillance software that Gamma International sells to law enforcement agencies. Gamma has denied selling the software to the Bahraini government for spying on its people and suggested the regime may have used a pirated copy to spy on the activists, along with prominent lawyers and opposition politicians. In typical fashion a government had been caught almost, but not quite, in the act.

Other researchers are applying machine learning to automate the matching of coders with their creations. Although malware is usually distributed as a compiled program rather than raw source code, and is often altered so antivirus software cannot recognize it, investigators can identify various obfuscation techniques and start to see patterns across different pieces of malware, says Aylin Caliskan-Islam, a Princeton University postdoctoral research associate. Caliskan-Islam and her colleagues last year produced a study in which they used algorithms to automate the analysis of coding styles from 1,600 programmers, correctly attributing authorship with 94 percent accuracy. They found that the more skilled a programmer is and the more complex the code they write, the easier it is to connect them with their programs.
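The basic idea of learning a coder’s style from labeled examples can be sketched in a few lines of Python. This toy version is not the method used in the Princeton study. It simply counts overlapping three-character sequences (trigrams) in each author’s known code as a crude stand-in for style, then attributes an unknown snippet to the author whose profile is most similar by cosine similarity. The author names and code snippets are hypothetical.

```python
from collections import Counter
import math


def trigram_profile(code: str) -> Counter:
    """Count overlapping 3-character sequences as a crude proxy for coding style."""
    return Counter(code[i:i + 3] for i in range(len(code) - 2))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two trigram count vectors (0.0 if either is empty)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown: str, training: dict[str, list[str]]) -> str:
    """Return the author whose combined known samples best match the unknown code."""
    profiles = {}
    for author, samples in training.items():
        combined = Counter()
        for sample in samples:
            combined.update(trigram_profile(sample))
        profiles[author] = combined
    target = trigram_profile(unknown)
    return max(profiles, key=lambda author: cosine(target, profiles[author]))


# Hypothetical "ground truth": snippets whose authors are already known.
training = {
    "author_1": ["for(int i=0;i<n;i++){sum+=a[i];}", "while(x>0){x--;}"],
    "author_2": ["for item in items:\n    total += item", "if not done:\n    retry()"],
}
unknown = "for(int j=0;j<m;j++){cnt+=b[j];}"
print(attribute(unknown, training))
```

Even this crude matcher depends entirely on having samples with known authors, which is why the shortage of real-world attribution data matters so much.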

One of the biggest challenges in applying this approach to malware outside the lab is the lack of “ground truth” attribution data that can be used to train the machine-learning algorithms. “The more samples we have in the past with known programmers, the more easily we can extract coding styles and train our machine-learning models to identify them,” Caliskan-Islam says. One of her goals is to partner with a cybersecurity company that can provide her with data from real-world investigations and use that data to further develop her algorithms.

Close to the vest

Governments also have access to vast amounts of data that could help with cyber forensic investigations. It is in their best interest, however, not to share what they know, out of concern that they will reveal too much about their cyber-snooping programs. According to Marquis-Boire, there is a good reason the U.S. government revealed little about why it explicitly blamed North Korea for the 2014 cyber attacks on Sony Pictures Entertainment. “I wouldn’t expect them to make their evidence public,” he says. “The NSA has unprecedented and unparalleled access to the workings of the Internet.” If intelligence agencies publicized what they know about certain cyber attacks, the perpetrators might use that information to alter their approach.

That would hand cyber attackers yet another advantage over the people trying to catch them.