AI Creates False Documents That Fake Out Hackers

The algorithm hides sensitive information in a sea of decoys

Fox being deceived with decoy chickens. — Thomas Fuchs


Hackers keep getting better at penetrating cyberdefenses to steal valuable documents. So some researchers propose using an artificial-intelligence algorithm to hopelessly confuse them, once they break in, by hiding the real deal amid a mountain of convincing fakes.

The algorithm, called Word Embedding–based Fake Online Repository Generation Engine (WE-FORGE), generates decoys of patents under development. But someday it could “create a lot of fake versions of every document that a company feels it needs to guard,” says its developer, Dartmouth College cybersecurity researcher V. S. Subrahmanian.

If hackers were after, say, the formula for a new drug, they would have to find the relevant needle in a haystack of fakes. This could mean checking each formula in detail—and perhaps investing in a few dead-end recipes. “The name of the game here is, ‘Make it harder,’” Subrahmanian explains. “‘Inflict pain on those stealing from you.’”


Subrahmanian says he tackled this project after reading that companies are unaware of new kinds of cyberattacks for an average of 312 days after they begin. “The bad guy has almost a year to decamp with all our documents, all our intellectual property,” he says. “Even if you’re a Pfizer, that’s enough time to steal almost everything. It’s not just the crown jewels—it’s the crown jewels, and the jewels of the maid, and the watch of the secretary!”

Counterfeit documents produced by WE-FORGE could also act as hidden “trip wires,” says Rachel Tobac, CEO of cybersecurity consultancy SocialProof Security. For example, an enticing file might alert security when accessed. Companies have typically used human-created fakes for this strategy. “But now if this AI is able to do that for us, then we can create a lot of new documents that are believable for an attacker—without having to do more work,” says Tobac, who was not involved in the project.

The system produces convincing decoys by searching through a document for keywords. For each one it finds, it calculates a list of related concepts and replaces the original term with one chosen at random. The process can produce dozens of documents that contain no proprietary information but still look plausible. Subrahmanian and his team asked computer science and chemistry graduate students to evaluate real and fake patents from their respective fields, and the humans found the WE-FORGE-generated documents highly believable. The results appeared in the Association for Computing Machinery’s Transactions on Management Information Systems.
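To make the keyword-swapping idea concrete, here is a minimal sketch in Python. It is not WE-FORGE's actual code: the tiny three-dimensional word vectors, the keyword set, and the function names `related_terms`, `make_decoy` and `make_decoys` are all hypothetical, chosen only for illustration. "Related concepts" are approximated here as the words whose vectors have the highest cosine similarity to the keyword, and each decoy swaps every keyword for one of those neighbors chosen at random.

```python
import math
import random

# Hypothetical toy word embeddings (assumption for illustration only).
# A real system would use vectors learned from a large technical corpus.
EMBEDDINGS = {
    "ethanol":  [0.90, 0.10, 0.00],
    "methanol": [0.88, 0.12, 0.02],
    "propanol": [0.85, 0.15, 0.05],
    "catalyst": [0.10, 0.90, 0.10],
    "enzyme":   [0.12, 0.85, 0.15],
    "reagent":  [0.20, 0.80, 0.05],
    "heated":   [0.05, 0.10, 0.95],
    "cooled":   [0.07, 0.12, 0.90],
    "warmed":   [0.06, 0.15, 0.92],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_terms(word, k=2):
    """Return the k vocabulary words closest to `word`, excluding the word itself."""
    target = EMBEDDINGS[word]
    candidates = [w for w in EMBEDDINGS if w != word]
    candidates.sort(key=lambda w: cosine(target, EMBEDDINGS[w]), reverse=True)
    return candidates[:k]


def make_decoy(text, keywords, rng):
    """Replace each keyword in `text` with a randomly chosen related term."""
    out = []
    for token in text.split():
        if token in keywords and token in EMBEDDINGS:
            out.append(rng.choice(related_terms(token)))
        else:
            out.append(token)  # non-keywords are kept so the decoy still reads naturally
    return " ".join(out)


def make_decoys(text, keywords, n, seed=0):
    """Generate n decoy versions of `text`; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_decoy(text, keywords, rng) for _ in range(n)]


if __name__ == "__main__":
    original = "the ethanol is heated with a catalyst"
    for decoy in make_decoys(original, {"ethanol", "heated", "catalyst"}, n=3, seed=42):
        print(decoy)
```

Each decoy keeps the sentence structure but substitutes plausible alternatives (for example "methanol" for "ethanol"), so none of them repeats the original keywords. Restricting replacements to near neighbors is what keeps the fakes believable; drawing among several neighbors at random is what lets one real document yield many distinct decoys.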

WE-FORGE might eventually expand its scope, but Subrahmanian notes that a document recommending a course of action, for instance, would be much more complex than a technical formula. Still, both he and Tobac think this research will attract commercial interest. “I could definitely see an organization leveraging this type of product,” Tobac says. “If this ... creates believable decoys without releasing sensitive details within those decoys, then I think you’ve got a huge win there.”
