In 2019 Harish Natarajan took part in a debate with a five-and-a-half-foot tall rectangular computer screen in front of a live audience of about 800 people. The computer was Project Debater, an artificial intelligence system designed by IBM. Natarajan is a globally recognized debate champion. And the topic at hand was whether or not preschool should be subsidized.
Based on an audience vote, Project Debater lost the contest. But the AI held its own, forming logical opening statements. And in 2018 Project Debater won one debate and nearly tied in another. Still, the system is fully capable of sounding awkward during an argument-and-rebuttal exchange with an opponent.
While computers will not be ambling to a political podium any time soon, a study published today in Nature suggests that this algorithm is inching closer to engaging in the type of complex human interaction represented by formal argumentation.
The researchers observe that the requirements of a debate are outside the “comfort zone” for AIs, which have triumphed in a range of board and video games—not to mention a famous quiz show. In recent decades, startling advances have been registered in AI. In 1997 IBM’s Deep Blue became the first computer to defeat a reigning chess champion, besting titan Garry Kasparov in a six-game match. Fourteen years later IBM’s Watson defeated Jeopardy! all-stars Brad Rutter and Ken Jennings at their own game.
But a lot of competitive computer intelligence has been tested on tasks or games with a clear winner and loser. And it has been amenable to coding that leads to a defined binary algorithmic path to victory. What has eluded computer scientists is a system that can interact with the nuance that enables complex discourse with human beings. Project Debater is getting close to this goal.
In the new Nature paper, IBM researchers—a collaborative team at the company’s AI research centers in Haifa, Israel, and Dublin, Ireland—report on their system’s progress. Following the 2019 debate, speeches by both Project Debater and three expert human debaters were evaluated on nearly 80 different topics by 15 members of a virtual audience.
In these human-against-machine contests, neither side is allowed access to the Internet. Instead each is given 15 minutes to “collect their thoughts,” as Christopher P. Sciacca, manager of communications for IBM Research’s global labs, puts it. This means the human debater can take a moment to jot down ideas about a topic at hand, such as subsidized preschool, while Project Debater combs through millions of previously stored newspaper articles and Wikipedia entries, analyzing specific sentences and commonalities and disagreements on particular topics. Following the prep time, both sides alternately deliver four-minute speeches, and then each gives a two-minute closing statement.
Based on audience and reader scoring, Project Debater managed to “win” in 2018 against one of the three experts, and it scored impressively high in making opening statements. But on average, it was still slightly inferior to the humans overall. The hurdle is maintaining a meaningful exchange that can take any number of directions, similar to a real human conversation. Still, the study results move the needle in developing an AI system that can understand and produce meaningful linguistic interaction.
“In recent years there’s been a tremendous amount of work in developing algorithms that can understand and generate human language,” says Noam Slonim, a distinguished engineer at IBM Research and principal investigator of Project Debater since its inception. “The tasks being pursued span from predicting the sentiment of a single sentence to more complex tasks such as machine translation and dialogue systems.” He adds that IBM’s results reflect a system that, while still coming in second place to a Homo sapiens “rival,” can engage with an opponent in a way that was, until now, out of reach for other AI systems. Plenty of such systems can generate what seems to be meaningful language with correct syntax. But a big question for the field is whether machines will ever be able to emulate actual human reasoning or become conscious.
“On stage, Project Debater is far from perfect, and its missteps reveal just how difficult—and how definingly human—argumentation and debate are,” says computer scientist Chris Reed of the University of Dundee in Scotland, who was not involved with the research but was present in the audience at the 2019 debate. “[Yet] the Project Debater research is a tour de force of innovative engineering.... The scale of the achievement of the IBM team is also clear from the live performance of the system: not only using knowledge extracted from very large data sets but also responding on the fly to human discourse.”
Natarajan and other debaters are not yet ready to concede defeat to “machine overlords.” But for better or worse—one hopes for the better—machine learning is starting to enter a realm beyond the defined rules of chess and Go.