Pakinam Amer: This is a Science Talk podcast from Scientific American. I'm Pakinam Amer.
It’s as old as ancient Greece, a practice driven as much by passion as by facts and evidence.
It’s also exclusively human … animals cannot debate, machines cannot debate.
Well, here’s the thing …
[Audio from the San Francisco debate: “But this, this truly is a first for us. The first time that an artificial intelligence, namely Project Debater, will be on our stage arguing with a human being, and may the best debater win …”]
That debate, between man and machine, happened in 2019, in San Francisco in front of a live audience.
My guest today is Noam Slonim, a Distinguished Engineer at IBM Research Haifa, who, along with colleague Ranit Aharonov and others, created the AI that was on the stage.
A machine that can debate a human, without a script, and occasionally win.
A machine with a supercomputer for a brain.
Project Debater is a cloud-based AI system, created by IBM.
It’s built on a natural language processing, or NLP, model, and trained using deep learning and other machine learning techniques.
It took about seven years to develop.
And since then, it has proven itself a formidable opponent to champion debaters worldwide.
Its model can scan over 400 million newspaper articles and Wikipedia pages in the time it takes a person to finish a cup of coffee.
It has a synthetic female voice, soft, human-like with a metallic edge.
Some news outlets have called it Miss Project Debater, referring to it with the “she” pronoun.
I’m personally conflicted about humanizing it. But let me say this: Until it’s on Twitter, engaging in low-level debates and foaming at the mouth over which animal, a cat or a dog, is the better pet, “it” will do.
[Project Debater: “But I suspect you’ve never debated a machine.”]
Pakinam Amer: Noam, who decides if your AI has won the debate or not? How do you measure it?
Noam Slonim: I think a reasonable measure was asking the audience to vote before the debate starts, and then vote again after the debate ends. And then the winner is declared as the side that was able to pull more votes to its side. But we also asked the audience another question: Which side better enriched your knowledge during the debate? And consistently, in almost all the debates that we had with humans, the system was receiving significantly higher scores than the humans. I think it was not very surprising, but still, it was reassuring and interesting to observe.
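The before-and-after voting measure Noam describes can be sketched in a few lines of Python. This is only an illustration, not IBM's scoring code: the function names, the "for"/"against"/"undecided" labels, and the vote counts are all invented for the example.

```python
def vote_swing(before, after):
    """Change in each option's share of the vote, in percentage points."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    return {
        option: 100 * after[option] / total_after
        - 100 * before[option] / total_before
        for option in before
    }


def debate_winner(before, after, sides=("for", "against")):
    """The debating side that pulled the most vote share toward itself."""
    swing = vote_swing(before, after)
    return max(sides, key=lambda side: swing[side])


# Hypothetical audience counts, made up for illustration only.
before = {"for": 60, "against": 30, "undecided": 10}
after = {"for": 55, "against": 40, "undecided": 5}

swing = vote_swing(before, after)
print(swing)
print(debate_winner(before, after))
```

Note that under this measure a side can win while still holding a minority of the room, because only the shift in votes counts.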
Pakinam Amer: Debate is very complex.
Debating involves arguments and counterarguments … cross-references and analogy … the ability to engage with confidence in a dialogue, to judge the quality of a piece of information, and whether or not it will further one’s cause, and finally, to leave an impression on an audience. Enough to sway them to your side.
Debate is also more than the sum of its parts.
And that’s what I asked Noam about next.
How do you train a computational system to engage meaningfully with a human?
Noam Slonim: First of all, at the high level, the system has two major sources of information. One of them is a massive collection of around 400 million newspaper articles. And when the debate starts, the system is trying to find, using different AI components, short pieces of text that satisfy three criteria: they should be relevant to the topic; they should be argumentative in nature, that is, they should argue something about the topic; and they should support our side of the debate.
And once it finds these short pieces of text, the system is trying to use other capabilities in order to glue them together into a compelling narrative. So this is one major source of information for the system. The other major source of information for the system is a collection of more principled arguments that tries to capture the commonalities between the many different debates that humans are having.
So these are arguments written by experts. We had thousands of such more principled argumentative elements. And when the debate starts, the system is looking for the most relevant principled arguments in this collection, in order to use them at the right time. So just to give an example of what we mean by a principled argument:
If you are debating whether or not to ban the sale of alcohol, or whether or not to ban organ trade, in both cases the opposition may argue that if you ban something, you run the risk of the emergence of a black market, which by itself has a lot of negative implications. So the black market argument is a principled one. It can be used similarly in many different contexts. And this is what the system is trying to do with this source of information.
But by the way, people may naively assume that this is just a keyword matching thing, that if you ban something, we should anticipate the opposition to use the black market argument. But obviously, this is not always the case. So sometimes we give the example of a debate about whether or not we should ban the use of internet cookies. So probably, we're not going to see a black market of people selling internet cookies on street corners or something like that. So it is more subtle than that. So in this aspect as well, the system needs to develop a more nuanced understanding of human language in order to perform well. So this is the second major source of information.
And finally, of course, there is rebuttal, which is the most challenging part. We need somehow to respond to the arguments raised by the opposition. And this starts by understanding the words articulated by the human opponent. And for that we simply use Watson’s speech recognition capabilities out of the box. But of course, we need to go beyond the words; we need somehow to understand the gist of the arguments of the opposition.
And for that we used an arsenal of techniques. Most of them rely on the same principle of trying to anticipate in advance what the opposition might argue, and then listening to determine whether the opposition is indeed making these arguments, and then responding accordingly.
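The first source Noam describes, short texts that must be relevant, argumentative and supportive of the system's side, can be pictured as a filter over scored candidates. The sketch below is a toy under stated assumptions: the `Candidate` fields, the 0.5 threshold and the product ranking are hypothetical, and the scores are assumed to come from upstream models that are not shown. It does not reflect how Project Debater actually implements this step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    relevance: float      # 0..1, how on-topic the text is (assumed model score)
    argumentative: float  # 0..1, how much it argues something (assumed model score)
    stance: float         # -1..1, negative contests our side, positive supports it


def select_arguments(candidates: List[Candidate],
                     threshold: float = 0.5,
                     top_k: int = 3) -> List[Candidate]:
    """Keep candidates that pass all three criteria, strongest first."""
    kept = [
        c for c in candidates
        if c.relevance >= threshold
        and c.argumentative >= threshold
        and c.stance > 0
    ]
    # Rank by a simple product of the three scores (an arbitrary choice).
    kept.sort(key=lambda c: c.relevance * c.argumentative * c.stance,
              reverse=True)
    return kept[:top_k]


# Placeholder candidates with invented scores.
candidates = [
    Candidate("claim A, supports our side", 0.9, 0.8, 1.0),
    Candidate("on-topic fact, not an argument", 0.9, 0.1, 0.0),
    Candidate("claim B, contests our side", 0.8, 0.9, -1.0),
    Candidate("claim C, supports our side", 0.7, 0.9, 0.8),
]

selected = select_arguments(candidates)
for c in selected:
    print(c.text)
```

The hard part, as Noam's cookie example suggests, lies in producing reliable scores in the first place, not in the filtering.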
Pakinam Amer: You may have noticed that Noam refers to Watson—that’s not a reference to IBM’s founder Thomas J. Watson, but a supercomputer … a predecessor of Project Debater, which, OK, was named after Thomas J. Watson. (You win!)
Watson — the computer not the industrialist — made its debut on “Jeopardy! The IBM challenge” in 2011.
And it killed it, scorching the human competition, and wrapping up with a score of over $77,000.
The AI was also featured in a collaborative series between Bloomberg and Intelligence Squared U.S., paid for by IBM. This series was the first time that the AI really flexed its power to analyze the same issues that are typically taken on by human debaters using crowd-sourced information.
At one point, IBM Watson sifted through over 5,000 submissions from the public, all written in natural human language. Then analyzed the arguments.
It filtered out irrelevant data, and clustered the rest into “for” and “against” categories. It was then able to weave a narrative that played to the strengths of each side of the argument.
Arguably, not a bad tool for policymakers.
Perhaps Project Debater can weigh in on the merit of its big brother one day — without bias, of course.
Speaking of bias … The past few years have proven that algorithms are as biased as the people who created them. And what is Project Debater if not a bundle of algorithms?
It’s also a system without sentiments, an inherent moral compass or a sense of right or wrong. It’s as neutral and fair as the information it siphons into a live debate.
So how do we guarantee it’s not pulling from the wrong sources — sources that are biased or malicious?
Noam Slonim: So, if the data is biased, the system might be sensitive to that … If you're considering a particular controversy, … and you end up with, you know, one thousand arguments that support the motion and only five arguments that contest the motion, you immediately understand that there is bias in the data that you're considering in favor of one side versus the other. Now, whether this bias is justified or not, this is a different question. But this is at least a way to quantify and understand that the bias exists.
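Noam's example of one thousand supporting arguments against five contesting ones suggests a simple way to put a number on the skew. The sketch below is one possible measure, not something the source specifies: the `stance_imbalance` name, the "pro"/"con" labels and the formula are assumptions made for illustration.

```python
from collections import Counter


def stance_imbalance(stances):
    """Return a score in [-1, 1]: +1 if every argument supports the motion,
    -1 if every argument contests it, 0 if the two sides are balanced."""
    counts = Counter(stances)
    pro, con = counts["pro"], counts["con"]
    total = pro + con
    if total == 0:
        return 0.0
    return (pro - con) / total


# The counts from Noam's example: 1,000 supporting, 5 contesting.
stances = ["pro"] * 1000 + ["con"] * 5
score = stance_imbalance(stances)
print(round(score, 3))
```

A score near +1 or -1 flags a lopsided collection. As Noam says, whether that skew is justified is a separate question.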
Pakinam Amer: In other words, while Project Debater cannot remove bias from the sources, it can somehow recognize it. And so far, it’s not pulling from the entire Internet, but from a verified library of resources that includes scholarly journals, and credible news.
While Project Debater is indeed clever — and a feat of computer engineering — it’s still missing the very obvious human elements that can make or break some debates. The tone and cadence of a person’s voice, charisma, passionate belief in what a person is arguing for or against. Debater is, meanwhile, faceless … a screen.
Noam Slonim: You are right, you know. Even if we go back to the ancient Greeks, and Aristotle, he was thinking about rhetoric, and he defined three fundamental pillars for rhetoric: logos, ethos and pathos. And the system that we developed is more focused on logos. That said, I think it has elements also of the other pillars.
So first of all, I think it's an interesting question, you know, to what extent an artificial intelligence system has an ethos. I think it does. Because when the audience listens to our machine quoting numbers and specific facts that are related to the debate, the audience understands that this is a machine. And, I believe, the data being presented by the machine is seen as reliable.
In addition, regarding pathos, this is, again, another highly important element during the debate. And obviously, humans are much better with that. But still, we invested in these aspects as well. And we tried to make the voice of the system more expressive, not too much, but still more expressive for the purposes of the debate.
So we were trying to some extent to consider all three aspects. But indeed, we're focused more on logos, and this may explain some of the results of the debate. But there are teams not in academia, that are actually considering the other aspects. So this is a very active field of research.
Pakinam Amer: The team that brought Project Debater to life didn’t just involve computer scientists and technologists. According to Noam, it was at the intersection of many disciplines. At one point, they even had an author, a philosophy student and a world champion debater on the team.
I couldn’t help wondering if some day Project Debater will find its way into our social media, as a mediator or as misinformation police.
Noam Slonim: Over the last year, we started to consider more, I would say, interactive forms of dialogue systems that benefit from the notions and the assets and the technologies that we developed as part of Project Debater. We demonstrated the system in a debate, which is interactive, but has a very clear structure. A free dialogue is a different scenario, one that we have started to explore with collaborators in academia: How can we take these notions into a more free dialogue style system?
It’s very interesting to observe the difference between this [free dialogue] and a debate, because in a debate, you really try to defeat the opponent ... And here, I believe this is a different situation, right? It has a lot of implications, because if you just keep shooting evidence at the other side, and this is intended to prove to the other side that they’re wrong, this will not be that beneficial, chances are that you will just cause the other side to be more protective.
So we are looking at something that will require new capabilities. It is also about listening, about somehow reflecting the other side’s concerns back to them. So hopefully, in the coming months we will have some interesting results to share.
Pakinam Amer: I dare say that even Project Debater can’t argue against that. Noam says that this is their next challenge: making the AI system work in a free, interactive dialogue. He promises that he and his team will have more to share in the coming months.
For now, you’ve heard from Noam Slonim from IBM Research Labs in Haifa.
That was Science Talk, and this is your host Pakinam Amer. Thank you for listening.