During a 2017 casino tournament, a poker-playing program called Libratus deftly defeated four professional players in 120,000 hands of two-player poker. But the program’s co-creator, Tuomas Sandholm, did not believe artificial intelligence could achieve a similar performance against a greater number of players.
Two years later, he has proved himself wrong. Sandholm has co-created an AI program called Pluribus, which can consistently defeat human experts in six-player matches of no-limit Texas Hold’em poker. “I never would have imagined we would reach this in my lifetime,” says Sandholm, a professor of computer science at Carnegie Mellon University.
Past AI victories over humans have involved two-player or two-team games such as checkers, chess, Go and two-player no-limit poker. All of these games are zero-sum—they have just one winning side and one losing side. But six-player poker comes much closer to resembling real-life situations in which one party must make decisions without knowing about multiple opponents’ decision-making processes and resources. “This is the first major benchmark that is not two-player or two-team zero-sum games,” says Noam Brown, a research scientist at Facebook AI Research and co-creator of Pluribus. “For the first time, we’re going beyond that paradigm and showing AI can do well even in a general setting.”
The Pluribus program first proved its worth by playing profitably in six-player games that pitted just one human against five independent versions of Pluribus. It went on to win money in matches with five human players (taken from a rotating cast of 15 poker professionals who have each won at least $1 million in tournaments) versus one AI over 10,000 hands of poker and 12 days of games. These successes are detailed in a paper published this week in Science. Although Pluribus did not reach a win rate quite as high as Libratus or another two-player poker program called DeepStack, it still notched a very respectable win rate. “When the bot was sitting down with humans, it was making a lot of money,” Brown says. “I would certainly characterize that as a superhuman performance.”
“Though there was already evidence that the techniques that conquered two-player poker worked pretty well in three-player environments, it was not clear they would suffice to reach the highest professional level of play,” says Michael Wellman, a professor of computer science and engineering at the University of Michigan, who was not involved in the study. “It is really news that this worked so effectively for six-player poker. This is a pretty big deal—certainly a notable milestone.”
To reach this level, Pluribus—like its predecessor Libratus—first played against itself over many simulated hands of poker, developing a strategy blueprint. The big breakthrough that let it tackle six-player poker came from its “depth-limited search feature.” That component allows the AI to look ahead several moves and figure out a better strategy for the rest of the game, based on possible opponent decisions. Many other poker-playing programs have used similar search features, but doing so with six players would require an impractical amount of computing memory: there are too many scenarios to simulate, based on what cards each player holds, what each believes the other players to have and all the betting decisions that follow. Libratus got around this bottleneck by only using searches in the final two (out of four) betting rounds—but that solution still required the use of 100 central processing units (CPUs) in a game with only two players.
So Pluribus instead deployed its depth-limited search. With this technique, the AI first considers how it and its opponents might play for the next few moves. Beyond that point, it simplifies its model by restricting each simulated player’s choices to only four strategies: the precomputed blueprint, one biased toward folding, another biased toward calling and a fourth biased toward raising. This modified search helps explain why Pluribus’s success in six-player poker required relatively minimal computing resources and memory in comparison with past superhuman achievements in gaming AIs. Specifically, during live poker play, Pluribus ran on a machine with just two CPUs and 128 gigabytes of memory. “It’s amazing this can be done at all, and second, that it can be done with no [graphics processing units] and no extreme hardware,” Sandholm says. By comparison, DeepMind’s famous AlphaGo program used 1,920 CPUs and 280 GPUs during its 2016 matches against top professional Go player Lee Sedol.
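To make the four-strategy idea concrete, here is a minimal Python sketch of how a search might build those options at the point where its look-ahead stops. It is an illustration only, not the Pluribus code, which the team has not released. The function names, the toy blueprint probabilities and the bias factor of 5 are all assumptions chosen for the example.

```python
from typing import Dict

Strategy = Dict[str, float]  # maps an action name to its probability


def bias_strategy(blueprint: Strategy, action: str, factor: float = 5.0) -> Strategy:
    """Up-weight one action in a blueprint strategy, then renormalize.

    `factor` is a hypothetical bias strength picked for illustration.
    """
    weighted = {a: p * (factor if a == action else 1.0) for a, p in blueprint.items()}
    total = sum(weighted.values())
    return {a: w / total for a, w in weighted.items()}


def continuation_strategies(blueprint: Strategy) -> Dict[str, Strategy]:
    """Return the four options a simulated player may pick beyond the search depth."""
    return {
        "blueprint": dict(blueprint),
        "fold_biased": bias_strategy(blueprint, "fold"),
        "call_biased": bias_strategy(blueprint, "call"),
        "raise_biased": bias_strategy(blueprint, "raise"),
    }


# Toy blueprint for a single decision point (made-up numbers).
toy_blueprint = {"fold": 0.2, "call": 0.5, "raise": 0.3}
options = continuation_strategies(toy_blueprint)

for name, strategy in options.items():
    print(name, {a: round(p, 3) for a, p in strategy.items()})
```

Because each simulated player chooses among only four fixed options once the look-ahead ends, the search does not have to enumerate every possible betting sequence to the end of the hand, which is where the savings in memory and computation come from.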
Carnegie Mellon University and Facebook plan to make the Pluribus pseudo code—a detailed explanation of each necessary step in the program—available alongside the published paper, so that other AI researchers can generally reproduce their efforts. But the team decided not to release the actual code; this would likely facilitate the spread of superhuman poker-playing programs, which could be extremely disruptive to the online poker community and industry. Even without the code, though, humans can start learning from the AI’s strategies. For example, professional poker players usually consider it a mistake to make a “donk bet”—starting a round by betting aggressively after having ended the previous round by nonaggressively matching an existing bet. But Pluribus ended up using this technique much more frequently.
Beyond poker, this AI could potentially find applications in any situation in which a person must make decisions without complete knowledge of what other parties might be thinking or doing. Such areas could include cybersecurity, financial trading, business negotiations and competitive price setting. Sandholm says the AI could even help in the party primaries for the 2020 U.S. presidential election: candidates competing in a packed field could theoretically benefit from AI suggestions on spending just enough advertising money to win in key states, making the most of a limited war chest. Sandholm has founded three start-ups, including the companies Strategic Machine and Strategy Robot, that might incorporate this multiplayer AI into the services they offer to business and military clients.
For its part, Facebook does not have immediate plans for exploiting the poker-specific Pluribus. But Brown plans to further explore how AI performs in more complex multiplayer scenarios that go beyond card games. “We’re going to close the books on poker now, because this was the final milestone,” Brown says. “Now we’re looking to extend this beyond poker.”