If artificial intelligence is going to revolutionize the way science is done, as many of the frontier AI laboratories hope, it needs to master board games first. That’s the lesson from a recent study of AI models’ decision-making skills, tested with the game Battleship. The goal was to find ways for models to be more careful with limited resources: “cheap interventions” for information seeking, as research scientist Valerio Pepe puts it.
Science requires lots of decisions—researchers must choose which hypotheses to pursue and which simulations to run. These choices determine which path to follow when experimental resources are limited. “You can get only so much data because getting data is either expensive or time-consuming,” says Pepe, who led work on the project before joining OpenAI. In April, Pepe and his colleagues presented their findings at the International Conference on Learning Representations, an annual meeting dedicated to deep learning research.
The researchers designed a collaborative version of Battleship that could be played by humans or AI. In the game, one team member generated questions about the map of ships’ locations while another answered them, in a combined effort to pinpoint where the vessels were hidden and sink them. By counting how many rounds it took to sink all the ships, the researchers could test how large language models (LLMs) performed compared with other LLMs and with the 42 human players the group had enlisted. Initially, humans consistently won in fewer moves than Llama-4-Scout, Meta’s efficiency-focused AI model. OpenAI’s premier reasoning model, GPT-5, performed better than both.
The scientists were inspired by Bayesian experimental design, in which researchers guide decision-making by estimating the likelihoods of events given prior assumptions. They optimized their models to ask questions that maximized both the accuracy of their guesses and the expected information gained with each question, and to look ahead one turn when deciding which move to make. The scientists also found that accuracy increased when the players communicated with snippets of code rather than natural language. Through this process, the group led Llama-4-Scout to win in fewer moves than GPT-5 two thirds of the time, at about one hundredth of the cost. On average, it also won in seven fewer moves than the human players.
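The study's actual systems couple LLM question generation with these Bayesian estimates, but the core idea of picking the question with the highest expected information gain can be illustrated in a toy setting. The sketch below (all names and the tiny one-dimensional board are hypothetical, not from the paper) maintains a uniform prior over possible ship placements and scores each yes/no question by the binary entropy of its answer distribution:

```python
import math

# Toy 1-D Battleship: one ship of length 2 hidden on a 1x5 board.
# Hypotheses are the possible ship placements (start cells 0..3).
hypotheses = [{s, s + 1} for s in range(4)]

def answer(cell, ship):
    """Yes/no answer: does the ship occupy this cell?"""
    return cell in ship

def expected_info_gain(cell, hyps):
    """Expected information gained by asking about one cell.

    Under a uniform prior over hypotheses, this is the entropy of
    the yes/no answer distribution: questions whose answers split
    the hypothesis set evenly are worth a full bit."""
    p_yes = sum(answer(cell, h) for h in hyps) / len(hyps)
    gain = 0.0
    for p in (p_yes, 1 - p_yes):
        if p > 0:
            gain -= p * math.log2(p)
    return gain

# Greedy one-step lookahead: ask about the most informative cell first.
best_cell = max(range(5), key=lambda c: expected_info_gain(c, hypotheses))
```

In this toy board, asking about an interior cell splits the four placements in half (a full bit of information), while an edge cell yields a lopsided split worth less; a greedy player therefore probes the interior first. The same scoring extends to richer natural-language or code-based questions, with answers partitioning the hypothesis set into more than two groups.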
Battleship is much simpler than many problems in science—chemical and biological samples, for instance, can’t be interpreted as clearly as Battleship boards. But Pepe says the methods AI used in the game will probably also be applicable to scientific decision-making.
“The framework will be very useful to measure whether language models are really making progress” in deciding which hypotheses to pursue among all possibilities, says Yuanqi Du, a researcher focused on AI for chemistry who recently completed his Ph.D. at Cornell University and was not involved in the study. “Understanding the whole hypothesis space you’re searching, that’s the hardest part.”