Nearly all known life builds proteins from the same alphabet of 20 canonical amino acids. Strung together in different orders, those building blocks form the proteins that make cells work. In a new Science study, researchers at Columbia University, the Massachusetts Institute of Technology and Harvard University used artificial-intelligence-guided protein design to test how much of that alphabet can be pared back: they engineered an Escherichia coli strain that survived after being redesigned to lack a specific amino acid in many of its ribosomal proteins.
The team did not create a true 19-amino-acid organism. The engineered strain still uses the targeted amino acid, isoleucine, throughout most of its genome. But the result suggests that one of life’s most ancient and essential machines can tolerate at least partial simplification—and that AI may help biologists test the limits of life’s chemistry.
“The underlying question that we seek to ask is what early life looks like,” says Harris H. Wang, a professor of systems biology at the Columbia University Irving Medical Center and senior author of the study. Researchers think all life today descends from an ancient, single-celled organism that lived more than four billion years ago. But some suspect that earlier, simpler life-forms that predate even this common ancestor may have run on a leaner chemistry. Wang’s team wanted to find out whether modern cells could be engineered in that direction.
“Think about language. There are 26 letters in the English alphabet, but do you really need 26, or can you simplify that to 25 or 24?” Wang says. The team chose to remove isoleucine because it resembles the amino acids valine and leucine closely enough that, in principle, some proteins might tolerate isoleucine’s removal when it was replaced with one of them. They worked with E. coli, one of biology’s best-studied organisms, and targeted its ribosomes, the molecular machinery that builds proteins and is itself a sprawling complex of more than 50 proteins. “Like in a video game, we just pushed the ‘skip to the final boss’ button,” Wang says.
The first attempt was brute force. The researchers took 39 essential or highly expressed E. coli genes and replaced every isoleucine with valine or leucine, like a genetic find-and-replace. The engineered bacteria survived, but poorly: their fitness dropped to about 40 percent of the wild-type level, far short of the team’s 90 percent target. To close the gap, the researchers turned to AI.
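The find-and-replace step can be pictured in a few lines of Python. This is a minimal sketch, not the team's pipeline: the function name `replace_isoleucine`, the example peptide and the use of a single substitute at every position are assumptions made for illustration, since the article does not say how valine or leucine was chosen at each site.

```python
def replace_isoleucine(protein: str, substitute: str = "V") -> str:
    """Return the protein with every isoleucine (I) swapped for one substitute.

    `protein` is a string of one-letter amino acid codes. The substitute must
    be valine (V) or leucine (L); applying the same substitute everywhere is
    an assumption of this sketch.
    """
    if substitute not in ("V", "L"):
        raise ValueError("substitute must be 'V' (valine) or 'L' (leucine)")
    return protein.upper().replace("I", substitute)


# Hypothetical peptide, not a real E. coli sequence.
example = "MKIAIGLT"
print(replace_isoleucine(example, "V"))  # MKVAVGLT
print(replace_isoleucine(example, "L"))  # MKLALGLT
```

A blanket swap like this ignores where each residue sits in the folded protein, which may help explain why the brute-force bacteria grew so poorly.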
They combined two kinds of models. First, sequence-based protein language models such as ESM2 and MSA Transformer read protein sequences and suggested evolutionarily plausible mutations that a simple swap would miss. Then structure-based AI models such as AlphaFold2 and ProteinMPNN checked that the redesigned proteins would fold into the correct shapes and fit alongside neighboring molecules.
The proposals were stranger than the team expected. “Some of these AI designs were really surprising,” Wang says. “They didn’t look like anything we would have anticipated.” In one case, while redesigning a ribosomal protein called RpsJ, the AI remodeled an alpha helix—a structural element bridging different parts of the ribosome—and introduced eight new nearby mutations to compensate for the substitution of just two isoleucines. “Maybe these machine-learning systems know some aspects of biology we can experimentally verify but we don’t yet understand,” Wang says.
“A noteworthy part of the project is the evolving contribution of AI to this work,” says Tom Ellis, a professor of synthetic genome engineering at Imperial College London, who was not involved in the study. “In the last seven years, the AI-enabled modeling of proteins and mutations in proteins has come on leaps and bounds.”
The team first tested each AI-suggested change one at a time, confirming individual edits could meet the 90 percent fitness goal. Combined, the changes killed the cells. So the researchers debugged the genome by hand. Starting fresh from the natural E. coli sequence, they added the AI-designed pieces in small batches until the cells stopped growing, narrowing down the lethal interaction to a single region so they could fix it.
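That batch-by-batch debugging resembles an incremental search. The sketch below illustrates the logic only: `find_lethal_batch`, the `is_viable` callback standing in for a lab growth assay, and the design names are all hypothetical and do not come from the study.

```python
from typing import Callable, List, Optional, Sequence


def find_lethal_batch(
    designs: Sequence[str],
    is_viable: Callable[[List[str]], bool],
    batch_size: int = 3,
) -> Optional[List[str]]:
    """Add designs to a wild-type background in small batches.

    Returns the first batch whose addition makes the strain non-viable,
    or None if every design can be combined.
    """
    installed: List[str] = []  # start from the natural sequence: no changes yet
    for start in range(0, len(designs), batch_size):
        batch = list(designs[start:start + batch_size])
        if not is_viable(installed + batch):
            return batch  # the lethal interaction involves this batch
        installed.extend(batch)
    return None


# Toy stand-in for a growth assay: cells die only when the invented
# designs "rpX" and "rpY" are present together.
def toy_assay(genotype: List[str]) -> bool:
    return not {"rpX", "rpY"} <= set(genotype)


designs = ["rpA", "rpX", "rpB", "rpC", "rpY", "rpD"]
print(find_lethal_batch(designs, toy_assay))  # ['rpC', 'rpY', 'rpD']
```

In the toy run, the first batch grows and the second fails, so the search narrows the problem to three candidates that could then be tested individually.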
The final strain, Ec19, carries isoleucine-free versions of 21 of its 52 ribosomal proteins. The team also validated AI-redesigned versions of the others individually but could not yet combine them. The strain is robust: fitness stays above 90 percent of the wild-type level, and natural selection did not revert the changes over 450 generations.
“The paper is a tour de force of synthetic biology to address a really interesting question that’s fundamental to the origin of life on Earth,” Ellis says. He adds that this work could eventually inform biotechnology beyond Earth, in environments where not every amino acid is available.
For now, Ec19 remains a 20-amino-acid organism. Wang and his colleagues purged 382 isoleucine residues from ribosomal proteins, but the rest of its genome still contains more than 81,000 isoleucine residues across thousands of other proteins. A truly 19-amino-acid organism will require cheaper, faster DNA synthesis and more capable AI models, including genomic language models trained on whole genomes rather than just proteins.
Still, showing that ribosomal proteins can survive even partial simplification gives researchers a template for the rest of E. coli. “Considering the ribosome is probably the oldest remnant of the original common ancestor organism that first evolved protein synthesis, it’s also a poetic thing to demonstrate this ambitious work on,” Ellis says.

