Holly Herndon hears the future of music in data. She came to electronic music after singing in church choirs in East Tennessee, then earned a master’s degree at Mills College and a doctorate at Stanford University’s Center for Computer Research in Music and Acoustics.
When she began experimenting with machine learning in 2015, the outputs sounded “scratchy,” but she recalls seeing “the diamond in the rough.” Today those experiments have evolved into custom models that allow anyone to perform as her.
Scientific American spoke to Herndon about training her AI models and her belief that creativity has always been collective—AI just makes it visible.
[An edited transcript of the interview follows.]
You describe your work as “protocol art.” What does that mean?
In the 20th century, the site of media generation—the paper and pen where music was written—was the artistic act. With protocol art, the creative act happens upstream of media generation. It’s creating the rule set and conditions in which art is made.
We’re really interested in training our own models. I always say “we” because I work with my partner, Mat Dryhurst. We treat each step in the model-making process as a moment for creative intervention. The making of the dataset is part of the artwork. I often write music for training—music not necessarily for human ears but for a computer to learn something.
Can you give me an example of what that looks like in practice?
We have an exhibition in Berlin right now. We were inspired by Hildegard von Bingen, a medieval composer. We wanted to pretend that polyphony had existed when she was alive. We started with a model of her compositions and added rule sets so it could generate polyphony in her style. We took those outputs, rearranged them and gave them to human singers to interpret. Then we created a huge installation where performers sing and invite the public to train with us.
It’s not about putting in “write me a pop song with a guitar.” It’s about using this technology to bring humans together to make art in real space.
Most commercial AI models are trained on data scraped from the Internet. Why do you insist on building your own models?
As an electronic musician, I was never one to sample—I always made my own sound palettes. When we started, pre-Suno and pre-all-this-stuff, we had to make our own dataset. It just felt natural, like making my own samples or digital instruments.
One criticism of products [like Suno] is that they’re very “mid” sounding—trained on everything or the most average. My models sound unique because I’m making the training data myself. I also think there’s prompting under the hood in Suno limiting it to three-minute songs with verse-chorus structure. There are guardrails making it boring. I’d love for them to release some constraints.
Has a model ever surprised you?
We did a project called Holly+ around 2021—a clone of my voice. We worked with Voctro Labs to train a voice model that works in real time so people can sing using my voice. That was game-changing.
If this works in real time, other people can perform each other’s identity in real time. When we were testing it, my partner, who’s British, was singing into it. I heard my voice with a British accent. It was so uncanny, I had to leave the room—he was singing as me. That was one of the biggest mental unlocks of how weird and cool this stuff can get.
I think it’ll take five to 10 years to be seamless. But once we’re body morphing in real time—imagine you could create a model of a whale voice, then do a hybrid soprano whale. When you sing high, it goes operatic; when you sing low, you’re more whale or Barry White. We’re no longer tied to my larynx.
Where do you think we’ll be in 10 years?
A lot of fears around this technology are actually fears of how the current Internet works—the attention economy, how difficult it is as a creator. My partner always says, “Scrolling is for bots, and strolling is for humans.”
Our more optimistic vision is using agents to deal with all the crap and filter through stuff, actually bringing us together in the real world. That’s why our projects involve people meeting IRL and doing things together. Some of my smartest developer friends are vibe coding with multiple agents while cooking or hiking with their toddler. Things could be really beautiful if we imagine and build it that way.
Does this technology change your definition of creativity?
This whole AI thing might force us to see ourselves as maybe not the only creative actors in the universe. That needn’t be scary—it could be beautiful and liberating.
Creativity happens in swarm, in community. AI is just collective intelligence—aggregated human intelligence. The 20th-century art model is tied to an individual genius who touches an object and imbues it with value. That’s being turned on its head. I’m all team collective intelligence.

