Holly Herndon hears the future of music in data. She came to electronic music after singing in church choirs in East Tennessee, then earned a master’s degree at Mills College and a doctorate at Stanford University’s Center for Computer Research in Music and Acoustics.
When she began experimenting with machine learning in 2015, the outputs sounded “scratchy,” but she recalls seeing “the diamond in the rough.” Today those experiments have evolved into custom models that allow anyone to perform as her.
Scientific American spoke to Herndon about training her AI models and her belief that creativity has always been collective—AI just makes it visible.
[An edited transcript of the interview follows.]
You describe your work as “protocol art.” What does that mean?
In the 20th century, the site of media generation—the paper and pen where music was written—was the artistic act. With protocol art, the creative act happens upstream of media generation. It’s creating the rule set and conditions in which art is made.
We’re really interested in training our own models. I always say “we” because I work with my partner, Mat Dryhurst. We treat each step in the model-making process as a moment for creative intervention. The making of the dataset is part of the artwork. I often write music for training—music not necessarily for human ears but for a computer to learn something.
Can you give me an example of what that looks like in practice?
We have an exhibition in Berlin right now. We were inspired by Hildegard von Bingen, a medieval composer. We wanted to pretend that polyphony had existed when she was alive. We started with a model of her compositions and added rule sets so it could generate polyphony in her style. We took those outputs, rearranged them and gave them to human singers to interpret. Then we created a huge installation where performers sing and invite the public to train with us.
It’s not about putting in “write me a pop song with a guitar.” It’s about using this technology to bring humans together to make art in real space.
Most commercial AI models are trained on data scraped from the Internet. Why do you insist on building your own models?
As an electronic musician, I was never one to sample—I always made my own sound palettes. When we started, pre-Suno and pre-all-this-stuff, we had to make our own dataset. It just felt natural, like making my own samples or digital instruments.
One criticism of products [like Suno] is that they’re very “mid” sounding—trained on everything or the most average. My models sound unique because I’m making the training data myself. I also think there’s prompting under the hood in Suno limiting it to three-minute songs with verse-chorus structure. There are guardrails making it boring. I’d love for them to release some constraints.
Has a model ever surprised you?
We did a project called Holly+ around 2021—a clone of my voice. We worked with Voctro Labs to train a voice model that works in real time so people can sing using my voice. That was game-changing.
If this works in real time, other people can perform each other’s identity in real time. When we were testing it, my partner, who’s British, was singing into it. I heard my voice with a British accent. It was so uncanny, I had to leave the room—he was singing as me. That was one of the biggest mental unlocks of how weird and cool this stuff can get.
I think it’ll take five to 10 years to be seamless. But once we’re body morphing in real time—imagine you could create a model of a whale voice, then do a hybrid soprano whale. When you sing high, it goes operatic; when you sing low, you’re more whale or Barry White. We’re no longer tied to my larynx.
Where do you think we’ll be in 10 years?
A lot of fears around this technology are actually fears of how the current Internet works—the attention economy, how difficult it is as a creator. My partner always says, “Scrolling is for bots, and strolling is for humans.”
Our more optimistic vision is using agents to deal with all the crap and filter through stuff, actually bringing us together in the real world. That’s why our projects involve people meeting IRL and doing things together. Some of my smartest developer friends are vibe coding with multiple agents while cooking or hiking with their toddler. Things could be really beautiful if we imagine and build it that way.
Does this technology change your definition of creativity?
This whole AI thing might force us to see ourselves as maybe not the only creative actors in the universe. That needn’t be scary—it could be beautiful and liberating.
Creativity happens in swarm, in community. AI is just collective intelligence—aggregated human intelligence. The 20th-century art model is tied to an individual genius who touches an object and imbues it with value. That’s being turned on its head. I’m all team collective intelligence.

