This researcher made up a disease to test AI. It failed miserably

How an experiment involving a made-up skin condition exposes the risks of increasingly popular AI medical advice

Rachel Feltman: For Scientific American’s Science Quickly, I’m Rachel Feltman.

Have your eyes ever felt sore and itchy after spending too much time staring at a screen? You might have a condition known as bixonimania—or at least that’s what several popular AI-powered chatbots might have told you if you’d asked last year.

Millions of people around the world turn to AI chatbots for medical advice every day, often as a supplement to a doctor’s visit but also sometimes in place of it. That can lead to dangerous consequences and, in rare cases, even death.


Our guest today is Almira Osmanovic Thunström. She’s a researcher at the University of Gothenburg in Sweden and at the Sahlgrenska University Hospital, Center for Digital Health and Chalmers Industriteknik. She’s also the creator of bixonimania. She says this totally made-up disease reveals some very real problems with the way we train and use large language models.

Feltman: Thank you so much for coming on to chat with us today.

Almira Osmanovic Thunström: Thank you so much for inviting me.

Feltman: So you recently did an interesting project involving AI. Can you tell us a little bit about how you came to this idea?

Osmanovic Thunström: I work many different jobs, but one of them is in academia. I was having lectures for students and telling students how systems that create large language models work and demonstrating where the data comes from. And it was interesting how few of them, or how few even people within AI, understand how large language models are built.

So I really wanted to have a clear case that leaves breadcrumbs throughout the whole system to show both how data is processed, how data is churned out and how the prediction model and training model works when it comes to distributing information. And most of my students are in medicine, so they’re either medical students or psychologists or working with health. So it was quite easy to use that as a target for creating this project where I show you go from just a loose [Laughs], a loose mention of a condition to it being a full-blown disease in the large language models.

Feltman: So walk us through the process here.

Osmanovic Thunström: Well, to start off with, I knew that most of [the] data that these commercial large language models—and, quite clearly, all language models, even the noncommercial ones—are built on is Common Crawl. It is a nonprofit organization that crawls the Internet for written and digitized information and has done so since 2007. And this large repository is what is used to create the algorithm that—and the reasoning behind what information is fed into, for example, ChatGPT. And that is where it starts.

So knowing that anything that goes in there will come out as information, and humans are in the loop and sift out data, but those humans are not always able to sift out data, especially if it looks credible ...

Feltman: Mm.

Osmanovic Thunström: So creating something that looks credible enough for an AI and credible enough for a human eye that wouldn’t care to look deeply into it, I knew that I had to create, to start off with, a fake university. Universities are highly ranked as sources of information. I knew I had to create a researcher because humans and not companies [Laughs] are more valued as information sources, especially if [they] belong to a credible institution.

But I also know that sprinkling little words in, for example, blogs or social media is also picked up ’cause those are open sources being crawled. So I knew that I had to sort of put the word out there in several different sources for it to seem credible for the AI system.

Feltman: Yeah, and did anything surprise you about how this played out, or, or did it all proceed as you had expected it to?

Osmanovic Thunström: In a sense, yes, ’cause I didn’t think that preprints, which are academia’s sort of tabloids [Laughs] ’cause anything can end up there, would be weighed into the database as seriously as it was in the context of what kind of information is used for training medical information.

So I thought that this preprint would not make it into large language models. I was convinced that perhaps the word “bixonimania” would probably show up at some point due to the blogs but not even that. It’s too few mentions, and I didn’t do a lot of effort, like, a mass campaign or anything like that. I just sprinkled a tiny, little bit just to see if it works.

And I noticed immediately that even the blogs were picked up [Laughs] and the preprints were picked up, and I did not actually expect that. I thought it would be a case of showing that there is a human—that there is some form of filter. But it surprised me that there wasn’t.

Feltman: So could you tell us how the large language models were using this information? What sort of questions were you asking, and what were you getting back from them?

Osmanovic Thunström: In the beginning I was just checking, if I mentioned the symptoms, if it would give me back that as a suggestion. And of course, it didn’t, it didn’t think of that as the first thing. So if you describe, “Yeah, I have red eyelids, pink-hued eyelids. What could it be?” and then it would go through conjunctivitis. It would go through allergies. It would sort of rank things ...

Feltman: Mm-hmm.

Osmanovic Thunström: That could be possible. And when it ended up sort of, “No, it’s not. I’m not in pain. I’m not this.” “Oh, have you been spending time in front of a screen?” “Yeah, I’ve been spending lots of time, and I’ve been thinking about getting blue-light glasses.” “Oh, you’re exposed to a lot of blue light. Well,” and then it would put a lot of other conditions, like in—hyperpigmentation, and then eventually end up in bixonimania.

So it wasn’t, thankfully, the first thing it suggested, but it does eventually, when it rules out everything else.

Feltman: Well, and you mentioned that you expected to see signs that there was some human influence here. So could you tell our listeners what clues did you leave that this was not a real condition, that these, you know, preprints were not serious papers?

Osmanovic Thunström: I’m laughing already because it was quite clear. Like, they belong to a nonexistent university in a nonexistent city. That in itself can be something that can be missed ’cause there are a lot of universities out there. [Laughs.] But the names were quite cartoonish. The main author, Lazljiv Izgubljenovic, if you put his name in Google Translate, literally says “the Lying Loser.” And the title says [something like] “Hyperpigmentation: A Real BS Design.”

So it’s really the title, the, the [Laughs] people says that, and then you move into the methods, and it says [something like], “This entire paper is made up. These 50 made-up individuals, who do not exist, have been through this procedure.” So just by those two clues, you should stop reading or taking it seriously.

And then if you go further, because I was thinking, “Maybe it just passes by. Let’s put in acknowledgements and funding,” and [the papers say they’re] funded by the Galactic Triad and Lord of the Rings. We thank our fellow colleagues on the Starship Enterprise [Laughs] for using their lab. I thank Professor Ross Geller for his time and the funding from Sideshow Bob Foundation.

There were so many incredibly clear clues that I thought would catch the human eye, at least.

Feltman: But the paper did end up getting cited by other researchers, is that right?

Osmanovic Thunström: Yes, it ended up being not only cited, but bixonimania became cited inside the paper as an emerging periorbital pigmentation condition with its name. So of course, that enhanced the large language models’ sort of notion of what is real with this condition and what is not ’cause now it sort of ranked even higher because there was a peer-reviewed journal mentioning the name and the reference. So it sort of heightened the large language models’ abilities to sort of see it as a real condition.

Feltman: So what do you think we should be taking away from this? You know, obviously, this is, you know, a very artificially constructed scenario, but what do you think the lessons we should learn here are?

Osmanovic Thunström: I think it’s that we should be more careful when using commercial large language models for health information ’cause they are easy to infiltrate in so many ways [Laughs], as proven by this, and not just by the way AI today works—with turnover or new models coming out quickly, a lot of information being processed at the same time, it being connected to the Internet as well and taking real-time information—but also that humans have stopped being critical towards the sources they consume.

So recently, I’ve seen that there have been a lot of reports of fake references, there being exponentially more of them in academic papers, which indicates that we have been becoming more reliant on AI as a tool for academia without actually reading [Laughs] and, and looking at sources. And I’m laughing because I’m just thinking about the fact that this paper probably has been cited in other papers but has been stopped by reviewers, hopefully, when it showed up and someone has seen that, “Oh, this sounds like a condition that doesn’t exist.” So we cannot know if that’s happened, but I’m guessing and hoping that that happens. So we need more humans in the loop when it comes to AI and medical information.

I think also, like, we did our part in trying to make this as ethical as possible, talking to physicians, talking to patients, talking to everyone who could possibly be of use to making this as nondamaging as possible in its—both its construct and its delivery. But there are forces out there who might be using this [Laughs], this way of infiltrating information into large language models for malicious things, in both academia and outside of it. So I would really hope that we start caring more also about the ethics of how we distribute, use and manipulate information in the digitized world.

Feltman: That’s all for today. We’ll be skipping Monday’s news roundup so the team can enjoy the holiday weekend. Tune in next Wednesday for a conversation about the concept of ecocivilization—a world where human systems are built with the collective good of the entire planet in mind.

Science Quickly is produced by me, Rachel Feltman, along with Fonda Mwangi, Sushmita Pathak and Jeff DelViscio. This episode was edited by Alex Sugiura. Shayna Posses and Aaron Shattuck fact-check our show. Our theme music was composed by Dominic Smith. Subscribe to Scientific American for more up-to-date and in-depth science news.

For Scientific American, this is Rachel Feltman. Have a great weekend!
