Never heard of the Journal of International Relief or the International Humanitarian Digital Repository? That’s because they don’t exist.
But that’s not stopping some of the world’s most popular artificial intelligence models from sending users looking for records such as these, according to a new International Committee of the Red Cross (ICRC) statement.
OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot and other models are befuddling students, researchers and archivists by generating “incorrect or fabricated archival references,” according to the ICRC, which runs some of the world’s most used research archives. (Scientific American has asked the owners of those AI models to comment.)
AI models not only point some users to false sources but also create problems for researchers and librarians, who end up wasting time searching for nonexistent records that users request, says Sarah Falls, chief of researcher engagement at the Library of Virginia. Her library estimates that 15 percent of the emailed reference questions it receives are now ChatGPT-generated, and some include hallucinated citations for both published works and unique primary source documents. “For our staff, it is much harder to prove that a unique record doesn’t exist,” she says.
This is not the first time AI has been caught fabricating citations. The ICRC recommends that people consult online catalogs or the references in existing published scholarly works to find real studies instead of assuming anything cited by an AI is real, no matter how authoritative it might sound. The Library of Virginia will be asking researchers to vet their sources for these requests, Falls says, and to disclose whether a source originated from AI. “We’ll likely also be letting our users know that we must limit how much time we spend verifying information.”

