Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
SAN FRANCISCO (AP) – Tech behemoth OpenAI claims its artificial intelligence-powered transcription tool Whisper is close to “human-level robustness and accuracy.”
But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.
Experts say such fabrications are problematic because Whisper is used in several industries worldwide to translate and transcribe interviews, generate text in popular consumer technology and create subtitles for videos.
More concerning, they said, is a rush by medical centers to use Whisper-based tools to transcribe patients' consultations with physicians, despite OpenAI's warning that the tool should not be used in "high-risk domains."
The full extent of the problem is difficult to quantify, but researchers and engineers said they frequently encounter Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected before he started trying to improve the model.
A machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.
The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.
That trend would lead to thousands of faulty transcriptions over millions of recordings, the researchers said.
___
This story was produced in partnership with the Pulitzer Center's AI Accountability Network, which also partially supported the academic Whisper study. The AP also receives financial support from the Omidyar Network to support its coverage of artificial intelligence and its impact on society.
___
Such mistakes could have "really serious consequences," particularly in hospital settings, said Alondra Nelson, who until last year led the White House Office of Science and Technology Policy for the Biden administration.
"Nobody wants a misdiagnosis," said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. "There should be a higher bar."
Whisper is also used to create closed captions for the deaf and hard of hearing, populations at particular risk for faulty transcriptions. That's because the deaf and hard of hearing have no way of identifying fabrications "hidden in all these other texts," said Christian Vogler, who directs the Technology Access Program at Gallaudet University.
OpenAI urged to address the problem
The prevalence of such hallucinations has prompted experts, advocates and former OpenAI employees to call on the federal government to consider AI regulation. At a minimum, they said, OpenAI needs to fix its flaws.
“If the company is willing to prioritize it, it seems solvable,” said William Saunders, a San Francisco-based research engineer who left OpenAI in February over concerns about the company's direction. “It's problematic if you put it out there and people are overconfident about what it can do and integrate it into all these other systems.”
An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciates the researchers' findings, adding that OpenAI incorporates feedback into model updates.
While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.
Whisper hallucinations
The tool is integrated into some versions of OpenAI's flagship chatbot ChatGPT and is a built-in offering on Oracle's and Microsoft's cloud computing platforms, which serve thousands of companies worldwide. It is also used to transcribe and translate text into many languages.
In the past month alone, a recent version of Whisper has been downloaded 4.2 million times from HuggingFace, an open-source AI platform. Sanchit Gandhi, a machine-learning engineer there, said Whisper is the most popular open-source speech recognition model and has been built into everything from call centers to voice assistants.
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that about 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.
In one example they uncovered, a speaker said, “He, the boy was going, I'm not sure, take the umbrella.”
But the transcription software added: “He took a big piece of a cross, a small, small piece … I'm sure he didn't have a terrorist knife so he killed a lot of people.”
In another recording, a speaker described "two other girls and a woman." Whisper invented extra commentary on race, adding: "two other girls and a woman, um, who was Black."
In a third transcription, Whisper invented a non-existent drug called "hyperactivated antibiotics."
Researchers aren't sure why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or when music is playing.
OpenAI recommended in its online disclosures against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes."
Transcribing doctor appointments
That caution hasn't stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what's said during doctor visits so that medical providers spend less time taking notes or writing reports.
More than 30,000 physicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the United States.
The tool was fine-tuned on medical language to transcribe and summarize patient interactions, said Martin Raison, Nabla's chief technology officer.
Company officials said they are aware that Whisper can hallucinate and are addressing the issue.
It's impossible to compare Nabla's AI-generated transcripts with the original recordings because Nabla's tool erases the original audio "for data security reasons," Raison said.
Nabla said the tool has been used to transcribe an estimated 7 million medical visits.
Deletion of the original audio can be worrisome if transcripts aren't double-checked or if clinicians can't access recordings to verify they're accurate, said Saunders, a former OpenAI engineer.
"You can't catch errors if you take away the ground truth," he said.
Nabla said no model is perfect, and that theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change.
Privacy concerns
Because patient meetings with their doctors are confidential, it's hard to know how AI-generated transcripts are affecting them.
A California state lawmaker, Rebecca Bauer-Kahan, said she took one of her children to the doctor earlier this year and refused to sign a form the health network provided asking for permission to share audio of the consultation with vendors, which included Microsoft Azure, the cloud computing system run by OpenAI's largest investor. Bauer-Kahan didn't want such intimate medical conversations shared with tech companies, she said.
"The release was very specific that for-profit companies would have the right to have it," said Bauer-Kahan, a Democrat who represents part of suburban San Francisco in the state Assembly. "I was like 'absolutely not.'"
John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.
___
Schellmann reported from New York.
___
The AP is solely responsible for all content. Find AP's standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org.
___
The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.