
Chatbots sometimes make things up. Not everyone thinks AI's hallucination problem is fixable

By MATT O'BRIEN

Spend enough time with ChatGPT and other artificial intelligence chatbots and it doesn't take long for them to spout falsehoods.

Described as hallucination, confabulation or just plain making things up, it's now a problem for every business, organization and high school student trying to get a generative AI system to compose documents and get work done. Some are using it on tasks with the potential for high-stakes consequences, from psychotherapy to researching and writing legal briefs.

"I don’t think that there’s any model today that that doesn’t suffer from some hallucination,” said Daniela Amodei, co-founder and president of Anthropic, maker of the chatbot Claude 2.

"They’re really just sort of designed to predict the next word," Amodei said. "And so there will be some rate at which the model does that inaccurately.”

Anthropic, ChatGPT-maker OpenAI and other major developers of AI systems known as large language models say they're working to make them more truthful.

How long that will take — and whether they will ever be good enough to, say, safely dole out medical advice — remains to be seen.

“This isn’t fixable,” said Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory. “It’s inherent in the mismatch between the technology and the proposed use cases.”

A lot is riding on the reliability of generative AI technology. The McKinsey Global Institute projects it will add the equivalent of $2.6 trillion to $4.4 trillion to the global economy. Chatbots are only one part of that frenzy, which also includes technology that can generate new images, video, music and computer code. Nearly all of the tools include some language component.

Google is already pitching a news-writing AI product to news organizations, for which accuracy is paramount. The Associated Press is also exploring use of the technology as part of a partnership with OpenAI, which is paying to use part of AP's text archive to improve its AI systems.

In partnership with India's hotel management institutes, computer scientist Ganesh Bagler has been working for years to get AI systems, including a ChatGPT precursor, to invent recipes for South Asian cuisines, such as novel versions of rice-based biryani. A single “hallucinated” ingredient could be the difference between a tasty and inedible meal.

When Sam Altman, the CEO of OpenAI, visited India in June, the professor at the Indraprastha Institute of Information Technology Delhi had some pointed questions.

“I guess hallucinations in ChatGPT are still acceptable, but when a recipe comes out hallucinating, it becomes a serious problem,” Bagler said, standing up in a crowded campus auditorium to address Altman on the New Delhi stop of the U.S. tech executive's world tour.

“What's your take on it?” Bagler eventually asked.

Altman expressed optimism, if not an outright commitment.

“I think we will get the hallucination problem to a much, much better place,” Altman said. “I think it will take us a year and a half, two years. Something like that. But at that point we won’t still talk about these. There’s a balance between creativity and perfect accuracy, and the model will need to learn when you want one or the other.”

But for some experts who have studied the technology, such as University of Washington linguist Bender, those improvements won't be enough.

Bender describes a language model as a system for “modeling the likelihood of different strings of word forms,” given some written data it's been trained upon.

It's how spell checkers are able to detect when you've typed the wrong word. It also helps power automatic translation and transcription services, “smoothing the output to look more like typical text in the target language,” Bender said. Many people rely on a version of this technology whenever they use the “autocomplete” feature when composing text messages or emails.

The latest crop of chatbots such as ChatGPT, Claude 2 or Google's Bard try to take that to the next level, by generating entire new passages of text, but Bender said they're still just repeatedly selecting the most plausible next word in a string.
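A minimal sketch makes that mechanism concrete. The snippet below is illustrative only, not any chatbot's actual internals: it uses the small, openly available GPT-2 model through the Hugging Face transformers library, and the prompt text is an arbitrary example.

```python
# Illustrative sketch of "selecting the most plausible next word."
# Uses the openly available GPT-2 model via Hugging Face transformers;
# this is not how ChatGPT, Claude 2 or Bard are actually served.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The first person to walk on the moon was"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]  # scores for the next token only

probs = torch.softmax(logits, dim=-1)        # probabilities over the whole vocabulary
top = torch.topk(probs, k=5)

# The loop ranks continuations by plausibility; nothing checks whether
# the highest-scoring word is true.
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.3f}")
```

Nothing in that loop consults a source of facts; the ranking is by plausibility alone, which is the gap Bender is pointing at.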

When used to generate text, language models “are designed to make things up. That’s all they do,” Bender said. They are good at mimicking forms of writing, such as legal contracts, television scripts or sonnets.

“But since they only ever make things up, when the text they have extruded happens to be interpretable as something we deem correct, that is by chance,” Bender said. “Even if they can be tuned to be right more of the time, they will still have failure modes — and likely the failures will be in the cases where it’s harder for a person reading the text to notice, because they are more obscure.”

Those errors are not a huge problem for the marketing firms that have been turning to Jasper AI for help writing pitches, said the company's president, Shane Orlick.

“Hallucinations are actually an added bonus,” Orlick said. “We have customers all the time that tell us how it came up with ideas — how Jasper created takes on stories or angles that they would have never thought of themselves.”

The Texas-based startup works with partners like OpenAI, Anthropic, Google or Facebook parent Meta to offer its customers a smorgasbord of AI language models tailored to their needs. For someone concerned about accuracy, it might offer up Anthropic's model, while someone concerned with the security of their proprietary source data might get a different model, Orlick said.

Orlick said he knows hallucinations won't be easily fixed. He's counting on companies like Google, which he says must have a “really high standard of factual content” for its search engine, to put a lot of energy and resources into solutions.

“I think they have to fix this problem,” Orlick said. “They’ve got to address this. So I don’t know if it’s ever going to be perfect, but it’ll probably just continue to get better and better over time.”

Techno-optimists, including Microsoft co-founder Bill Gates, have been forecasting a rosy outlook.

“I’m optimistic that, over time, AI models can be taught to distinguish fact from fiction,” Gates said in a July blog post detailing his thoughts on AI’s societal risks.

He cited a 2022 paper from OpenAI as an example of “promising work on this front.”

But even Altman, at least for now, doesn't count on the models to be truthful.

“I probably trust the answers that come out of ChatGPT the least of anybody on Earth,” Altman told the crowd at Bagler's university, to laughter.

© Copyright 2023 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed without permission.



9 Comments

quote: Described as hallucination, confabulation or just plain making things up, it's now a problem for every business, organization and high school student trying to get a generative AI system to compose documents and get work done. Some are using it on tasks with the potential for high-stakes consequences, from psychotherapy to researching and writing legal briefs.

It could be that this makes the LLM more human, as confabulation is part of the human psyche, especially now with all the fake news and conspiracy theories.

LLMs for images also often produce images that skew toward the dreamlike.


Of course it’s all a fake and only a big, hyped bubble to secure the economy or people’s own job stakes. In the case of language models like LLMs, they now partly come out with the truth by calling it a hallucination problem, and in the case of ordinary numerical data, as in time-series prediction, they tell you the wonderful tale of too much randomness in the data, called white noise. Let’s face it, the whole thing doesn’t work as expected or promised at a big scale. Anyway, there are a few useful cases, say in the fields of audio and visual recognition, feature generation in unsupervised learning and recombination tasks like molecule-folding research, but not very much more. A little more honesty would be nice from the people responsible for that overblown hype, even though we all know that their highly paid jobs depend on keeping that virtual bubble growing.


quote: “This isn’t fixable,” said Emily Bender.

Agreed.

You need to restrict the frame of reference so strictly to get accuracy that it slips below the point at which the system could be called general-purpose.


Any computer-based technology is only as good as those who make it or program it, and guess who makes and programs it... humans.


Wait, so Skynet isn’t about to become self-aware? That must be a relief for the gullible souls that believe all the scaremongering. But rather than accept this is another Y2K media frenzy, let’s phrase it as Skynet sabotaging itself. That way the fear can linger.


Expected, since it learns from and uses data that include hallucinations.


quote: Any computer-based technology is only as good as those who make it or program it, and guess who makes and programs it... humans.

Usually yes. What is different about this tech is its reliance on human-generated content. No matter how well designed it is, no matter how well programmed it is, its reliance on a mish-mash of content, a good deal of which is obsolete, inapplicable, inaccurate or contradictory, torpedoes it below the waterline. There is simply not enough reliable, relevant content out there for what they are trying to do.

Many of the concepts are valid and could be used in specific applications, but the software they are currently offering is inherently flawed.


The fix isn't THAT hard. Use reputable sources full of facts, not rumors or imaginary stories. Before any AI output is provided, run each statement through fact-checking validation. Computers are good at that. Then pass on the correlation factor for each statement so a user can decide. Statements that have a 90% correlation with a respected, validated source are likely correct. Then there is everything else.
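A rough sketch of that idea, under stated assumptions, might look like the snippet below: TF-IDF cosine similarity from scikit-learn stands in for whatever "correlation factor" a real fact-checker would compute, the trusted sources and generated statements are invented strings, and the 0.9 threshold simply mirrors the 90% figure above.

```python
# Toy sketch of scoring generated statements against a trusted corpus.
# TF-IDF cosine similarity is a placeholder for a real fact-checking model,
# and the 0.9 cutoff mirrors the "90% correlation" figure in the comment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

trusted_sources = [
    "Canberra is the capital city of Australia.",
    "Water boils at 100 degrees Celsius at sea level.",
]
generated_statements = [
    "The capital of Australia is Canberra.",
    "The capital of Australia is Sydney.",
]

vectorizer = TfidfVectorizer().fit(trusted_sources + generated_statements)
source_vectors = vectorizer.transform(trusted_sources)

for statement in generated_statements:
    scores = cosine_similarity(vectorizer.transform([statement]), source_vectors)
    best = scores.max()  # highest similarity to any trusted sentence
    verdict = "likely correct" if best >= 0.9 else "needs review"
    print(f"{best:.2f}  {verdict}: {statement}")
```

Even on this toy example, plain lexical similarity can score a false statement almost as high as a true one (Sydney differs from Canberra by a single word), which is one reason the fix is harder than it looks.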


Well, there are other AIs better suited to giving answers, such as Perplexity AI. I use that to find things out, since all it does is search the web for answers, look through the sites and then give the answers; it doesn't rely on its own database. It also gives links to the sources so you can check the websites yourself for further details. I would rather use ChatGPT to write up an email in another language. I use it to draft business-level emails in Japanese. Of course I proofread them and tweak them a bit, but the grunt work is all done by GPT. The other day I sent an email to a potential employer, and when I went for the interview the lady told me how beautiful my written Japanese was lol.

