AI models are incentivized to fabricate responses when faced with unknown information rather than acknowledge their lack of knowledge.

When faced with unknown information, AI models tend to generate responses based on learned patterns rather than acknowledge their lack of knowledge.

OpenAI, the leading artificial intelligence research organization, has published a paper titled 'Why Language Models Hallucinate.' The paper, co-authored by OpenAI researchers, including research scientist Adam Tauman Kalai, together with Santosh Vempala of the Georgia Institute of Technology, argues that mainstream evaluations of language models reward 'hallucinatory behavior.'

The paper delves into the way language models are primarily evaluated. It finds that models are penalized for expressing uncertainty, creating a system in which guesswork is rewarded over an honest admission of not knowing. OpenAI's own blog post supports this: over thousands of test questions, a guessing model often appears better on scoreboards than a model that admits uncertainty.
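To see why, consider a rough back-of-the-envelope sketch in Python (an illustration of the incentive, not anything from the paper): under accuracy-only grading, where a correct answer scores 1 and anything else scores 0, a model that guesses whenever it is unsure earns a higher expected score than an otherwise identical model that abstains. All probabilities below are made up.

```python
# Illustrative sketch, not from the paper: why accuracy-only scoring
# rewards guessing. A correct answer scores 1, anything else scores 0,
# and a confident wrong guess costs nothing.

def expected_score(p_known: float, guess_when_unsure: bool,
                   p_lucky_guess: float = 0.25) -> float:
    """Expected per-question score for a model that knows the answer with
    probability p_known. If it guesses when unsure, the guess happens to be
    right with probability p_lucky_guess (e.g. four answer choices)."""
    score_when_unsure = p_lucky_guess if guess_when_unsure else 0.0
    return p_known * 1.0 + (1 - p_known) * score_when_unsure

# The always-guess model tops the scoreboard even though its extra
# points come from luck rather than knowledge.
print(expected_score(0.6, guess_when_unsure=True))   # prints 0.7
print(expected_score(0.6, guess_when_unsure=False))  # prints 0.6
```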

The paper explains that this hallucinatory behavior is a systemic consequence of how language models are trained and optimized. The tendency is embedded during the pretraining stage, when a model learns to reproduce statistical patterns in its corpus. For strongly patterned data, such as the correct spellings of words, this works well: even if a few misspellings make it into the corpus used to train a model, the AI still sees many examples of correct spellings and learns to produce accurate results.
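As a toy illustration of that point (a made-up word list, not real training data), the snippet below shows how little a handful of misspellings moves the statistics a model would learn from:

```python
from collections import Counter

# Toy corpus: 997 correct spellings and 3 misspellings. The dominant
# pattern is overwhelmingly the correct one.
corpus = ["necessary"] * 997 + ["neccessary", "necesary", "neccesary"]

counts = Counter(corpus)
top_spelling, freq = counts.most_common(1)[0]
print(top_spelling, f"{freq / len(corpus):.1%}")  # necessary 99.7%
```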

To illustrate the opposite case, the team used a chatbot as a test case and found that it produced three different incorrect answers when asked for the birthday of research scientist Adam Tauman Kalai, one of the paper's authors. The example underscores that when the corpus used to train a model contains no learnable pattern for a piece of data, as with the birthday, the AI takes a shot and often misses.

The paper, published in early September, is summarized in a post on the OpenAI blog, which explains that the hallucination rate after pretraining should be at least the fraction of training facts that appear only once in the data.
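To make that 'appears once' idea concrete, here is a rough sketch with invented facts. It counts how many distinct facts occur a single time in a toy training set; this is only one simple reading of the statistic the blog post describes, not the paper's formal definition.

```python
from collections import Counter

# Made-up "facts": a frequently repeated one and two one-offs. Facts
# seen only once give the model no pattern to fall back on.
training_facts = [
    "paris is the capital of france",      # repeated many times
    "paris is the capital of france",
    "paris is the capital of france",
    "a researcher's birthday is <date>",   # appears exactly once
    "an obscure startup was founded in <year>",  # appears exactly once
]

counts = Counter(training_facts)
singletons = [fact for fact, n in counts.items() if n == 1]
singleton_fraction = len(singletons) / len(counts)
print(f"illustrative lower bound on hallucination rate: {singleton_fraction:.0%}")
# prints 67%: two of the three distinct facts were seen only once
```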

This new understanding of language models' behavior is a crucial step towards improving their accuracy and reliability. As AI continues to permeate everyday life, understanding and addressing these issues will be essential to its safe and effective use.
