Hallucination vs. Confabulation: Tracing the History of These Terms and Their Misuse in Computer Science
I. Introduction
The terms hallucination and confabulation carry distinct meanings in the fields of psychiatry and neuroscience. Hallucination refers to false sensory experiences that occur without external stimuli, such as hearing voices or seeing things that are not there. Confabulation, on the other hand, involves the creation of false or distorted memories, often to fill gaps caused by memory impairments. Both terms have rich histories in clinical contexts and provide critical insights into how the human brain processes perception and memory.
In recent years, these terms have crossed over into the world of computer science, particularly in discussions about the behavior of large language models (LLMs) like GPT. When these AI systems produce incorrect or fabricated outputs, such as nonexistent citations or implausible "facts," the phenomenon has been widely labeled as a hallucination. However, this usage is misleading, as it suggests a perceptual process that LLMs, which lack sensory input, do not possess.
A closer examination reveals that confabulation is a far more accurate analogy for these errors. Just as humans with memory deficits construct plausible but false narratives, LLMs generate outputs based on incomplete or probabilistic associations within their training data.
II. The Origins of Hallucination: Relevance to the Discussion of AI
The term hallucination has its roots in the Latin word hallucinari, meaning "to wander in the mind" or "to dream." Historically, the concept of hallucination has intrigued philosophers, theologians, and scientists alike, often linked to questions of perception, reality, and the mind’s ability to misinterpret or fabricate sensory experiences.
Philosophical and Early Medical Origins
Discussions of hallucination date back to ancient times. Philosophers such as Plato and Aristotle contemplated the nature of perception and the mind’s capacity for error. In medieval theology, hallucinations were often considered supernatural phenomena, attributed to divine visions or demonic influences.
It wasn’t until the Enlightenment that hallucinations began to be examined through a scientific lens. The works of René Descartes, for example, raised critical questions about the reliability of sensory perception. Descartes’ famous assertion, "I think, therefore I am," reflected his attempt to distinguish between true knowledge and deceptive sensory input.
Emergence of the Clinical Term
The modern understanding of hallucination as a clinical phenomenon emerged in the 19th century with the rise of psychiatry as a discipline. Early pioneers such as Jean-Étienne Dominique Esquirol, a French psychiatrist, distinguished hallucinations from illusions. While illusions involve the misinterpretation of real external stimuli (e.g., mistaking a shadow for a person), Esquirol described hallucinations as perceptions occurring in the absence of external stimuli, generated entirely by the mind.
Hallucinations soon became recognized as a defining symptom of certain psychiatric conditions, particularly schizophrenia and bipolar disorder with psychotic features. This understanding paved the way for deeper investigations into their biological and psychological underpinnings.
Hallucination in Neuroscience and Psychiatry
By the 20th century, advances in neuroscience began to unravel the mechanisms behind hallucinations. Research linked hallucinations to dysfunction in specific brain regions and neurotransmitter systems, such as:
- The auditory cortex (linked to auditory hallucinations).
- The occipital lobe (linked to visual hallucinations).
- Dysregulation in the dopaminergic system, often implicated in psychosis.
Hallucinations became a critical focus in clinical diagnosis and treatment. For instance:
- Auditory hallucinations are frequently associated with schizophrenia, where patients hear voices commenting on their actions or issuing commands.
- Visual hallucinations are more common in neurological conditions like Parkinson’s disease, dementia, or delirium.
- Tactile, olfactory, and gustatory hallucinations occur less frequently but are often indicative of substance use, epilepsy, or certain brain injuries.
Hallucination as a Sensory Phenomenon
A key characteristic of hallucinations is their sensory quality. Individuals experiencing a hallucination truly perceive it as real, whether it is a voice, a figure, or a sensation. This sensory realism is why hallucinations are often so distressing and disruptive.
Unlike memory distortions or logical errors, hallucinations are firmly rooted in perception—they arise from brain processes mimicking external stimuli, even when no such stimuli exist.
The term hallucination, as understood in its clinical and historical context, is intrinsically linked to perception. This makes its application to non-sensory systems like large language models problematic. LLMs do not "perceive" or process sensory input; their errors are not perceptual but arise from flaws in reasoning and information retrieval.
The misuse of the term in computer science obscures the true nature of AI errors and invites unnecessary confusion. To clarify this distinction, it is important to contrast hallucination with another clinical phenomenon, confabulation, which is far more analogous to how LLMs generate false information.
III. The Origins of Confabulation: A Case for Reconsidering the Term
The term confabulation originates from the Latin word confabulari, meaning "to talk together" or "to chat." In its clinical context, it refers to the unintentional fabrication of memories or the misremembering of events. Unlike hallucinations, which are rooted in perception, confabulations arise from memory and cognitive distortions. The history of confabulation as a concept is deeply intertwined with the study of neurological and psychiatric disorders, particularly those involving memory impairments.
Early Descriptions of Confabulation
Confabulation began to emerge as a distinct phenomenon in medical literature during the late 19th century, as clinicians sought to understand the complexities of memory. Early observations of patients with amnesia or brain injuries revealed a peculiar tendency to "fill in the blanks" of their memory gaps with plausible but false narratives.
- Sergei Korsakoff’s Observations:
- The Russian neuropsychiatrist Sergei Korsakoff was among the first to document confabulation in patients suffering from chronic alcohol misuse. In what later became known as Korsakoff Syndrome, patients exhibited severe short-term memory loss and filled the resulting gaps with fabricated but often coherent stories.
- Korsakoff’s work in the late 1800s emphasized that confabulations were not deliberate lies but rather the brain’s attempt to create continuity in memory.
Confabulation in Early Neuropsychiatry
As psychiatry and neurology advanced in the 20th century, confabulation became a key symptom for diagnosing memory-related disorders. Researchers began to recognize its distinct characteristics:
- Unintentional Nature:
- Unlike intentional deceit, individuals who confabulate genuinely believe their false memories are accurate.
- Plausibility:
- Confabulations often sound reasonable and are grounded in the individual’s prior knowledge and experiences, making them difficult to detect.
- Mechanisms of Repair:
- Confabulations are thought to occur as the brain attempts to "repair" gaps in memory or reasoning, creating narratives that maintain a sense of coherence.
This understanding solidified confabulation as a phenomenon linked to the brain’s memory and reasoning processes, rather than its perceptual systems.
Neurological Basis of Confabulation
With advancements in neuroscience, researchers identified the brain regions associated with confabulation:
- Frontal Lobe Dysfunction:
- The frontal lobe, which governs executive functions like reasoning and planning, plays a central role. Damage to this region impairs an individual’s ability to monitor and correct false memories.
- Limbic System and Memory Networks:
- Confabulations often result from damage to memory-related structures like the hippocampus, which encodes new memories, and the thalamus, which relays sensory and memory signals.
Confabulation is now commonly observed in conditions such as:
- Korsakoff Syndrome (associated with thiamine deficiency due to chronic alcoholism).
- Traumatic Brain Injury (TBI) and Stroke (causing damage to memory-related brain regions).
- Dementia (especially in Alzheimer’s disease, where memory deficits are prominent).
The Psychological Dimensions of Confabulation
In addition to its neurological basis, confabulation has psychological dimensions that reflect the mind’s need for coherence:
- Cognitive Compensation:
- The brain strives to make sense of incomplete or fragmented memories, even at the expense of accuracy.
- Social Function:
- In some cases, confabulations help maintain social interactions by providing plausible answers to questions, avoiding the embarrassment of memory gaps.
As artificial intelligence (AI) technologies have evolved, particularly large language models (LLMs) such as GPT and similar systems, their ability to generate human-like text has impressed researchers and users alike. However, these models are not immune to errors. One prominent category of error is the generation of information that is entirely false or fabricated but presented as if it were true. To describe this phenomenon, the AI research community adopted the term “hallucination.”
IV. The Introduction of “Hallucination” in AI
The term “hallucination” first gained traction in AI research during the development of image recognition and generation models. These systems occasionally produced images or patterns that did not correspond to any real-world object, leading researchers to liken these outputs to hallucinations. This analogy was later extended to text-based AI models when they began producing fabricated information.
- Early AI Hallucinations in Images:
- In computer vision, the term described models generating visual content that was not grounded in any real-world input, from distorted or nonsensical images produced by generative models such as GANs (Generative Adversarial Networks) to plausible-looking detail that simply did not exist. Researchers called these outputs "hallucinations" because the imagery appeared real without corresponding to anything real.
- Transition to Text Generation:
- As LLMs gained popularity, researchers began observing similar behavior in text outputs: the generation of plausible but false information, such as non-existent citations, made-up facts, or incorrect reasoning.
The Misapplication of the Term “Hallucination”
The adoption of “hallucination” to describe LLM errors is an attempt to convey that these outputs are detached from reality. However, the term is fundamentally misaligned with how LLMs operate.
- Hallucination Implies Perception:
- In its clinical sense, hallucination refers to false sensory perceptions that arise without external stimuli, such as hearing voices or seeing things that aren’t there.
- LLMs do not have sensory perception or interact with the physical world. Their outputs are generated entirely through mathematical probabilities based on text patterns in training data, not through any perceptual process.
- Fabrication Is Cognitive, Not Perceptual:
- The errors produced by LLMs are better understood as fabrications arising from incomplete or ambiguous input. This parallels human confabulation, where the brain creates plausible but incorrect memories to fill gaps.
Examples of Hallucination in LLMs
Despite the term’s limitations, “hallucination” has been used broadly in AI research and popular discourse to describe several types of LLM errors:
- Factual Fabrication:
- Example: An AI generating a historical fact that never happened (e.g., claiming a fictional event occurred in 1865).
- Invented Citations:
- Example: Providing a list of scholarly references where the authors, titles, or journals do not exist.
- Logical Inconsistencies:
- Example: Contradicting earlier statements within a single response or drawing conclusions that defy logic.
- Unrealistic Creativity:
- Example: When a model generates wildly imaginative but implausible outputs that deviate from reality or the context of the input.
Why “Hallucination” Stuck in Computer Science
The term "hallucination" persists in AI discussions for several reasons, despite its clinical roots being largely unrelated to how LLMs function:
- Accessibility:
- “Hallucination” is an intuitive, attention-grabbing term that resonates with non-experts, making it easier to communicate the phenomenon to the public.
- Historical Precedent:
- Researchers borrowed the term from earlier AI fields like computer vision and applied it to text generation without considering its clinical implications.
- Metaphorical Appeal:
- The idea of a machine "hallucinating" evokes vivid imagery and draws parallels to human cognitive flaws, even if the comparison is scientifically imprecise.
The Problems with Using “Hallucination”
While the term may be convenient and catchy, it introduces several challenges:
- Misunderstanding AI Behavior:
- Using “hallucination” implies that AI systems experience something akin to human sensory distortions, which is misleading. This risks overstating AI's capabilities or anthropomorphizing its limitations.
- Obscuring the Real Mechanism:
- Errors in LLMs are not perceptual phenomena but rather stem from the probabilistic nature of their text generation. These errors are cognitive analogs, akin to filling gaps in knowledge or memory—hallmarks of confabulation, not hallucination.
- Impact on Public Trust:
- Describing AI errors as hallucinations might lead to unnecessary fear or sensationalism, undermining efforts to build realistic expectations around AI behavior.
As AI technologies become more integrated into daily life, precision in terminology is critical for fostering trust and understanding. The use of “hallucination” to describe LLM errors is a metaphor that confuses more than it clarifies. A more accurate term, such as confabulation, would better align with how these systems generate false outputs and promote clearer communication across disciplines.
In the following sections, we’ll explore why confabulation is not only a better fit but also a more scientifically accurate description of how large language models produce erroneous or fabricated information. By understanding this distinction, we can develop a deeper appreciation for the limitations and potential of AI systems.
V. Why “Confabulation” Is a Better Fit for LLM Errors
The term hallucination has been widely adopted to describe the errors of large language models (LLMs), such as generating false or fabricated information. However, this term is a poor fit when examined through the lens of clinical and scientific accuracy. The errors produced by LLMs are not perceptual distortions, as the term “hallucination” implies, but rather logical fabrications arising from gaps in knowledge or ambiguity in input—making confabulation a far more suitable analogy.
Confabulation in Clinical Terms
Confabulation, in clinical psychology and neurology, refers to the creation of false or distorted memories, often in response to memory deficits. These fabricated memories:
- Are produced without intent to deceive.
- Seem plausible to the individual who generates them.
- Arise as a compensatory mechanism to maintain coherence in thought or narrative.
This concept closely mirrors how LLMs process and generate text when faced with incomplete data or ambiguous prompts.
How LLMs Mimic Confabulation
Large language models operate by analyzing vast amounts of training data and generating responses based on probabilistic patterns. When presented with queries that exceed the scope of their training or involve ambiguous or conflicting data, LLMs construct plausible but incorrect outputs. These responses share several key characteristics with confabulation:
1. Lack of Intent to Deceive
- LLMs: Unlike humans who may lie intentionally, LLMs do not possess intent or awareness. When an LLM generates a false output, it does so as a neutral function of its design, not as a deliberate act.
- Confabulation: Similarly, individuals who confabulate genuinely believe their false narratives to be accurate.
2. Plausibility of Errors
- LLMs: The errors produced by LLMs are often coherent and plausible within the context of the input. For example, they may fabricate a citation by combining real journal names with fictional article titles, making the output appear credible.
- Confabulation: Confabulated memories are similarly grounded in the individual’s prior knowledge and experiences, often making them hard to distinguish from true memories.
3. Filling Gaps in Knowledge
- LLMs: When faced with incomplete or ambiguous prompts, LLMs extrapolate based on patterns learned during training, effectively “guessing” to fill in gaps.
- Confabulation: Human confabulation arises when the brain compensates for memory deficits by creating false but coherent stories to fill gaps in recall.
4. Reliance on Internal “Memory”
- LLMs: Their outputs are based entirely on the patterns encoded in their training data—essentially a form of stored "memory." When this memory is incomplete or insufficient, the model generates outputs that sound plausible but are inaccurate.
- Confabulation: In humans, confabulations often draw from fragmented or incomplete memories, combined with creative reconstruction, to form a cohesive narrative.
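To make this parallel concrete, consider a deliberately minimal sketch of next-token generation. The vocabulary, scores, and prompt below are invented for illustration and are not drawn from any real model: the point is only that a model converts scores over its vocabulary into probabilities and must emit some continuation, so it selects a plausible token even when no well-grounded answer exists in its training data.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution over the vocabulary."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Invented vocabulary and scores for a prompt about a treaty that never existed,
# e.g. "The 1865 Treaty of Example was signed in ...".
vocab = ["London", "Paris", "Vienna", "I don't know"]
logits = np.array([2.1, 1.9, 1.6, 0.3])  # illustrative numbers, not from a real model

probs = softmax(logits)
rng = np.random.default_rng(seed=0)
next_token = rng.choice(vocab, p=probs)

# The procedure always yields *some* continuation ranked by pattern-fit,
# so a plausible city is chosen even though there is no fact to report.
print(dict(zip(vocab, probs.round(3))))
print("continuation:", next_token)
```

Nothing in this procedure consults the world; it only ranks continuations by how well they match learned patterns, which is precisely the gap-filling behavior the four points above describe.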
Contrast with Hallucination
While the term “hallucination” is catchy, its clinical and perceptual roots make it ill-suited for describing LLM errors. Key distinctions include:
1. Hallucination Involves Perception
- Hallucinations are false sensory experiences, such as hearing or seeing things that are not there. LLMs lack sensory systems and do not perceive external stimuli, making the comparison fundamentally flawed.
2. Hallucination Is Not Memory-Based
- Unlike confabulation, hallucinations do not involve gaps in memory or logical processes. They occur independently of reasoning, arising instead from dysfunctions in perceptual systems, such as the visual or auditory cortex.
3. Hallucination Lacks Contextual Anchoring
- Hallucinations are often detached from reality and context. In contrast, both human confabulations and LLM-generated errors are context-dependent, grounded in available (albeit incomplete) information.
Benefits of Using “Confabulation”
Adopting the term confabulation for LLM errors is not only more accurate but also enhances clarity in discussions about AI behavior. Here’s why:
1. Aligns with Cognitive Science
The process by which LLMs generate text—drawing from stored data to produce coherent but occasionally flawed responses—closely resembles how human memory systems compensate for deficits. Referring to these errors as confabulations bridges the gap between cognitive science and computer science.
2. Clarifies AI’s Limitations
Using “confabulation” underscores the fact that LLM errors are a result of their design, not a failure of perception. This distinction can help avoid anthropomorphizing AI systems or overstating their capabilities.
3. Encourages Interdisciplinary Understanding
Switching to a term rooted in cognitive science can foster better collaboration between AI researchers, psychologists, and neuroscientists. This interdisciplinary alignment may lead to new insights into both human cognition and AI design.
4. Enhances Public Trust and Communication
Describing AI errors as confabulations rather than hallucinations reduces sensationalism and fosters realistic expectations. It communicates to the public that these errors are systematic and predictable rather than mysterious or out of control.
VI. Consequences of Misusing the Term “Hallucination”
The widespread use of the term “hallucination” to describe the errors of large language models (LLMs) may seem harmless at first glance, but it introduces significant challenges for both researchers and the public. Misusing this term leads to confusion about how AI operates, creates ethical and technical issues, and hinders progress in addressing AI's limitations.
Confusion in Understanding AI Limitations
The term “hallucination” implies that LLMs experience something akin to perceptual errors, a notion that can mislead both researchers and the general public.
- Implications for Researchers:
- Misunderstanding the nature of LLM errors can misdirect research efforts. For instance, treating these errors as perceptual phenomena might lead to the development of solutions that miss the root cause: the probabilistic and data-driven nature of LLM outputs.
- It risks conflating AI limitations with human cognitive processes, potentially slowing advancements in improving model accuracy and reliability.
- Implications for the Public:
- Many non-experts interpret “hallucination” literally, assuming that LLMs have perceptual capabilities or even consciousness. This anthropomorphization creates unrealistic expectations about AI’s capabilities and fosters unnecessary fear or overconfidence in its outputs.
- Miscommunication about AI behavior can erode trust in technology or amplify misunderstandings about its risks and benefits.
Ethical and Technical Implications
Mislabeling LLM errors as “hallucinations” has ethical and technical consequences that go beyond simple terminology.
- Over-Sensationalizing AI Errors:
- The term “hallucination” sensationalizes AI errors, suggesting an almost mystical or uncontrollable aspect to LLM behavior. This can:
- Distract from the systematic and predictable causes of errors.
- Fuel narratives that exaggerate AI’s unpredictability, causing unnecessary fear.
- Sensationalism may also overshadow the real and solvable issues surrounding LLMs, such as biased outputs or lack of verifiability.
- Undermining Effective Error-Handling Mechanisms:
- Misunderstanding the nature of LLM errors can lead to suboptimal solutions. For example:
- Developers may attempt to “eliminate hallucinations” without addressing the actual mechanisms that cause LLMs to fabricate information.
- Ethical considerations, such as accountability for AI-generated misinformation, become muddied when the errors are framed as uncontrollable “hallucinations” rather than algorithmic confabulations.
- Accurately framing these errors as confabulations could encourage the development of systems that focus on validation, fact-checking, and improving context understanding.
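As one illustration of what such validation might look like, the sketch below is a hypothetical post-generation check on model-produced references. The in-memory set, function name, and citation format are assumptions standing in for a real bibliographic database or search API.

```python
# Hypothetical trusted index of known references. In practice this would be a
# bibliographic database or search API rather than an in-memory set.
KNOWN_REFERENCES = {
    ("Journal of Example Studies", 2019, "A Real Article That Exists"),
}

def validate_citation(journal: str, year: int, title: str) -> bool:
    """Return True only if the generated citation matches a known reference."""
    return (journal, year, title) in KNOWN_REFERENCES

# A model-generated citation that pairs a real-sounding journal with an invented
# title -- the characteristic shape of a confabulated reference.
generated = ("Journal of Example Studies", 2019, "A Plausible But Invented Article")

if not validate_citation(*generated):
    print("Citation could not be verified; flag as a possible confabulation.")
```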
Using the term “hallucination” to describe LLM errors is more than a linguistic shortcut—it fundamentally distorts how these errors are understood by researchers, developers, and the public. By treating LLM errors as perceptual phenomena, we risk oversimplifying the problem, delaying effective solutions, and fostering misconceptions about AI systems. Recognizing these errors as confabulations instead can lead to clearer communication, better error-handling strategies, and more informed public discourse around AI technology.
VII. Reframing the Language: Shifting from Hallucination to Confabulation
The term "hallucination," while popular in discussions about AI-generated errors, is a misrepresentation of the phenomenon. By shifting to “confabulation”, we can foster a clearer, more scientifically accurate understanding of how large language models (LLMs) work. This reframing benefits interdisciplinary communication, aligns with established concepts in cognitive science, and enhances the development of solutions tailored to the actual mechanisms underlying AI errors.
Benefits of Correct Terminology
Adopting the term "confabulation" offers significant advantages in both scientific and practical contexts:
- Better Alignment with Cognitive Science and Psychology
- Conceptual Accuracy: The process by which LLMs generate fabricated outputs closely mirrors human confabulation—errors rooted in memory and reasoning, not perception.
- Interdisciplinary Insight: Cognitive scientists and psychologists studying confabulation in humans can provide valuable perspectives on improving LLM behavior. Accurate terminology bridges these fields and promotes collaboration.
- Improved Communication Between Interdisciplinary Teams
- AI research increasingly involves experts from diverse fields, including neuroscience, linguistics, and psychology. Using "confabulation" fosters shared understanding and avoids the confusion caused by misapplying terms like "hallucination."
- Clear, precise language helps policymakers, developers, and the public better understand the limitations and potential of AI systems.
Proposed Guidelines for Usage
To establish consistency in how AI-generated errors are described, the following guidelines are proposed:
- Adopt “Confabulation” for Memory and Reasoning-Related Errors
- Use the term "confabulation" to describe instances where an AI generates plausible but incorrect information due to gaps in training data or ambiguous prompts.
- Examples include:
- Fabricated citations in academic contexts.
- Incorrect historical facts that sound plausible.
- Logical inconsistencies based on incomplete contextual understanding.
- Reserve “Hallucination” for Systems That Simulate Perceptual Inputs
- Restrict the term "hallucination" to AI systems designed to simulate or interact with sensory inputs (e.g., computer vision or audio processing systems).
- If, for example, future AI systems exhibit errors in interpreting or simulating perceptual data—such as misrecognizing images or generating sensory artifacts—“hallucination” would be appropriate.
- Focus on Mechanistic Clarity
- Encourage developers and researchers to describe errors based on their root cause. For instance:
- Probabilistic inference issues can be referred to as confabulated reasoning.
- Data biases may be termed biased confabulations rather than hallucinations.
Examples of Corrected Terminology in Practice
Reframing AI errors as confabulations can clarify their nature and origins. Below are examples of how this shift in terminology can be applied:
- Example 1: Fabricated Citations
- Current Framing: “The AI hallucinated a source, inventing a reference to a non-existent journal article.”
- Reframed as Confabulation: “The AI confabulated a citation by combining elements of its training data into a plausible but false reference.”
- Example 2: Incorrect Historical Information
- Current Framing: “The model hallucinated a historical event that never occurred.”
- Reframed as Confabulation: “The model confabulated an event by extrapolating patterns from its training data, leading to an incorrect conclusion.”
- Example 3: Logical Inconsistency
- Current Framing: “The AI hallucinated an illogical sequence in its response.”
- Reframed as Confabulation: “The AI confabulated a flawed response due to gaps in context and reasoning.”
Reframing AI errors as confabulations instead of “hallucinations” provides clarity, accuracy, and a shared vocabulary for interdisciplinary research and development. By aligning terminology with established concepts in cognitive science, this shift fosters better understanding of AI behavior and paves the way for more effective strategies to address its limitations. Using "confabulation" also avoids anthropomorphizing AI systems, helping to set realistic expectations and build trust in these transformative technologies.
VIII. Broader Implications for AI and Cognitive Science
The reframing of AI errors from “hallucination” to “confabulation” has far-reaching implications beyond mere terminology. By aligning AI behavior with concepts rooted in psychology and neuroscience, we open doors for interdisciplinary collaboration, advancements in AI design, and ethical communication that builds trust in these transformative technologies.
Enhancing Interdisciplinary Understanding
Bridging the gap between psychology, neuroscience, and AI research is critical to advancing both fields.
- Shared Concepts Foster Collaboration:
- Cognitive science and neuroscience provide a wealth of knowledge about how humans process memory, reason, and generate confabulations. These insights can inform AI research by offering analogs to how gaps in data or logic are handled in human cognition.
- AI, in turn, can offer computational models that test theories of human memory and reasoning, creating a feedback loop that benefits both disciplines.
- New Perspectives on AI Behavior:
- Viewing LLM errors as confabulations shifts the focus from errors as "failures" to errors as products of reasoning processes. This invites interdisciplinary teams to explore solutions that draw from how humans mitigate memory errors, such as self-monitoring or external validation.
Potential for Future Developments
Understanding AI errors as confabulations can drive innovation in how LLMs are designed and optimized.
- Leveraging Insights from Human Confabulation Research:
- Human brains compensate for memory gaps by drawing on contextual clues and patterns. Studying these processes can inspire improvements in LLM architecture, such as:
- Incorporating mechanisms for error detection and correction.
- Designing models that prioritize verifiable knowledge over plausible guesses.
- Insights from psychology on how humans self-correct confabulations could lead to LLMs that flag potentially incorrect outputs or request clarifying inputs from users.
- Improved Contextual Understanding:
- By adopting techniques that mimic how humans use context to refine memory and reasoning, LLMs can become more reliable. For example:
- Training models to distinguish high-certainty from low-certainty outputs, akin to human confidence levels in memory recall.
- Developing hybrid systems that integrate LLMs with fact-checking databases for real-time validation.
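To illustrate the high-certainty versus low-certainty idea above, the following sketch flags an output whose average token log-probability is low. It assumes the inference stack exposes per-token log-probabilities (many do, though interfaces differ); the function name, threshold, and numbers are invented for illustration rather than taken from any particular system.

```python
def flag_low_certainty(token_logprobs, threshold=-1.0):
    """Flag an output whose average per-token log-probability falls below a cutoff.

    token_logprobs: log-probabilities the model assigned to its own output tokens
                    (assumed to be exposed by the inference stack).
    threshold:      illustrative value only; a real system would calibrate it.
    """
    avg = sum(token_logprobs) / len(token_logprobs)
    return avg < threshold, avg

# Invented numbers: a confidently recalled answer versus a shaky, gap-filling one.
examples = {
    "well-supported answer": [-0.10, -0.25, -0.20, -0.15],
    "likely confabulation": [-2.40, -1.90, -3.10, -2.70],
}

for label, logprobs in examples.items():
    flagged, avg = flag_low_certainty(logprobs)
    verdict = "route to verification" if flagged else "pass through"
    print(f"{label}: avg log-prob {avg:.2f} -> {verdict}")
```

A check of this kind does not make the model truthful; it only routes low-confidence outputs toward external validation, mirroring how humans rely on cues and corroboration to catch their own confabulations.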
Ethical Communication of AI Capabilities
Accurate descriptions of AI behavior are vital for building trust among users, developers, and the general public.
- Avoiding Anthropomorphism:
- Replacing “hallucination” with “confabulation” avoids the anthropomorphic implication that AI systems “perceive” or experience reality in ways comparable to humans. This reduces confusion and prevents overestimation of AI’s capabilities.
- Ethical communication ensures that AI is understood as a tool with strengths and limitations, not as an independent or sentient entity.
- Building Trust Through Transparency:
- When AI-generated errors are framed as confabulations, the public can better appreciate that these are systematic, predictable flaws, not random failures.
- Transparency in how LLMs work—emphasizing their reliance on training data and probabilistic modeling—helps users make informed decisions about when and how to rely on AI systems.
- Promoting Responsible Use:
- Clear communication about LLM limitations encourages responsible application in high-stakes fields such as healthcare, law, and education, where unchecked errors could have significant consequences.
Reframing LLM errors as confabulations has implications that extend beyond AI design to how we approach interdisciplinary research and public engagement. By bridging cognitive science and AI, we can foster collaboration and innovation, improving both fields. Ethical communication that accurately represents AI capabilities builds trust, sets realistic expectations, and paves the way for responsible and effective use of these powerful technologies. This holistic approach ensures that the conversation around AI remains grounded in clarity and shared understanding, benefiting researchers, developers, and users alike.
IX. Conclusion
The terms hallucination and confabulation have rich histories rooted in clinical psychology and neuroscience, each describing distinct phenomena. Hallucination refers to false sensory perceptions occurring without external stimuli, while confabulation involves the unintentional creation of plausible but false memories, often as a response to gaps in knowledge or memory deficits. These concepts, carefully defined and studied in human cognition, have profound implications for how we interpret errors in artificial intelligence systems.
In the context of large language models (LLMs), the term "hallucination" has been widely used to describe the generation of false or fabricated outputs. However, as we have explored, this term is a poor fit. AI systems lack sensory perception, and their errors stem from probabilistic reasoning based on incomplete or ambiguous data. These characteristics align far more closely with confabulation than with hallucination, making confabulation the more accurate and scientifically appropriate term.
This distinction is more than semantic—it shapes how we understand and communicate about AI behavior. By adopting “confabulation” to describe these errors, we can foster a clearer understanding of AI systems’ limitations and capabilities. Accurate language improves interdisciplinary collaboration, enabling computer scientists, cognitive scientists, and ethicists to work together more effectively. It also ensures that the public receives transparent, trustworthy information about AI technologies.
We urge computer scientists, AI researchers, and industry leaders to reconsider the terminology used to describe LLM errors. By embracing more precise language, we can demystify AI systems, set realistic expectations, and accelerate progress in building reliable, ethical, and effective AI tools. The shift from “hallucination” to “confabulation” is not just a linguistic adjustment—it’s a step toward deeper understanding and meaningful collaboration across disciplines.