One research area I have found fascinating lately is whether hallucination represents a bug or a feature in artificial intelligence systems, and how it relates to the alignment problem.
***
In the realm of large language models (LLMs), the term "hallucination" has emerged as a point of contention. Andrej Karpathy, a prominent figure in AI, defines LLM hallucinations as "the model saying things that are not grounded in reality, especially when they are presented as facts" (Hacker News, 2023). This phenomenon, often characterized as generating text that deviates from the input or factual knowledge, has sparked debate on its nature and implications. While some view it as a critical flaw, others argue that hallucinations are an inherent feature of LLMs, offering unique advantages alongside challenges.
Examining the Parallels: Confabulation in Humans and LLMs
While the term "hallucination" is often used to describe the generation of false information in LLMs, it may not fully capture the nuanced processes at play. Drawing parallels with the phenomenon of confabulation in humans, as explored in Armin Schnider's research (2003), can offer a more insightful perspective on how LLMs deviate from reality and the potential implications for AI development.
Confabulation: A Human Analogy
Schnider defines confabulation as "the production of fabricated, distorted, or misinterpreted memories about oneself or the world, without the conscious intention to deceive" (2003). This phenomenon, often observed in patients with brain damage, particularly damage affecting the frontal lobes, shares key similarities with LLM outputs:
Memory Distortion and Reality Confusion: Both confabulating patients and LLMs can exhibit a distorted sense of reality, mixing elements of true events with fabricated information. This can manifest as generating text that is inconsistent with the provided context or factual knowledge.
Temporal Context Confusion: Schnider's research highlights the difficulty confabulating patients have in distinguishing between past and present experiences. Similarly, LLMs may struggle with temporal context, leading to the generation of text that is chronologically inaccurate or inconsistent with the intended timeline.
Underlying Mechanisms: Schnider proposes that confabulation arises from a failure to suppress irrelevant memories, leading to their intrusion into current thought and behavior. This aligns with the understanding of LLMs as statistical prediction machines, where the model may prioritize statistically likely word sequences over factual accuracy, resulting in confabulated outputs.
Contrasting Human and LLM Confabulation
While there are intriguing parallels, it's essential to recognize the key differences between human confabulation and the behavior of LLMs:
Intentionality: Human confabulation is generally unintentional, driven by neurological impairments. In contrast, LLMs lack conscious intention and generate outputs based on their training data and algorithms.
Underlying Causes: Human confabulation is often associated with specific brain regions and neurological conditions. LLM confabulation, on the other hand, arises from the limitations of current AI models and their reliance on statistical prediction.
Adaptability and Learning: Humans, even those experiencing confabulation, have the capacity to learn and adapt their behavior based on feedback and experience. LLMs, while capable of learning through additional training data, may still struggle to overcome their inherent biases and limitations.
Understanding the Origins of Hallucinations
To effectively harness the potential of hallucinations while minimizing their drawbacks, it's crucial to understand their origins:
Limited Contextual Understanding: LLMs may struggle to retain and process the entirety of a given prompt, leading to a loss of crucial context and subsequent errors in generated text (Karpathy, 2023).
Training Data Discrepancies: The vast datasets used to train LLMs can contain inconsistencies and biases, leading to situations where the model generates text that diverges from the intended reference or factual knowledge (DeepChecks Glossary Team, n.d.).
Statistical Prediction vs. Factual Grounding: As Yann LeCun describes in the next section, the inherent nature of LLMs leads them to prioritize statistically probable word sequences over strict adherence to factual accuracy. This can result in plausible-sounding but ultimately incorrect statements.
The Autoregressive Nature of Transformers and Hallucinations
Yann LeCun, another leading AI researcher, attributes hallucinations to the autoregressive nature of transformer models, the dominant architecture underlying LLMs. These models generate text sequentially, predicting the next word based on the preceding sequence. This inherently probabilistic process makes them susceptible to errors, especially when encountering ambiguous or incomplete information. As LeCun argues, the model's attempt to complete the sequence with statistically likely words can lead to fabrications that stray from reality (LessWrong Community, n.d.).
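To make this concrete, here is a minimal, purely illustrative Python sketch of the autoregressive loop LeCun describes. The probability table is invented and stands in for a real transformer's learned next-token distribution; the only point is that generation conditions on the text so far and samples from a probability distribution, so a plausible but wrong continuation is always a possible outcome.

```python
import random

# Toy next-token distributions, standing in for a transformer's softmax output.
# All probabilities here are invented purely for illustration.
NEXT_TOKEN_PROBS = {
    ("the", "eiffel"): {"tower": 0.85, "museum": 0.10, "bridge": 0.05},
    ("eiffel", "tower"): {"is": 0.6, "was": 0.3, "opened": 0.1},
    ("tower", "is"): {"in": 0.5, "located": 0.3, "near": 0.2},
    ("is", "located"): {"in": 1.0},
    ("is", "in"): {"paris": 0.7, "rome": 0.2, "berlin": 0.1},
    ("located", "in"): {"paris": 0.7, "rome": 0.2, "berlin": 0.1},
}

def sample_next(tokens):
    """Sample the next token from P(next | last two tokens)."""
    dist = NEXT_TOKEN_PROBS.get(tuple(tokens[-2:]))
    if dist is None:
        return None  # no known continuation for this context
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

def generate(prompt, max_new_tokens=4):
    """Autoregressive loop: each step conditions only on the text generated so far."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        nxt = sample_next(tokens)
        if nxt is None:
            break
        tokens.append(nxt)
    return " ".join(tokens)

if __name__ == "__main__":
    for _ in range(5):
        print(generate("the eiffel"))
```

Running this a few times will occasionally produce "the eiffel tower is in rome": grammatically fine, statistically permitted by the toy model, and factually wrong, which is exactly the failure mode LeCun points to. Nothing in the loop checks the output against the world; it only checks it against the distribution.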
Hallucinations as a Bridge to Artificial General Intelligence

Before we continue, I want to acknowledge the comments under Karpathy's tweet on hallucination as a feature rather than a bug; they provided deep insights into the argument!
Hallucinations: Human vs. Machine
The perspective of viewing hallucinations as a feature rather than a bug is further reinforced by the insights of Balázs Kégl, who eloquently tweeted:
"In a world where the future is uncertain, you need to hallucinate all the time. This is how perception works, you hallucinate future positions of cars around in your mind, and act accordingly, then you compare your hallucination with what comes through your visual system..." (Kégl, 2023).
Kégl's observation beautifully captures the essence of why hallucination is not just a quirk of LLMs, but a fundamental aspect of intelligent systems operating in uncertain environments. Humans, for example, constantly engage in predictive processing, essentially "hallucinating" potential future scenarios based on incomplete information to guide their actions. When driving, we predict the trajectory of other cars to avoid collisions; when planning, we envision potential outcomes to make informed decisions.
LLMs exhibit a similar ability to fill in the gaps and generate plausible continuations based on the information they have been trained on. This ability, while sometimes leading to factual errors, allows them to explore a wider range of possibilities and generate creative solutions that might not be immediately apparent. Just like humans, LLMs benefit from their "hallucinatory" capabilities to navigate the uncertainties of the real world and explore uncharted territories of knowledge and creativity.
The Drive for Superintelligence and the Sisyphean Pursuit
The desire for a "super-intelligent AI that doesn't do what we constantly do" (Anonymous, 2023) reflects a deep-seated human aspiration to transcend our limitations and create something better than ourselves. However, this pursuit may be a Sisyphean task, as LLMs, like any technology, will inevitably reflect the biases and imperfections of their creators.
"We crave a super-intelligent AI that doesn’t do what we constantly do. It’s human nature, trying to create technology that surpasses our own faults. I don’t think this ever change if we understand technically how LLMs work! Even we may get disappointed when we know it. We revel in fiction, in Marvel’s artistic hallucinations, but when LLMs spin a yarn, we call it a bug. Isn’t that hypocritical? We’ve always aimed to build a better, error-free world through technology, yet we forget these systems mirror us – imagination, dreams, and all. It’s a classic human move, seeking perfection in machines that we never had. Maybe it’s a good thing, pushing us forward, or maybe it’s our Sisyphus punishment!" (Anonymous, 2023)
This introspective analysis suggests that our quest for perfect AI might be a reflection of our own flaws and limitations. Perhaps by accepting the inevitability of imperfection in both humans and machines, we can shift our focus from a futile pursuit of flawlessness to a more collaborative approach, where AI systems complement and augment human capabilities, with all their inherent strengths and weaknesses.
Hallucinations as a Bridge to Abstraction and Knowledge Integration
The ability to "hallucinate" or generate information not explicitly present in the input data allows LLMs to go beyond mere pattern recognition and venture into the realm of abstraction and knowledge integration. As Nick Dobos, a software engineer and AI enthusiast, aptly comments under Karpathy's tweet:
"@karpathy Yes. Spot on. Hallucinations are what makes it work! The whole point is that I can abstractly bring information together" (Dobos, 2023)
This ability to bridge seemingly disparate concepts and create novel connections is at the heart of human creativity and intelligence. While LLM hallucinations may not always be factually accurate, they represent a step towards the ability to synthesize information and form abstract understanding, a crucial aspect of AGI.
Implications for AI Development
Examining the parallels between human confabulation and LLM outputs offers valuable insights for the future of AI:
Understanding the Limitations of Current Models: Recognizing the similarities with confabulation highlights the limitations of current LLMs in achieving true understanding and reasoning. This emphasizes the need for further research into developing models that can better grasp the nuances of context, temporality, and factual accuracy.
Developing Robust Mitigation Strategies: By studying the mechanisms underlying confabulation, we can develop more effective techniques to mitigate the generation of false information in LLMs. This includes exploring methods to improve contextual understanding, enhance temporal reasoning, and integrate factual knowledge bases into the model's decision-making process (one such grounding pattern is sketched just after this list).
Moving Beyond Statistical Prediction: The limitations of current LLMs suggest that achieving true artificial general intelligence requires moving beyond mere statistical prediction. This necessitates exploring new models and architectures that incorporate elements of reasoning, causal understanding, and the ability to learn from mistakes and adapt to changing environments.
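As a concrete illustration of the "integrate factual knowledge bases" point in the list above, here is a minimal sketch of a retrieval-style grounding pattern. Everything in it is a hypothetical stand-in: `KNOWLEDGE_BASE`, `retrieve`, and `llm_generate` are placeholders rather than any particular library's API, and a real system would use an actual retriever and an actual model call.

```python
# Minimal sketch of retrieval-augmented grounding (all names are hypothetical).

KNOWLEDGE_BASE = {
    "eiffel tower": "The Eiffel Tower is in Paris and opened in 1889.",
    "great wall": "The Great Wall of China is over 13,000 miles long.",
}

def retrieve(question: str) -> list[str]:
    """Naive keyword lookup over the knowledge base."""
    q = question.lower()
    return [fact for key, fact in KNOWLEDGE_BASE.items() if key in q]

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to an actual language model."""
    return f"[model output conditioned on]: {prompt!r}"

def grounded_answer(question: str) -> str:
    """Prepend retrieved facts so the model has less room to confabulate."""
    facts = retrieve(question)
    context = "\n".join(facts) if facts else "No relevant facts found."
    prompt = (
        "Answer using ONLY the facts below. "
        "If the facts are insufficient, say you don't know.\n"
        f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_generate(prompt)

if __name__ == "__main__":
    print(grounded_answer("When did the Eiffel Tower open?"))
```

The design choice worth noting is that the mitigation happens at the prompt level: the model is constrained to retrieved facts and given an explicit escape hatch ("say you don't know") rather than being left to rely on its parametric memory alone.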
By drawing on insights from human cognition and conditions like confabulation, we can gain a deeper understanding of the challenges and opportunities in AI development. This approach can guide us towards creating more reliable, robust, and ultimately more intelligent AI systems that can truly understand and interact with the world around them.
The Future of Hallucinations and LLMs
The debate surrounding hallucinations in LLMs is far from settled. However, the perspective of viewing them as a feature rather than a bug opens exciting possibilities. Future research can focus on refining techniques to control and leverage hallucinations effectively, maximizing their creative potential while ensuring factual accuracy. By understanding the underlying mechanisms and developing robust mitigation strategies, we can unlock the full potential of LLMs as powerful tools for creative exploration, problem-solving, and communication.
I don't think we fully understand the mechanisms behind false inferences in neural networks yet. Disentangling beneficial imagination from harmful hallucinations seems critical for creating safe and useful AI.
Mapping out the overlaps between creativity, imagination, and hallucination in AI systems seems like an important open research direction as we work to build safe and beneficial AI.
It is safe to contend that hallucinations are not merely a bug to be fixed but a feature with potential benefits:
Creativity and Diversity: Hallucinations enable LLMs to generate novel ideas and explore diverse perspectives beyond the confines of the provided information. This can be invaluable in creative writing, brainstorming, and exploring alternative solutions to problems (the sampling-temperature sketch below shows one concrete knob for tuning this trade-off).
Filling in the Gaps: Similar to how humans fill in gaps in their memory, LLMs use statistically probable patterns to complete information and generate a coherent narrative. This ability to bridge missing links can be useful in tasks such as text summarization and dialogue generation.
By embracing this perspective, we can shift our focus from eliminating hallucinations to harnessing their potential as a driving force for innovation and adaptability in the realm of artificial intelligence.
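One concrete knob for trading this creative gap-filling off against conservatism is sampling temperature. The sketch below is illustrative only: the candidate words and their scores are invented, and a real model applies the same scaling over its entire vocabulary rather than four hand-picked options.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    scaled = [score / temperature for score in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for candidate next words after "The capital of France is ...".
candidates = ["Paris", "Lyon", "Marseille", "Narnia"]
logits = [6.0, 2.5, 2.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    sampled = random.choices(candidates, weights=probs, k=1)[0]
    dist = ", ".join(f"{word}: {p:.2f}" for word, p in zip(candidates, probs))
    print(f"T={temperature}: {dist} -> sampled '{sampled}'")
```

At low temperature the distribution collapses onto the most likely word and the output is reliably dull; at high temperature the tail options, including the outright fabrication, become live possibilities. Harnessing hallucination is, in part, deciding where on this dial a given application should sit.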
PS: These thoughts fully came to life while watching a lecture by Geoffrey Hinton on digital intelligence!
References
Anonymous. (2023, November 28). Re: I always struggle a bit when I’m asked about the “hallucination problem” in LLMs. [Comment on a LinkedIn post]. https://www.linkedin.com/feed/update/urn:li:activity:7187699342932156416?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7187699342932156416%2C7188026149946429440%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287188026149946429440%2Curn%3Ali%3Aactivity%3A7187699342932156416%29
DeepChecks Glossary Team. (n.d.). Large Language Models (LLMs) Hallucinations. Deep Checks Glossary. https://deepchecks.com/glossary/llms-hallucinations/
Dobos, N. [@unclecode]. (2023, December 9). @karpathy Yes. Spot on. Hallucinations are what makes it work! The whole point is that I can abstractly bring information together [Tweet]. Twitter. https://x.com/unclecode/status/1733672600893899141
Hacker News. (2023, December 9). Andrej Karpathy on Hallucinations. https://news.ycombinator.com/item?id=38636593
Iguazio. (n.d.). LLM Hallucinations. Iguazio Glossary. https://www.iguazio.com/glossary/llm-hallucination/
Karpathy, A. (2023, December 9). I always struggle a bit when I’m asked about the “hallucination problem” in LLMs. Simon Willison's blog. https://simonwillison.net/2023/Dec/9/andrej-karpathy/
LessWrong Community. (n.d.). What experiment settles the Gary Marcus vs. Geoffrey Hinton debate? LessWrong. https://www.lesswrong.com/posts/9PfBYpAE6cLXfCLM9/what-experiment-settles-the-gary-marcus-vs-geoffrey-hinton-debate/
Marcus, G. [@GaryMarcus]. (2023, December 9). When’s the last time you hallucinated another car while you were driving? A person who wasn’t in your room? [Tweet]. Twitter. https://x.com/GaryMarcus/status/1733299213503787018
Martin, R. C. [@unclebobmartin]. (2023, December 9). All creative work is iterative. We start with an imperfect idea, implement it imperfectly, and then improve on it over time. [Tweet]. Twitter. https://x.com/unclebobmartin/status/1733672600893899141
Schnider, A. (2003). Spontaneous confabulation and the adaptation of thought to ongoing reality. Nature Reviews Neuroscience, 4(8), 662–671. https://doi.org/10.1038/nrn1179
I have a few more thoughts on the subject; let me school myself a bit more and put them into a LessWrong post.
One additional thing to note: we call them hallucinations because of their propensity to mislead their holder (or, in the case of AI, the reader). This means we need to come up with a new name for them if they really are a feature.