
Why do LLMs hallucinate and is there a way out?

August 23, 2024


Come this November, it will be two years since the launch of ChatGPT ignited an unprecedented surge of interest in generative AI. As a subset of artificial intelligence (AI), generative AI can create new content—from text and images to music and videos—by learning patterns from vast datasets. Large Language Models (LLMs) like GPT-4, Gemini, Claude, and Llama are powerful examples of this technology.

Enterprises across various industries have tried to leverage these AI technologies to enhance their operations and drive innovation through different use cases. For instance, customer service departments use AI chatbots to provide instant responses, significantly improving customer satisfaction. In marketing, generative AI enables the creation of personalized content at scale, allowing companies to engage more effectively with their audiences through dynamic email campaigns, tailored social media ads, and optimized product recommendations.

However, these models are not without flaws. A notable issue is hallucination, where a large language model generates information that is plausible-sounding but incorrect or nonsensical. These hallucinations, coupled with inconsistent results and a lack of self-criticism, have led to skepticism about the reliability of AI for dependable business outputs.

To shed light on this important topic, we are joined by Dr. Mehdi Fatemi, Senior Researcher and Team Lead at Wand Research, Wand AI’s fundamental research group. Dr. Fatemi will help us understand what hallucinations in LLMs are, why they happen, their effects on various industries, and whether there are solutions to mitigate these challenges while harnessing the full potential of generative AI for businesses.

Sophia: Thank you for joining us today, Dr. Fatemi. Before we dive into the specifics, let’s start with the basics. What exactly is “hallucination” in the context of LLMs?

Dr. Fatemi: In simple terms, hallucination in LLMs does not have a mathematically precise definition, but it generally refers to two scenarios: either the generated content is incorrect, or the model fabricates information that doesn’t exist in the real world. It’s like when someone makes up a story and presents it as fact. In the AI world, it means the model might produce fictitious historical events, imaginary people, false scientific theories, or non-existent books and articles.

Sophia: How do hallucinations in large language models present opportunities and challenges across different industries?

Dr. Fatemi: Hallucinations in large language models present a mixed bag of opportunities and challenges. In fields like marketing, advertising, and entertainment, these hallucinations can be highly effective. LLMs excel at generating engaging content across various media formats, including text, audio, images, and video. For example, in marketing, an LLM might create inventive slogans or compelling narratives that captivate audiences. In entertainment, it can produce imaginative stories or scripts that push the boundaries of creativity.

However, in industries that require high levels of factual accuracy, such as legal, finance, research, and healthcare, the consequences of hallucinations can be severe. In these fields, misinformation can lead to significant legal liabilities, financial losses, and even risks to human health. It is crucial for businesses in these sectors to implement rigorous checks and balances to mitigate the risks associated with LLM-generated content. 

Why do large language models hallucinate?

Sophia: So, why does this happen? What’s causing LLMs to hallucinate?

Dr. Fatemi: One of the main root causes of hallucinations in LLMs is how they handle interpolation and extrapolation. These models operate at the token level, analyzing the given context to predict the next token and complete sequences. Essentially, they are constantly generalizing from the vast amounts of data they’ve been trained on, which involves both interpolation—filling in gaps within the known data to create coherent sequences—and extrapolation—extending beyond the known data to generate new content.
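To make this token-level view concrete, here is a minimal sketch of next-token prediction using the open-source Hugging Face transformers library with GPT-2 as a stand-in model (both choices are assumptions made for illustration; the interview does not reference any particular library or model). At each step the model produces only a probability distribution over possible next tokens, and nothing in that distribution distinguishes a factual continuation from a fabricated one.

```python
# Minimal sketch of token-level next-token prediction.
# GPT-2 and the Hugging Face `transformers` library are stand-ins chosen for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]        # scores for the next token only
probs = torch.softmax(next_token_logits, dim=-1)

# The model ranks plausible continuations; it never checks whether any of them is true.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>12s}  p={p.item():.3f}")
```

Whether the highest-probability continuation is also factually correct depends entirely on the patterns absorbed during training.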

From the model’s perspective, there’s no inherent difference between producing factual information and generating hallucinations. Its primary function is to generate content that’s coherent and contextually relevant, not necessarily accurate. Because the model is designed to create new content, it can sometimes generate information that isn’t grounded in reality, leading to hallucinations.

How do LLM hallucinations affect enterprises?

Sophia: Can you provide a use case of this and explain how it would affect enterprises?

Dr. Fatemi: Imagine a financial institution using an LLM to generate investment reports for its clients. Tasked with analyzing market trends and providing recommendations, the large language model might produce a report with fabricated data about a company’s financial performance. This hallucination could lead to inaccurate investment advice, resulting in significant financial losses and potential legal liabilities for the institution, even though the report’s text may look completely coherent and well-written.

The accuracy of generative AI in financial reporting is crucial because these reports affect lenders’ ability both to attract capital and to understand their clients’ financial health. If the technology produces inaccurate reports, the consequences can be severe. This raises a critical question: Who would be responsible if the model “hallucinates”?

Sophia: How about the data LLMs are trained on? Does it influence their hallucinations?

Dr. Fatemi: Another significant root cause of hallucinations in LLMs is the limitations of their training data. LLMs are trained on vast datasets consisting of text from the internet, books, and other sources. These datasets may omit certain information or contain inaccuracies, biases, and outdated data, which the LLM can learn and replicate in its outputs, sometimes inserting them incorrectly or in ways that do not fit the context.

The quality of an LLM’s output is directly influenced by the quality of the data it was trained on. If the training data includes errors, biases, or outdated information, the model is likely to reproduce these issues in its generated content. Conversely, if the training data completely lacks certain information, the LLM will inevitably extrapolate when asked about it. LLMs cannot independently verify the accuracy or relevance of the information they generate; they rely entirely on the patterns they have learned from the training data.

Sophia: What are other examples where large language model hallucinations have caused issues for enterprises?

Dr. Fatemi: For instance, large language models have been known to “fill in the gaps” by generating incorrect content when the relevant information is missing from their training data.

In the legal field, LLM hallucinations have led to issues such as citations of cases that do not exist or quotations from opinions that were never written. For example, in the case of Mata v. Avianca, lawyers used ChatGPT to generate legal arguments, only to discover that the model had fabricated citations and references. This incident highlights the risks of using large language model tools without fully understanding their limitations.

Sophia: How can enterprises confidently use AI solutions, understanding the challenges of hallucinations?

Dr. Fatemi: LLMs excel in generating creative content and are well-suited for that purpose. However, for tasks that require accuracy, factual consistency, and reasoning, it’s crucial to integrate other systems and technologies to “steer” the LLM. This approach allows for a more holistic functioning that resembles human cognition, rather than relying solely on language generation.

In the human brain, language processing is just one function that interacts with other cognitive abilities—such as logic, reasoning, accuracy, self-criticism, and personal identity—to ensure a balanced and accurate understanding of the world. Similarly, to mitigate hallucinations in large language models (LLMs) and create a more robust AI solution, it’s essential to incorporate additional modules and develop new technologies that can handle these aspects effectively.
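As a rough illustration of what such “steering” could look like in practice, the sketch below wraps a generative model in a separate verification step that checks a draft answer against trusted sources before returning it. All names here (generate_draft, retrieve_evidence, claim_is_supported) are hypothetical placeholders introduced for this example; this is not a real API and not Wand AI’s implementation.

```python
# Illustrative sketch: a generative model proposes, a separate module verifies.
# All callables passed in are hypothetical placeholders, not a real API.
from dataclasses import dataclass

@dataclass
class CheckedAnswer:
    text: str
    supported: bool
    evidence: list[str]

def answer_with_verification(question: str,
                             generate_draft,       # callable: question -> draft answer text
                             retrieve_evidence,    # callable: question -> list of trusted documents
                             claim_is_supported,   # callable: (draft, documents) -> bool
                             ) -> CheckedAnswer:
    """Generate a draft, then check it against trusted sources before returning it."""
    draft = generate_draft(question)
    documents = retrieve_evidence(question)
    if claim_is_supported(draft, documents):
        return CheckedAnswer(text=draft, supported=True, evidence=documents)
    # Fall back to an explicit refusal instead of returning an unsupported claim.
    return CheckedAnswer(text="I could not verify an answer from trusted sources.",
                         supported=False, evidence=documents)
```

The point of the sketch is the separation of concerns: the language model proposes content, while a distinct module grounded in trusted data decides whether that content is fit to return.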

Is there a solution for addressing LLM hallucinations for enterprises?

Sophia: How is your research at Wand Research addressing LLM challenges such as hallucination? 

Dr. Fatemi: The expectation that future LLMs will never hallucinate is unrealistic. Drawing inspiration from the human brain, we find it is not merely a giant input-output system. Instead, it comprises multiple parts, with only one dedicated to language processing. Critical cognitive functions like logic and reasoning are supported by other brain regions, collectively helping to prevent the communication of incorrect information with confidence (what we refer to as hallucination). Notably, humans seem to have an internal critic that helps align our thoughts with our words as we speak.

However, the current approach to reducing hallucinations in LLMs mainly involves fine-tuning or prompt engineering—methods that can easily fail because they still depend on the LLM’s generative capabilities to produce accurate content. In our research, we are exploring how adding a “critic” module, similar to the brain’s internal critic, can address this issue. Additionally, we are investigating how knowledge sharing and collaboration among smaller, specialized models can help prevent hallucinations and other errors. This knowledge sharing and collaboration may occur at various levels, not necessarily just as inputs to the models’ prompts. Developing mathematical frameworks and computational machinery to enable such cognitive capabilities is at the core of our research at Wand Research.
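To make the “critic” idea more tangible, here is a conceptual generate-critique-revise loop. It is only a sketch of the general pattern, built around two hypothetical callables, generator and critic; it does not describe Wand Research’s actual architecture or algorithms.

```python
# Conceptual generate-critique-revise loop; `generator` and `critic` are
# hypothetical stand-ins, not Wand Research's actual components.
def generate_with_critic(prompt: str, generator, critic, max_rounds: int = 3) -> str:
    """Let a separate critic accept or demand revision of the generator's answer."""
    answer = generator(prompt)
    for _ in range(max_rounds):
        verdict = critic(prompt, answer)   # e.g. {"ok": bool, "feedback": str}
        if verdict["ok"]:
            return answer
        # Feed the critic's objections back to the generator and try again.
        answer = generator(
            f"{prompt}\n\nRevise your answer. Issues found: {verdict['feedback']}"
        )
    # Give up after repeated failed critiques rather than return an unvetted answer.
    return "No confident answer: the critic flagged unresolved issues."
```

The essential property is that the generator’s fluency never gets the final say; a separate judgment step decides whether an answer is communicated at all.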

 

Summary

As enterprises increasingly embrace generative AI, it’s crucial to understand the challenges, particularly the issue of hallucinations in LLMs. Think of this as entering a new market—while the opportunities promise growth and innovation, there are hidden pitfalls that can derail your efforts. These “pitfalls,” or hallucinations, arise from several root causes, including interpolation, extrapolation, and limitations in training data.

By understanding these challenges, enterprises can make more informed decisions when selecting an AI partner with the latest and greatest technology to navigate these pitfalls, allowing them to fully leverage the capabilities of generative AI.

However, hallucinations are just one part of the issues with LLMs. In our next article, we will explore another critical issue: the lack of reasoning and planning capabilities in LLMs. Unlike a seasoned business strategist who can break down complex projects into manageable tasks and create logical connections, LLMs struggle with these abilities, limiting their capacity to deliver robust AI solutions.

At Wand AI, we are committed to developing new technologies that address these fundamental challenges. Our advanced generative AI solutions empower enterprises to navigate the AI landscape with confidence. Stay tuned for more insights!
