We are now over a year into the “picks and shovels” phase of the AI gold rush, and enterprises are starting to see through the initial hype. While generative AI is no longer seen as a magical cure-all, enterprises now recognize it as a powerful technology with immense untapped potential. The stakes are high: from multi-component AI agents that mimic human intelligence to fundamental changes in the process of scientific discovery, generative AI could prove to be the answer to many of our modern-day challenges.
Is current AI technology ready for the enterprise?
As AI leads the charge in workplace task optimization, companies of all sizes are scrambling to integrate it into their daily operations. Beneath this wave of euphoria, however, lies a stark reality. Today’s generative AI tools are built on large language models (LLMs) like GPT-4, PaLM, Claude, or Llama, and significant structural issues in these models can sometimes do more harm than good. A recent Ars Technica article, for example, revealed that NYC’s government chatbot was providing incorrect information about city laws and regulations. And this is by no means an isolated incident: LLMs are known to hallucinate and produce inaccurate output – albeit with full confidence.
In this article, we highlight some of the core problems of LLMs, their business impact, and the underlying causes of these problems.
LLM Problem #1: Hallucination and Absence of Self-Criticism
One of the most significant issues with LLMs is hallucination, where the model generates information that is factually incorrect or even nonsensical. While LLMs excel in engaging in believable conversations, they often produce convincing but erroneous outputs. This poses significant risks, particularly in high-stakes environments like legal advice, finance, and medical diagnosis.
A recent study by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) demonstrated that legal hallucinations are pervasive, with hallucination rates ranging from 69% to 88% for state-of-the-art language models responding to specific legal queries.
Example
Consider an example. In a legal document, the most likely text to follow the phrase “According to Rule” is a rule number, such as “12(b)(6)” or “14(c)(4).” The continuation looks perfectly reasonable, yet the referenced rule could be nonexistent or irrelevant to the topic at hand.
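To see why, here is a toy sketch of next-token selection. The candidate rules, scores, and sampling below are invented for this illustration – real models operate over huge vocabularies of subword tokens – but the mechanics are the same: continuations are scored by plausibility and one is picked, with no step that checks whether the cited rule exists.

```python
import numpy as np

# Invented candidates and scores for the position after "According to Rule".
candidates = ["12(b)(6)", "14(c)(4)", "9(a)(1)", "56(d)"]
logits = np.array([2.1, 1.7, 0.9, 0.4])  # plausibility scores, nothing more

# Softmax turns the scores into a probability distribution.
probs = np.exp(logits) / np.exp(logits).sum()

# The continuation is sampled from that distribution. Nothing here
# verifies that the chosen rule exists or applies to the document.
rng = np.random.default_rng(seed=42)
next_token = rng.choice(candidates, p=probs)
print(f"According to Rule {next_token} ...")
```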
The legal industry has been an early adopter of LLMs for tasks such as document review, drafting, and legal research. However, the risk of hallucination continues to be a critical concern at law firms. An LLM-generated legal brief, for example, could contain fabricated judicial decisions and citations that can have serious legal implications and undermine the credibility of the legal practice.
Root Cause
The root of the problem lies in the simplistic mechanics of token generation: the model predicts the next word purely by probabilistic reasoning, with no regard for the factual accuracy of the result. LLMs have no built-in mechanism to self-critique and catch hallucinations. While some techniques can mitigate hallucination to an extent (one is sketched below), the core problem is not fixable within the current structure of LLMs.
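One such mitigation is a post-hoc grounding check that validates generated citations against a trusted source. The sketch below is a minimal, purely illustrative version: the regular expression, the tiny index of known rule numbers, and the draft text are all assumptions made for this example, and the check catches only one narrow class of fabrication.

```python
import re

# Hypothetical trusted index of real rule numbers (illustrative subset).
KNOWN_RULES = {"12(b)(6)", "56(a)", "8(a)(2)"}

def flag_unverified_citations(text):
    """Return cited rule numbers that are absent from the trusted index."""
    cited = re.findall(r"Rule\s+(\d+\([a-z]\)(?:\(\d+\))?)", text)
    return [rule for rule in cited if rule not in KNOWN_RULES]

draft = "The motion should be dismissed under Rule 12(b)(6) and Rule 14(c)(4)."
print(flag_unverified_citations(draft))  # -> ['14(c)(4)']
```

A check like this reduces risk, but it only verifies that a citation exists, not that it actually supports the argument – which is why it mitigates rather than solves the problem.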
Hence, there is a need for a robust solution to the hallucination problem to ensure accurate and reliable AI-generated content. Without such a solution, the potential for harm can outweigh the benefits.
LLM Problem #2: Lack of Reasoning and Planning
Another significant limitation of LLMs is their inability to perform dynamic reasoning and subtasking. As humans, we naturally decompose complex tasks into a series of subtasks, dynamically creating logical connections from premises to conclusions. This process allows us to solve intricate problems efficiently. LLMs, however, lack the mechanism to run a similar process – thereby limiting their effectiveness in problem-solving.
Example
For example, consider the typical workflow of an investment portfolio manager. The manager starts by gathering information about a client’s goals, risk tolerance, and financial situation. They then analyze market conditions and evaluate asset classes to construct a diversified portfolio for the client. Over time, they implement risk management strategies, monitor performance, draw conclusions, and make adjustments. This workflow involves dynamic decision-making, continuous learning, and adaptation to new information – all hallmarks of human reasoning and planning.
Even the best LLMs cannot handle the multi-step decision-making and complex reasoning that comprehensive portfolio management requires. While they can process data quickly and produce human-like responses, they lack the capacity for slow, deliberate thought. True intelligence – the kind that can perform tasks independently – requires strong planning and reasoning capabilities.
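To make the contrast concrete, here is a toy plan-and-execute loop that mimics the human pattern of decomposing a goal, acting, and revising the plan. Everything in it – the function names, the subtasks, the no-op replanner – is an illustrative assumption rather than a description of any real system; the point is the explicit outer loop that a bare LLM, emitting tokens left to right, has no equivalent of.

```python
def decompose(goal):
    """Break a goal into ordered subtasks (a human does this dynamically)."""
    return [
        "gather the client's goals, risk tolerance, and financial situation",
        "analyze market conditions and evaluate asset classes",
        "construct a diversified portfolio",
        "monitor performance and manage risk",
    ]

def execute(subtask):
    """Placeholder for real work (data gathering, analysis, trades)."""
    return f"completed: {subtask}"

def replan(plan, result):
    """Revise the remaining plan in light of what was just learned.
    A bare LLM never revisits earlier steps; this hook is exactly
    what its generation process lacks."""
    return plan  # no-op in this toy version

def run(goal):
    plan = decompose(goal)
    while plan:
        subtask = plan.pop(0)
        result = execute(subtask)
        print(result)
        plan = replan(plan, result)

run("manage the client's investment portfolio")
```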
Root Cause
Yann LeCun, Meta’s chief AI scientist and a Turing Award winner, points out that LLMs perform a constant number of computational steps between input and output, which limits their representational power and their ability to reason or plan. He elaborated on this in a recent podcast, explaining the fundamental characteristics of intelligent behavior that LLMs lack:
“There are several characteristics of intelligent behavior. For example, the capacity to understand the physical world, the ability to remember and retrieve things, persistent memory, the ability to reason, and the ability to plan. Those are four essential characteristics of intelligent systems or entities, humans, and animals. LLMs can do none of those, or they can only do them in a very primitive way. They don’t understand the physical world, don’t have persistent memory, can’t reason, and certainly can’t plan.”
Without replicating the slow, logical, and analytical thinking intrinsic to human reasoning, AI systems cannot deliver the robust value that enterprises expect from this technology.
LLM Problem #3: Compounding Error Effect
Although hallucination and lack of reasoning often get the most attention, another critical limitation of LLMs is the compounding error effect. For more complex tasks that require longer and more detailed responses, LLMs are far more likely to generate incorrect outputs.
Example
Consider a customer support chatbot designed to handle complex queries. Initially, the chatbot may perform well with simple questions, providing accurate and helpful responses. However, as the complexity of the queries increases, the likelihood of generating correct responses decreases substantially – making the chatbot almost completely useless.
Human beings, thanks to our dynamic subtasking and chains of reasoning, accumulate errors far more slowly – roughly linearly rather than exponentially. So the more complex the task, the greater the advantage of the human brain over LLMs.
Root Cause
As the length of an LLM’s response increases, the probability that the entire output is correct decreases exponentially. The issue is inherent to the autoregressive mechanism of LLMs: each token is generated conditioned on the previous ones, and the model cannot go back and correct an earlier mistake, so errors compound across the sequence. Consequently, complex tasks that require longer, more detailed responses are far more likely to produce incorrect outputs.
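A back-of-the-envelope calculation shows how fast this compounding bites. The 99% per-token accuracy below is an assumed figure for illustration, and real errors are not independent, but the qualitative collapse holds:

```python
# Probability that an entire sequence is correct, assuming a fixed
# per-token accuracy and independent errors (both simplifications).
per_token_accuracy = 0.99

for length in (10, 100, 500, 1000):
    p_all_correct = per_token_accuracy ** length
    print(f"{length:>5} tokens: {p_all_correct:.4f}")

# Output:
#    10 tokens: 0.9044
#   100 tokens: 0.3660
#   500 tokens: 0.0066
#  1000 tokens: 0.0000
```

Even at 99% accuracy per token, a 500-token answer is almost never correct end to end – which matches the steep quality drop-off observed on longer tasks.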
Unless the compounding error problem is solved, enterprises can use LLMs only for the simplest tasks – those that require no more than a few steps.
Summary
The appeal of generative AI for enterprise applications is real. Unfortunately, there are fundamental problems with LLMs – the underlying models on which today’s generative AI tools are built. Hallucination, absence of self-criticism, lack of reasoning and planning, and the compounding error effect can all have serious consequences for at-scale enterprise applications. The future of AI depends on our ability to navigate these complexities and unlock its transformative potential.
Companies like Wand AI are focused on developing fundamentally new technologies that address the core problems that all LLMs have. Our innovative generative AI platform enables enterprises to deploy and scale AI with confidence. Stay tuned for more articles on this topic!