Glossary

Large Language Models (LLMs)

LLMs: Mastering Reasonable Continuation

What Is an LLM?

LLMs are deep learning models designed to predict text sequences using an architecture called the Transformer, first introduced by researchers at Google. LLMs fall under the umbrella of Generative AI, an exciting new field, and are the most common models within that space. In fact, many people use the terms LLM and Generative AI interchangeably. Most people interact with LLMs through chatbot products such as ChatGPT or Claude. The key thing to note with LLMs is their focus on text and language, by virtue of being trained on text, versus the broader umbrella of Generative AI tools that can work with other mediums such as audio and video.

Why Do LLMs Work?

As we highlighted in our Gen AI post, LLMs use a deep learning architecture called the Transformer to build an understanding of the meaning of text, creating uncannily accurate "next token" predictions to string together sentences. Transformers are the backbone of LLMs and work through a process known as a "self-attention mechanism," turning patterns within text into mathematical vectors. By doing so, the model can compare the relationships among patterns across the text, allowing it to probabilistically predict ensuing words (or fragments of words known as tokens), which are built into phrases, sentences, and ultimately "well-reasoned" content. This is the "chatbot" functionality we've all come to enjoy.
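
To make the self-attention idea a bit more concrete, below is a minimal sketch in Python (using NumPy) of scaled dot-product attention, the core operation inside a Transformer layer. The token count, embedding size, and random weight matrices are illustrative placeholders, not values taken from any real model.

```python
# A minimal sketch of scaled dot-product self-attention using NumPy.
# All dimensions and weights are invented for illustration.
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Attend over a sequence of token vectors X (one row per token)."""
    Q = X @ W_q          # queries: what each token is "looking for"
    K = X @ W_k          # keys: what each token "offers"
    V = X @ W_v          # values: the information each token carries
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # pairwise relevance between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V   # each token becomes a relevance-weighted mix of all tokens

# Example: 4 tokens, 8-dimensional embeddings (arbitrary random values)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```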

Do LLMs Actually Reason?

This is a tough question. Stephen Wolfram has done a good job of providing a sensible answer. He describes the process by which LLMs spit out answers to prompts as a form of "reasonable continuation." As we highlighted above, the Transformer approach is a creative solution for understanding relevance and relationships across patterns within text. These models are trained on billions of pages of text. For reference, GPT-4o is believed to have been trained on somewhere close to 500 billion tokens (the actual number is not disclosed).

Having ingested this tremendous corpus of information, LLMs leverage Transformers to understand not only patterns across the training data but also some semblance of the relative importance and relevance of those patterns across fragments of words. An LLM is thus able to make a reasonable prediction as to what the next word (really, token) in its response should be. Or as Wolfram puts it, "imagine scanning billions of pages of human-written text (say on the web and in digitized books) and finding all instances of this text—then seeing what word comes next what fraction of the time. ChatGPT effectively does something like this, except that (as I’ll explain) it doesn’t look at literal text; it looks for things that in a certain sense 'match in meaning'."
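
Wolfram's description can be made concrete with a toy version of "reasonable continuation" that literally counts which word follows which in a tiny invented corpus. As he notes, real LLMs match on meaning via learned vectors rather than on literal text, so this sketch captures only the counting intuition.

```python
# A toy "reasonable continuation": count next-word frequencies in a tiny corpus.
# The corpus is a made-up stand-in for billions of pages of training text.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the cat sat on the chair . "
    "the dog sat on the mat . the cat slept on the mat ."
).split()

# Count which word follows each word across the "training data"
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def continuation_probabilities(word):
    """Return the observed probability of each possible next word."""
    counts = next_word_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(continuation_probabilities("cat"))   # {'sat': 0.67, 'slept': 0.33} (approx.)
print(continuation_probabilities("sat"))   # {'on': 1.0}
```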

The problem with "reasonable continuation" could perhaps be described as path dependency. Because the model generates one token at a time, each token it commits to constrains everything that follows. This can lead to "hallucination" and "false confidence" in the response. When writing a paragraph summary, that can be fine, as the choice among similar words rarely changes the overall meaning. But when retrieving a specific fact in answer to a query, it may lead to the wrong output with little ability to check that output. This is particularly visible when querying LLMs for factual output or when the task at hand requires multi-step reasoning.
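
As a rough illustration of that path dependency, the sketch below runs greedy, token-by-token generation over an invented next-token probability table (standing in for a model's learned probabilities). Once an early token is committed, the rest of the continuation follows confidently, even when the resulting statement is false.

```python
# A toy illustration of path dependency in greedy, token-by-token generation.
# The transition table is invented; it stands in for a model's next-token probabilities.
next_token_probs = {
    "The capital of": {"France": 0.55, "Australia": 0.45},
    "France":         {"is": 1.0},
    "Australia":      {"is": 1.0},
    "is":             {"Paris": 0.6, "Sydney": 0.4},
    "Paris":          {".": 1.0},
    "Sydney":         {".": 1.0},
}

def greedy_generate(start_token, max_steps=5):
    """Repeatedly append the single most likely next token."""
    tokens = [start_token]
    while len(tokens) < max_steps:
        options = next_token_probs.get(tokens[-1])
        if not options:
            break
        tokens.append(max(options, key=options.get))
    return " ".join(tokens)

print(greedy_generate("The capital of"))            # "The capital of France is Paris ."
# Force an early wrong token: the continuation still follows confidently,
# producing a fluent but false statement.
print("The capital of " + greedy_generate("Australia"))  # "... Australia is Paris ."
```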

Recently (September 2024), OpenAI launched a new model called o1, which is designed to handle multi-step reasoning. As noted in their release, "We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes." This aims to "break" the path dependency problem of "reasonable continuation" when dealing with trickier outputs that require multiple steps (and thus multiple opportunities to get off track). It remains to be seen how effective this approach will be. For more on the technical details behind o1, see here.

Internal Thought Leadership on Generative AI / LLMs: 

AI Governance: The Challenge of Language Models (September 2024)

How new prompting techniques increase LLM accuracy in financial applications (April 2024)

The great LLM debate: product or infrastructure? (July 2024)

Foundational External Papers & Resources: 

Attention Is All You Need (Vaswani et al., 2017)

Improving Language Understanding by Generative Pre-Training (Radford et al, 2018)

Emergent Abilities of Large Language Models (Wei et al., 2022)

Let's Verify Step by Step (Lightman et al., 2023)

What Is ChatGPT Doing … and Why Does It Work? (Wolfram, 2023)

Defined by others as: 

AWS: Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.

Cloudflare: A large language model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks.

Elastic: A large language model (LLM) is a deep learning algorithm that can perform a variety of natural language processing (NLP) tasks. Large language models use transformer models and are trained using massive datasets — hence, large. This enables them to recognize, translate, predict, or generate text or other content.