
AI Governance: The Challenge of Language Models

In our work with Global Systemically Important Banks (G-SIBs) and other financial institutions, we find most are struggling to implement AI governance mechanisms that reduce AI risks to acceptable levels while accommodating innovation. Risk management and innovation tend to work against each other, and the rise of Generative AI and Large Language Models has made this tension particularly acute. The behaviors of these new AI technologies are not well understood even by their creators, and the risks are clearly serious. To give just a few examples:

How can financial institutions benefit from a technology that is liable to mislead customers, give inaccurate answers, willfully break the law, and spout nonsense? Effective AI Governance must address these risks. Fortunately, it is possible to distinguish between three broad tiers of risk, depending on the characteristics of the AI models and how they are deployed: tightly constrained models (lower risk) at the bottom, a broad middle tier of semi-constrained models (moderate risk), and completely unconstrained models (highest risk) at the top.

In all of the problematic examples above, risks are magnified because AI output is unconstrained. Generative AI models are capable of generating textual patterns more varied than we humans can even imagine, let alone enumerate. Despite the efforts of AI developers to minimize hallucination and “align” the models with ethical principles, we cannot count on the output of the AI models to be accurate, complete, or ethical. All four examples are representative of the highest tier of risk: AI models that produce unconstrained output, deployed without any accompanying mechanisms to constrain that output.

At the opposite end of the spectrum are language models that are only capable of producing highly constrained output. For example, some language models are designed to generate sets of numbers that characterize fragments of text. These sets of numbers, called vector embeddings, are like coordinates specifying the location of the text in a map of textual meaning. Text embedding models can therefore be used to determine how similar pieces of text are: the vector embeddings of similar pieces of text lie close to each other, while those of dissimilar pieces are farther apart.

Text embedding models represent the lowest risk tier of AI models. Although they are language models, and they are “large” in the sense of having many parameters (over 100 million in the case of the popular MPNet model, for example), they do not really qualify as Generative AI. Instead of responding to a prompt by generating text (or other media), these models just reduce the text in the prompt to a set of numbers that can be used to compare the similarity of the input text to other pieces of text. While it is possible that the text embedding model could underestimate or overestimate the similarity between pieces of text, the output of the model is just a list of numbers. The model cannot communicate with a customer, opine on legal matters, trade stocks, or spout nonsense. AI models in this tier have risk profiles similar to those of traditional prediction models that are widely used for tasks such as fraud detection and credit decisioning.
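
As a concrete illustration, the sketch below computes vector embeddings and compares them with cosine similarity. It assumes the open-source sentence-transformers library and its MPNet-based "all-mpnet-base-v2" checkpoint; any text embedding model follows the same pattern, and the example texts are purely illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Load an MPNet-based embedding model (roughly 110 million parameters).
model = SentenceTransformer("all-mpnet-base-v2")

texts = [
    "How do I reset my online banking password?",
    "Steps for recovering access to a bank account login",
    "Quarterly earnings exceeded analyst expectations",
]

# Each text is reduced to a fixed-length vector of numbers: its coordinates
# in the "map of textual meaning".
embeddings = model.encode(texts)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score in [-1, 1]; higher means the texts are closer in meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two password-recovery questions score much higher with each other
# than either does with the unrelated earnings sentence.
print(cosine_similarity(embeddings[0], embeddings[1]))
print(cosine_similarity(embeddings[0], embeddings[2]))
```

Note that the output is just numbers: there is no generated text for a customer to misread.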

Between the extremes of unconstrained Generative AI models (highest risk) and models guaranteed to produce outputs in well-defined ranges (lowest risk) lies a middle tier: Generative AI models deployed with mechanisms that constrain their output. Here are a few examples:

  • Classifiers prompt LLMs to categorize pieces of content (see the code sketch after this list). A hotel company might use a classifier to read customer reviews and sort them according to various criteria. Did the review mention specific topics such as price, cleanliness, location, athletic facilities, or dining options? Was the review positive, negative, or neutral? Did the reviewer mention any specific staff member by name?
  • Quote extractors prompt LLMs to find text in a document that answers a specific question. This approach is often used in Retrieval-Augmented Generation (RAG) systems that respond to a user’s question by finding relevant documents and highlighting text fragments that appear to contain the answer.
  • Document recommenders prompt LLMs to select the items out of a document collection that are most likely to be relevant to a user’s prompt.
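
As referenced in the first bullet, the sketch below shows the classifier pattern in Python. The `call_llm` helper, the label lists, and the prompt wording are hypothetical placeholders for whatever LLM client and taxonomy an institution actually uses.

```python
import json

# Illustrative, fixed vocabularies for the hotel-review example above.
ALLOWED_TOPICS = ["price", "cleanliness", "location", "athletic facilities", "dining"]
ALLOWED_SENTIMENTS = ["positive", "negative", "neutral"]

def classify_review(review: str, call_llm) -> dict:
    """Ask an LLM to classify one review. `call_llm` is a hypothetical
    function that sends a prompt to some LLM and returns its text response."""
    prompt = (
        "Classify the hotel review below.\n"
        f"Topics (choose any that apply, only from this list): {ALLOWED_TOPICS}\n"
        f"Sentiment (choose exactly one): {ALLOWED_SENTIMENTS}\n"
        "Respond as JSON with keys 'topics' and 'sentiment'.\n\n"
        f"Review: {review}"
    )
    raw = call_llm(prompt)
    # The prompt exhorts the model to stay within the lists, but nothing
    # here guarantees that it will; that is what a validation layer is for.
    return json.loads(raw)
```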

In these examples, LLMs are used to generate text output. Although the prompt may exhort the LLM to generate answers only from a list of acceptable category labels, or to generate only exact quotes from text provided in the prompt, this kind of “prompt engineering” is far from foolproof. While it seems likely that future versions of LLMs will offer robust output constraints, current mainstream LLMs can still generate unpredictable outputs, no matter how clear the instructions in their prompts. The solution is to bolt on a validation mechanism that checks the output and ensures that it conforms to the desired constraint.
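
A minimal sketch of such a validation layer is shown below, assuming the classifier and quote extractor return plain text as in the examples above; the function names and error handling are illustrative, not a prescribed design.

```python
def validate_label(label: str, allowed: list[str]) -> str:
    """Accept a classification label only if it is in the allowed set."""
    cleaned = label.strip().lower()
    if cleaned not in allowed:
        raise ValueError(f"Model returned an out-of-set label: {label!r}")
    return cleaned

def validate_quote(quote: str, source_document: str) -> str:
    """Accept an extracted quote only if it appears verbatim in the source."""
    cleaned = quote.strip()
    if cleaned not in source_document:
        raise ValueError("Model returned text that is not an exact quote")
    return cleaned
```

When validation fails, the calling application can retry the model, fall back to a default, or route the item to human review; in every case, unconstrained output never reaches the customer.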

The middle risk tier of constrained generative AI models requires careful scrutiny. For example, in the legal AI research mentioned above, some of the most insidious errors involved legal judgments being misclassified as providing support for specific conclusions. One problem we discovered in our own research is that a RAG system may find a quote that seems to answer a user’s question perfectly, yet, because the model has limited contextual awareness, the quote turns out to be about something else entirely. For example, the user might ask how to fix a particular problem with a specific model of washing machine, and the system might come back with a correct answer for how to fix that problem, but for an entirely different make and model. Unfortunately, checking the correctness of such output is so laborious that errors are likely to go undetected.

From a risk management and governance perspective, tightly constraining the output of generative AI models offers two profound advantages. First, it reduces the space of possible failure modes from nearly infinite (including everything from confusing customers to swindling them, and breaking laws along the way) to two well-defined and relatively tractable problems: inaccurate and incomplete classification. Second, and consequently, faced with this reduced problem set, we can apply established techniques from the world of classic AI. For example, we can use automated tools to identify and address misclassification using a gold-standard labelled dataset, instead of relying on a laborious and subjective human process to detect errors.
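
For instance, the sketch below scores a classifier against a small gold-standard set using scikit-learn’s standard metrics. The labels and predictions are hypothetical placeholders; in practice, the predictions would come from the constrained LLM classifier.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical gold-standard labels and model predictions for the same items.
gold      = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
predicted = ["positive", "neutral",  "neutral", "negative", "positive", "neutral"]

# Per-label precision and recall quantify inaccurate and incomplete
# classification; the confusion matrix shows exactly which labels get confused.
print(classification_report(gold, predicted, zero_division=0))
print(confusion_matrix(gold, predicted, labels=["positive", "negative", "neutral"]))
```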

From a governance perspective, we expect that most institutions will benefit from prioritizing applications of language models and generative AI that fall into the two lower-risk tiers. Instead of attempting broad application of unconstrained generative AI across a wide range of use cases, focus on applying constrained AI to specific use cases and precisely characterizing the performance of those applications. This approach offers the promise of considerable risk mitigation and substantially greater ease of development, even while tackling challenging classification problems.