Chapter 1. AI in the Enterprise
This chapter introduces the intuition behind LLMs, as well as the key concepts from the world of AI and machine learning that are necessary to understand LLMs. Equipped with these concepts, we’ll move on to see the most common uses of LLMs in the enterprise today, before taking a deeper dive into four exemplary projects that encourage a pragmatic understanding of how these language models can be used in industry applications.
AI in Context
AI is a rapidly evolving field that constantly draws new people into the conversation, so many of its terms (including “artificial intelligence” itself) aren’t very well defined. Let’s clarify our understanding of them for the purposes of this report.
AI Today Is Mostly Machine Learning
Our everyday use of the word “AI” describes machines that mimic intelligent human behavior. The means by which they do this are not limited to any one technique, and of course intelligence itself is a fuzzy concept. Machine learning (ML), on the other hand, is the study of algorithms that use data to construct models (i.e., predictive representations) of a given domain.
Not all AI is machine learning. But the vast majority of AI technologies making waves today—such as generative AI for images or text—rely on machine learning as their driving force.
An Algorithm Trains a Model with Data
The concepts of algorithms, models, and data are critical in ML. However, their relationship—as well as the nature of an ML model itself—is often poorly understood. It is important to note that a model is not an algorithm, but rather the result of data being processed by an algorithm. An algorithm can be thought of as a plan or set of rules in a programming language that describes this process, which is also known as “training a model on a dataset.”
The algorithm itself only defines the structure of the model, not its content. That’s why running an ML algorithm on two different datasets—say, one in English and one in Spanish—will produce two different models.
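To make this concrete, here is a minimal sketch in Python (using scikit-learn, with toy stand-in datasets) of one algorithm producing two different models from two different datasets:

```python
# A minimal sketch: the same algorithm (linear regression) trained on two
# different datasets produces two different models, i.e., different
# learned parameters. The toy datasets are illustrative stand-ins.
from sklearn.linear_model import LinearRegression

dataset_a = ([[1], [2], [3]], [2, 4, 6])    # follows y = 2x
dataset_b = ([[1], [2], [3]], [5, 10, 15])  # follows y = 5x

model_a = LinearRegression().fit(*dataset_a)
model_b = LinearRegression().fit(*dataset_b)

# Same algorithm, different data, different models.
print(model_a.coef_, model_b.coef_)  # approximately [2.] and [5.]
```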
A Trained Model Is Defined by Its Parameters
A machine learning model’s size is determined by its parameters: adjustable numeric values that are optimized during training to accurately recognize and predict patterns in data. The values of a trained model’s parameters characterize it and can be shared and reused, especially for inference, where the model applies its learning to new, previously unseen data points.
Large language models can have millions, billions, or even more trainable parameters.
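As a minimal illustration (assuming PyTorch and a toy network that is nothing like an LLM in scale), counting a model’s trainable parameters looks like this:

```python
# A minimal sketch: a model's "size" is the count of its trainable
# parameters. This toy network is nothing like an LLM in scale.
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params} trainable parameters")  # 8,386 here; LLMs have billions
```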
There Are Many Different Kinds of Language Models
Language models have learned a representation of the language on which they’ve been trained. This representation can then be used to make predictions, such as which word will follow a sequence of words.
But not all language models are generative. In particular, smaller, pre-ChatGPT models that are still widely used in many different applications specialize in extracting information from text or embedding text in numerical format for fast semantic search. These more specialized models, such as RoBERTa and SentenceTransformers, continue to exist alongside their generative LLM cousins, whose emergent properties have made them world famous.
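To see a nongenerative model at work, here is a minimal sketch using the Hugging Face transformers library and the RoBERTa model mentioned above; the example sentence is an arbitrary choice:

```python
# A minimal sketch: a nongenerative language model (RoBERTa) predicts a
# masked word instead of generating free-form text.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa marks the word to predict with the special token "<mask>".
predictions = fill_mask("The contract must be signed by both <mask>.")
for p in predictions[:3]:
    print(f"{p['score']:.2f}  {p['token_str'].strip()}")
```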
So What Exactly Is an LLM?
Now that we’ve gone over these complex concepts related to machine learning, language modeling, and AI, we have the understanding and the terminology to talk about what LLMs are (and aren’t).
In the most common understanding of the term, LLMs are language models with billions of parameters that excel at generating text in natural language. To ensure that they learn to capture an accurate representation of the world, LLMs are trained on vast collections of textual data. In addition, it is becoming increasingly common to find multimodal models that can process and generate both text and images (for example, to create captions for diagrams). This is a natural consequence of the fact that the technology underlying LLMs also excels at processing visual input, so it can be applied to text and images simultaneously, as seen in models like GPT-4 and Gemini.

By looking at terabytes of data from the web and other sources, LLMs use their many parameters to learn linguistic patterns, as well as factual information and correlative relationships across a wide range of topics. They can then use these representations to perform complex tasks, like writing valid code in any programming language, generating image captions, and composing prose in the style of a particular author from scratch. These abilities, known as “emergent properties” because they were not literally present in the training data, arise from the LLM’s ability to apply the patterns it has learned to new domains. For many people, they are the most impressive feature of these language models.
An important concept in the context of LLMs is that of prompts. Prompts are the text that we, as users of an LLM, feed to the model, instructing it to perform certain tasks. As such, prompts are the de facto user interface of an LLM. Effective prompting requires some practice and the use of tested and proven techniques. The maximum length of a prompt is model-specific, but most models allow enough space to provide context for the model to base its responses on: a critical concept in the most successful enterprise applications of LLMs, as we will soon see.
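As a minimal sketch of prompting with context (assuming the OpenAI Python client, an API key in the environment, and a hypothetical support ticket), the prompt below carries both the instruction and the context the model should ground its answer in:

```python
# A minimal sketch: the prompt is the LLM's user interface. Here it carries
# both the instruction and the context the answer should be grounded in.
# Assumes the OpenAI Python client with an OPENAI_API_KEY set; the model
# name and the ticket text are illustrative.
from openai import OpenAI

client = OpenAI()

context = "Ticket #4711: Customer reports login fails after password reset."
prompt = (
    "You are a support assistant. Using only the context below, "
    "summarize the issue in one sentence.\n\n"
    f"Context: {context}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```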
What makes LLMs so attractive to businesses is their ability to reason and write with humanlike quality, at a scale far beyond human capacity. Text-related tasks such as programming, report analysis, information extraction, and copywriting can now be performed by machines within seconds, opening up a wide range of applications previously unimaginable. Maybe that’s why it can be hard to ideate use cases: we have to change our entire way of thinking about what’s possible. To help us broaden our horizons and start thinking about LLM-powered applications that create value in the real world, let’s look at practical examples of the most common uses.
A Snapshot of Language Models in the Enterprise
In practice, the boundaries between generative LLMs and their smaller counterparts are not always clear. All language models originate from the discipline of natural language processing (NLP), which has long been concerned with how to make natural, human language processable by machines. After early successes such as machine translation, spam detection, and speech generation, NLP finally entered the mainstream with generative AI. But now we’re seeing a trend among industry thought leaders and analysts toward recognizing the usefulness of earlier language models by including them under the umbrella of LLM. These smaller, nongenerative models have had more time to gain traction and are already firmly and quietly embedded in products across industries, increasingly in combination with generative LLMs. In addition, a new class of small language models (SLMs) has recently emerged that aims to capture the generative and reasoning capabilities of LLMs, albeit with a much smaller number of parameters.

Now that we have a rough overview of the language modeling landscape, here’s a list of the four most important use cases for language models in production:
- Generative AI and chat: Given the popularity of ChatGPT, it is no surprise that generative AI is the most popular use case for enterprises. In the context of LLMs, generative AI describes the generation of text in response to a prompt. For example, it can provide users of a styling service with customized outfit descriptions on the fly.
- Information extraction: Information extraction uses language models to isolate key pieces of information from text documents for use in downstream tasks, such as populating a table or database. Consider a construction company that works with tens of thousands of contractors, all of whom have different insurance policies. To ensure that it is not liable for its contractors’ mistakes, the company wants to understand which risks are covered by which policies. To do this, it automatically extracts this information from the lengthy (potentially 500-page) contracts and stores it as metadata.
- Recommendation and search systems: Semantic recommendation systems take advantage of the ability of language models to mathematically represent and compare the meaning of text, providing an extremely fast way to find similar documents, even across large databases. An example of this technique in action is recommending properties with similar descriptions after each listing on a real estate platform.
- Text classification: Classification is a “classic” NLP application that can help sort text into predefined categories for faster processing. For example, you may want to classify incoming user queries as positive or negative before passing them to a downstream application or human agent, as in the sketch that follows this list.
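To make the classification use case concrete, here is a minimal sketch using an off-the-shelf sentiment model from the Hugging Face transformers library; the model choice and the routing threshold are illustrative assumptions:

```python
# A minimal sketch: routing user queries by sentiment before they reach a
# downstream application or human agent. The model and the 0.9 threshold
# are illustrative choices.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def route_query(query: str) -> str:
    result = classifier(query)[0]  # e.g., {"label": "NEGATIVE", "score": 0.98}
    if result["label"] == "NEGATIVE" and result["score"] > 0.9:
        return "human_agent"   # escalate clearly unhappy users
    return "self_service_bot"  # everything else goes to the bot

print(route_query("Your product broke after one day and support ignored me."))
```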
Four Sample LLM Applications
The products best suited for LLM adoption have one aspect in common: textual data is a major factor. Here I look at four examples, inspired by real-world LLM applications, that serve to further illustrate how companies can move from problem to LLM solution. You’ll see that they span different industries, pain points, flavors of LLM, and user groups.
Sophisticated Recommendations for Legal Professionals
For lawyers working on a case, it’s critical to review all relevant legal cases and documents. Consider a company that has a large database of legal texts—including past cases and decisions—available to its subscribers, most of whom are legal professionals. The company wants to enhance its service by adding a feature that displays a list of related cases and documents on its platform. This tool is designed to help lawyers find and review relevant cases more quickly and effectively, thereby allowing them to take on more cases or improve the quality of their work by catching details they might otherwise miss. However, because new cases are constantly being added to the database, this list of relevant cases cannot be static, but must be defined dynamically.
The solution is a recommendation system based on semantic similarity. Semantic similarity uses small or large language models to determine the closeness of meaning between two documents based on an abstract representation of their content, rather than the vocabulary they use. This helps to identify two semantically similar documents even if they use very different words. In addition to the search functionality and its integration into a user-facing frontend, the team tasked with building the application also needs to architect and schedule the preprocessing of incoming documents. This preprocessing step, known as indexing, uses the same language model to embed each new document so that it can be compared against the rest of the database.
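A minimal sketch of how such a system might work, using the sentence-transformers library to index a toy document database and rank entries by cosine similarity (the model name and the documents are illustrative assumptions):

```python
# A minimal sketch of semantic recommendations: embed every document once
# (indexing), then rank stored documents by how close their meaning is to
# a given one. Model name and the toy corpus are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Indexing: embed the document database up front.
documents = [
    "Appeal concerning breach of a commercial lease agreement.",
    "Ruling on liability for defective construction work.",
    "Decision on the enforceability of a non-compete clause.",
]
doc_embeddings = model.encode(documents, convert_to_tensor=True)

def recommend(doc_index: int, top_k: int = 2):
    # Compare one document's embedding against all the others.
    scores = util.cos_sim(doc_embeddings[doc_index], doc_embeddings)[0]
    ranked = scores.argsort(descending=True)
    # Skip the document itself (always the closest match).
    return [(documents[int(i)], float(scores[i])) for i in ranked[1 : top_k + 1]]

for text, score in recommend(0):
    print(f"{score:.2f}  {text}")
```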
Conversational AI for Technical Documentation
For developers, clear and accessible software documentation is critical: it is the guide that helps them use software or libraries effectively. But navigating documentation can be daunting. Consider the massive scale of cloud service providers like Amazon Web Services (AWS) or Azure: their documentation can run to thousands of pages, covering a wide range of services, features, and policies. Faced with such a sea of information, users quickly feel overwhelmed and often turn to browsing the web for answers rather than sifting through official documentation. To address this problem, a software company wants to revolutionize the way customers interact with its documentation. It plans to introduce an intuitive search interface that allows users to ask questions about the documentation in natural language and quickly find accurate answers. This not only streamlines the process but also ensures that developers get the most accurate and relevant information directly from the source.
The natural language capabilities of an LLM are ideal for this use case. However, the team doesn’t want the LLM to generate answers based on the knowledge it learned during training: the documentation was probably not part of that training data—and even if it was, that information could become outdated with the next software update. To compound the problem, LLMs are notoriously bad at understanding the limits of their own knowledge. Thus, when asked about things absent from their training data, they often answer regardless—by inventing facts. This is known as “hallucinating,” and it is a major problem in LLM adoption.
The remedy to outdated information in the LLM’s parameters, and to the hallucinations that result, is a method known as retrieval-augmented generation (RAG). In such a setup, the LLM is preceded by a retrieval module, which extracts from your database the candidate documents it deems most suitable to answer the user query. Upon receiving a query, the RAG system first identifies suitable documentation pages. It then embeds those in the prompt to the LLM, instructing it to base its answers on the fact-checked information from the database. RAG is extremely popular as an LLM technique because it grounds the model’s answers in a factual knowledge base that can be updated on the fly.
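Here is a minimal RAG sketch under stated assumptions (the sentence-transformers library for retrieval, the OpenAI client for generation, and a toy set of documentation pages); a production system would add a proper vector database and prompt safeguards:

```python
# A minimal RAG sketch: retrieve the documentation pages most similar in
# meaning to the query, then instruct the LLM to answer only from those
# pages. Model names and the toy pages are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

retriever = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()  # assumes OPENAI_API_KEY is set

# Indexing: embed the documentation once, up front.
pages = [
    "To rotate credentials, call `auth rotate` and restart the service.",
    "The rate limit is 100 requests per minute per API key.",
    "Logs are retained for 30 days and can be exported as JSON.",
]
page_embeddings = retriever.encode(pages, convert_to_tensor=True)

def answer(query: str, top_k: int = 2) -> str:
    # Retrieval: pick the pages closest in meaning to the query.
    query_embedding = retriever.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, page_embeddings)[0]
    top = scores.argsort(descending=True)[:top_k]
    context = "\n".join(pages[int(i)] for i in top)
    # Generation: ground the LLM's answer in the retrieved pages.
    prompt = (
        f"Answer using only this documentation:\n{context}\n\n"
        f"Question: {query}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("How long are logs kept?"))
```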
Automating the Collection of Information from Earnings Reports
The advent of LLMs has enabled machines to process unstructured data. The term “unstructured” refers to data types such as images, audio, video, or text: formats that don’t follow a strictly predefined structure. Structured data, on the other hand, comes in a predictable format, such as tables and graphs, and can be processed using less resource-intensive methods. For example, a large table can be queried using SQL, which is faster, more accurate, and far cheaper than running an LLM for the same task.
Let’s say a company wants to identify information in its unstructured textual data that can be fed into such tables. For example, they might want to extract specific numerical and other factual data points from a collection of earnings reports.
The solution: using a smaller language model to mine text for information, not by generating answers, but by highlighting the relevant spans in the underlying source document. Such models are called “extractive” language models. They’re not only lighter and cheaper than LLMs but also safer for highly sensitive areas such as finance: because they can only return text that is literally present in the source, they are incapable of LLM-like hallucinations and necessarily more faithful to the underlying dataset.
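As a minimal sketch (assuming the Hugging Face transformers library, a publicly available extractive model, and an invented report snippet), extractive question answering looks like this:

```python
# A minimal sketch of extractive information gathering: the model highlights
# the answer span in the source text rather than generating new text.
# Model choice and the report snippet are illustrative assumptions.
from transformers import pipeline

extractor = pipeline("question-answering", model="deepset/roberta-base-squad2")

report = (
    "In Q3, revenue rose to $4.2 billion, up 8% year over year, "
    "while operating margin declined to 12%."
)
result = extractor(question="What was the Q3 revenue?", context=report)

# The answer is a literal span from the report, returned with its character
# position and a confidence score, which makes it easy to verify.
print(result["answer"], result["start"], result["score"])
```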
Condensing Political Discourse for News Consumers
Democracy thrives on the active participation of its citizens. But political debates in parliament are often long and difficult to access. For this use case, let’s imagine a government agency that wants to make parliamentary debates more accessible to citizens with an application that summarizes the debates’ transcripts according to the user’s interests.
This use case is similar to the second in that we’re dealing with a database of texts that is updated periodically. And indeed, to address it, we would again use a retrieval module that extracts the relevant transcripts upon receiving a user query. But instead of generating answers to specific questions, this system would use an LLM to summarize the retrieved transcripts. In this way, users could get timely overviews of political debates tailored to their individual interests.
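A minimal sketch of the summarization step, assuming the transcripts relevant to the user’s interest have already been retrieved (for example, with the embedding approach sketched earlier) and the OpenAI client; the model name and excerpts are illustrative:

```python
# A minimal sketch of the summarization step. Assumes the transcripts
# matching the user's interest were already retrieved; model name and
# excerpts are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

retrieved_transcripts = [
    "Excerpt: the minister defended the proposed housing bill...",
    "Excerpt: opposition members questioned the bill's funding plan...",
]

prompt = (
    "Summarize the following parliamentary debate excerpts for a citizen "
    "interested in housing policy, in three neutral sentences:\n\n"
    + "\n\n".join(retrieved_transcripts)
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```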
Pivoting Attention from Technology to Product
New technologies are often the subject of hype at one end of the spectrum and doom at the other. The same is true of large language models, whose complexity makes them difficult to grasp even for technically minded people. But to build useful and innovative products with AI, you don’t really need to understand the mathematics of LLMs or the details of their implementation. What you do need is a strong sense of how they can be applied to solve real-world problems. This chapter has introduced basic concepts in AI, with a specific focus on LLMs and how they can be used. The practical examples of AI-powered products in the wild have given us a sense of how we need to start approaching LLMs to be successful in building with them. In the next chapter, I’ll examine the notion of an AI product and what you need to consider before you start building one.