Designing Large Language Model Applications

by Suhas Pai
March 2025
Intermediate to advanced
366 pages
9h 31m
English
O'Reilly Media, Inc.

Chapter 11. Representation Learning and Embeddings

In the previous chapter, we learned how to interface language models with external tools, including data stores. External data can take the form of text files, database tables, or knowledge graphs, and its content can range from proprietary, domain-specific knowledge bases to intermediate results and outputs generated by LLMs.

If the data are structured, for example stored in a relational database, the language model can issue a SQL query to retrieve the data it needs. But what if the data are unstructured?
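Here is a minimal sketch of the structured case using Python's built-in `sqlite3` module. The `executives` table, its columns, the company `ExampleCorp`, and the rows are all hypothetical. The SQL string stands in for a query that a language model would generate; the code itself does not call any model.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executives (company TEXT, name TEXT, title TEXT, start_year INTEGER)"
)
conn.executemany(
    "INSERT INTO executives VALUES (?, ?, ?, ?)",
    [
        ("ExampleCorp", "Jane Doe", "Chief Financial Officer", 2014),
        ("ExampleCorp", "John Roe", "Chief Executive Officer", 2011),
    ],
)

# Assumption: in a real application this string would be produced by the LLM;
# here it is hard-coded.
llm_generated_sql = (
    "SELECT name, start_year FROM executives "
    "WHERE company = 'ExampleCorp' AND title = 'Chief Financial Officer'"
)

rows = conn.execute(llm_generated_sql).fetchall()
print(rows)  # [('Jane Doe', 2014)]
```

Because the query is model-generated, it is common to run it against a read-only connection or a restricted set of tables rather than executing it with full privileges.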

One way to retrieve data from unstructured text is to search by keyword or use regular expressions. For the Apple CFO example in the previous chapter, we can retrieve passages that mention the CFO from a corpus of financial disclosures, hoping that they contain the CFO's join date or tenure. For instance, we can use the following regex:

pattern = r"(?i)\b(?:C\.?F\.?O|Chief\s+Financial\s+Officer)\b"

Keyword search has limited effectiveness. There are a very large number of ways to express a CFO's join date or tenure in a corpus, if that information is present at all. At the same time, a catch-all regex like the one above matches every mention of the CFO, so a large proportion of its matches could be false positives that say nothing about tenure.
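To make both failure modes concrete, the sketch below applies the regex from above to a tiny invented corpus (the sentences and the name Jane Doe are hypothetical, not taken from real filings). It matches three sentences, only one of which states a tenure, and it misses the sentence that expresses the same fact without using the term "CFO".

```python
import re

pattern = r"(?i)\b(?:C\.?F\.?O|Chief\s+Financial\s+Officer)\b"

# Made-up mini corpus for illustration only.
corpus = [
    "The CFO will present the quarterly results on Thursday.",
    "Jane Doe has served as Chief Financial Officer since 2014.",
    "Our chief financial officer signed the attached certification.",
    "Jane Doe took over as head of finance in 2014.",
]

# Keep every sentence in which the pattern occurs anywhere.
matches = [s for s in corpus if re.search(pattern, s)]
for s in matches:
    print(s)
```

The first and third matches are false positives for the tenure question, while the last corpus sentence, which does answer it, is never retrieved.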

Therefore, we need to move beyond keyword search. Over the last few decades, the field of information retrieval has developed several methods, such as BM25, that have shaped search systems. We will learn more about ...
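As a rough illustration of what BM25 computes, here is a minimal sketch of the widely used Okapi BM25 scoring function. The tokenizer is deliberately naive, the IDF uses the "+ 1 inside the log" variant popularized by Lucene, and `k1=1.5`, `b=0.75` are common choices rather than values prescribed by this chapter. The helper names `tokenize` and `bm25_scores` and the example documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # IDF with "+ 1" inside the log (Lucene-style) so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented example documents, not real filings.
docs = [
    "The CFO will present the quarterly results on Thursday.",
    "Jane Doe has served as CFO since 2014.",
    "Quarterly revenue grew in every region.",
]
scores = bm25_scores("CFO since", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])  # Jane Doe has served as CFO since 2014.
```

Unlike a regex, BM25 returns a graded score instead of a yes/no match, rewarding rare query terms and discounting long documents, but it still depends on exact token overlap between query and document.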

ISBN: 9781098150495