Chapter 2. Indexing: Preparing Your Documents for LLMs

In the previous chapter, you learned about the key building blocks for creating an LLM application with LangChain. You also built a simple AI chatbot: a prompt sent to the model and the output the model generates in response (see the refresher sketch below). But this simple chatbot has major limitations.
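As a refresher, that kind of chatbot can be expressed in a few lines of LangChain. The sketch below is illustrative rather than the book's exact code; it assumes the langchain-openai integration package is installed and an OpenAI API key is set in the environment, and the model name and prompt text are placeholders:

    # A minimal sketch of a Chapter 1-style chatbot (illustrative, not the
    # book's exact code). Assumes langchain-openai and OPENAI_API_KEY.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    # Prompt template with a placeholder for the user's question.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ])

    # Chat model; the model name here is a placeholder.
    model = ChatOpenAI(model="gpt-3.5-turbo")

    # Pipe the prompt into the model and invoke the chain with a question.
    chain = prompt | model
    response = chain.invoke({"question": "What is LangChain?"})
    print(response.content)

The prompt fills in the user's question, the model generates a reply, and the chain returns it as a message whose content you can print. Everything the model "knows" here comes from its training data, which is exactly the limitation this chapter addresses.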

What if your use case requires knowledge the model wasn’t trained on? For example, say you want to use AI to answer questions about a company, but the information lives in PDF documents, or in documents that are private to you or your company. While model providers keep enriching their training datasets with more and more of the world’s public information (in whatever format it is stored), two major limitations remain in an LLM’s knowledge corpus:

Private data

Information that isn’t publicly available is, by definition, not included in the training data ...
