Chapter 4. Memory: Enabling Your Chatbot to Learn from Interactions

In the previous chapter, you learned how to provide your AI chatbot application with up-to-date, relevant context. This enables your chatbot to generate accurate responses based on the user’s input. But that’s not enough to build a production-ready application. How do we enable your application to actually “chat” back and forth with the user, whilst remembering prior conversations and relevant context?

Large language models are stateless, and so each time the model is prompted to generate a new response it has no memory of the prior prompt or model response. In order to provide this historical information to the model, we need a robust memory system that will keep track of previous conversations and context. This historical information can then be included in the final prompt sent to the LLM, thus giving it “memory.” Figure 4-1 illustrates this.

Figure 4-1. Memory and retrieval used to generate context-aware answers from an LLM.

In this chapter, you’ll learn how to build this essential memory system using LangChain’s built-in modules to make this development process easier.

How to Build a Chatbot Memory System

There are two core design decisions behind any robust memory system.

  1. How state is stored.

  2. How state is queried.

A simple way to build a chatbot memory system that incorporates effective solutions to these design decisions is to store and reuse the history of all chat interactions between the user and the model. The state of this memory system can be:

  • Tracked as a list of messages.

  • Updated by appending recent messages after each turn.

  • Incorporated into each new prompt by inserting the stored messages into it.

Figure 4-2 illustrates this simple memory system.

Figure 4-2. A simple memory system which utilizes chat history in prompts to generate model answers.
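Before turning to LangChain, here's what this loop looks like in plain Python. This is a minimal sketch: fake_llm stands in for a real model call, and the names are illustrative.

history = []  # state: the list of (role, content) messages exchanged so far
def fake_llm(messages):
    # stand-in for a real model call
    return f"(model response to: {messages[-1][1]})"
def chat(user_input):
    # query the state: insert the stored messages and the new input into the prompt
    prompt = [("system", "You are a helpful assistant.")] + history + [("human", user_input)]
    answer = fake_llm(prompt)
    # update the state: append this turn's messages
    history.append(("human", user_input))
    history.append(("ai", answer))
    return answer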

Here’s a code example that illustrates a simple version of this memory system using LangChain. First in Python:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{messages}"),
])
model = ChatOpenAI()
chain = prompt | model
chain.invoke({
    "messages": [
        ("human","Translate this sentence from English to French: I love programming."),
        ("ai", "J'adore la programmation."),
        ("human", "What did you just say?"),
    ],
})

And in JS:

import {ChatPromptTemplate} from '@langchain/core/prompts'
import {ChatOpenAI} from '@langchain/openai'
const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant. Answer all questions to the best of your ability."],
    ["placeholder", "{messages}"],
])
const model = new ChatOpenAI()
const chain = prompt.pipe(model)
await chain.invoke({
    "messages": [
        ["human","Translate this sentence from English to French: I love programming."],
        ["ai", "J'adore la programmation."],
        ["human", "What did you just say?"],
    ],
})

Note how the incorporation of the previous conversation in the chain enabled the model to answer the follow-up question in a context-aware manner.

Whilst this may work for demo purposes, it won’t hold up in a production environment: the list of conversation messages can grow significantly, and managing it by hand quickly becomes tedious. Fortunately, LangChain provides a core utility class, InMemoryChatMessageHistory, which makes it easier to implement this memory system.

Below is a code example that illustrates how this class stores and retrieves chat messages, first in Python:

from langchain_core.chat_history import InMemoryChatMessageHistory
chat_history = InMemoryChatMessageHistory()
chat_history.add_user_message("Translate this sentence from English to French: I love programming.")
chat_history.add_ai_message("J'adore la programmation.")
chat_history.messages

And in JS:

import {InMemoryChatMessageHistory} from '@langchain/core/chat_history'
const chatHistory = new InMemoryChatMessageHistory()
await chatHistory.addUserMessage("Translate this sentence from English to French: I love programming.")
await chatHistory.addAIMessage("J'adore la programmation.")
await chatHistory.getMessages()

And the output:

[HumanMessage(content='Translate this sentence from English to French: I love programming.', additional_kwargs={}),
 AIMessage(content="J'adore la programmation.", additional_kwargs={})]

We can then integrate the stored chat messages into our chain and send a final prompt to the model, first in Python:

response = chain.invoke({
    "messages": chat_history.messages,
})
chat_history.add_ai_message(response)
input2 = "What did I just ask you?"
chat_history.add_user_message(input2)
chain.invoke({
    "messages": chat_history.messages,
})

And in JS:

const response = await chain.invoke({
    "messages": await chatHistory.getMessages(),
})
await chatHistory.addAIMessage(response)
const input2 = "What did I just ask you?"
await chatHistory.addUserMessage(input2)
await chain.invoke({
    "messages": await chatHistory.getMessages(),
})

And the output:

AIMessage(content='You just asked me to translate the sentence "I love programming" from English to French.', response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 61, 'total_tokens': 79}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5cbb21c2-9c30-4031-8ea8-bfc497989535-0', usage_metadata={'input_tokens': 61, 'output_tokens': 18, 'total_tokens': 79})

Automatic history management

In the previous example, we integrated the chat messages into the chain explicitly, but this requires tedious manual management of each new message. In a production setting, we need a way to persist chat history and to automate its insertion and updating.

To solve this problem, we can use LangChain’s RunnableWithMessageHistory class to automatically insert and update chat messages. This class wraps a LangChain Expression Language chain together with a BaseChatMessageHistory implementation (such as InMemoryChatMessageHistory), injecting the chat history into the chain’s inputs and updating it after each invocation. Here’s how this works in code. Let’s start by modifying our prompt to include a history placeholder, which will later contain all prior chat messages, first in Python:

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
chain = prompt | model

And in JS:

const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant. Answer all questions to the best of your ability."],
    ["placeholder", "{history}"],
    ["human", "{input}"],
])
const chain = prompt.pipe(model)

Next, let’s use the RunnableWithMessageHistory class to wrap our chain and incorporate the latest user input and chat history.

from langchain_core.runnables.history import RunnableWithMessageHistory
chat_history_for_chain = InMemoryChatMessageHistory()
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: chat_history_for_chain,
    input_messages_key="input",
    history_messages_key="history",
)

And in JS:

import {RunnableWithMessageHistory} from '@langchain/core/runnables'
const chatHistoryForChain = new InMemoryChatMessageHistory()
const chainWithHistory = new RunnableWithMessageHistory({
    runnable: chain,
    getMessageHistory: () => chatHistoryForChain,
    inputMessagesKey: 'input',
    historyMessagesKey: 'history'
})

As you can see, the RunnableWithMessageHistory class takes a few parameters in addition to the chain that we want to wrap:

  • A factory function that returns a message history for a given session_id. The session_id is an identifier for the session (conversation) thread that the input messages correspond to. This allows your chain to handle multiple users at once, maintaining several conversations or threads with the same chain at the same time by loading different messages for different sessions.

  • An input_messages_key that specifies which part of the input should be tracked and stored in the chat history. In this example, we want to track the string passed in as input.

  • A history_messages_key that specifies the key under which the previous messages should be injected into the prompt. Our prompt has a placeholder named history, so we specify this property to match.

  • For chains with multiple outputs, it also includes an output_messages_key which specifies which output to store as history. This is the inverse of input_messages_key.

Specifically, RunnableWithMessageHistory loads the previous messages in the conversation before passing them to the runnable, and it saves the generated response as a message after the runnable is called.

Figure 4-3. RunnableWithMessageHistory loads chat history before invoking the wrapped runnable and saves the new messages after it returns.
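For example, if your chain returns a dictionary with several keys, output_messages_key tells the wrapper which value to store. Here’s a minimal sketch, assuming a multi-output chain built with RunnablePassthrough.assign (the answer key is illustrative):

from langchain_core.runnables import RunnablePassthrough
# a chain whose output is a dict: the original input keys plus an "answer" key
multi_output_chain = RunnablePassthrough.assign(answer=chain)
with_multi_output_history = RunnableWithMessageHistory(
    multi_output_chain,
    lambda session_id: chat_history_for_chain,
    input_messages_key="input",
    history_messages_key="history",
    output_messages_key="answer",  # store only the "answer" value in chat history
)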

Let’s look at an example where we keep a separate chat history for each session. By default, RunnableWithMessageHistory tracks history with a single session_id configuration parameter; these configuration parameters can be customized by passing a list of ConfigurableFieldSpec objects to the history_factory_config parameter (for example, to key the history on both a user_id and a conversation_id). Below, we stick with the default session_id. First in Python:

from langchain_core.runnables.history import RunnableWithMessageHistory
# the chain we used before
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
model = ChatOpenAI()
chain = prompt | model
# keep a separate history for each session_id
histories: dict[str, InMemoryChatMessageHistory] = {}
def get_session_history(session_id: str = ''):
    if session_id not in histories:
        histories[session_id] = InMemoryChatMessageHistory()
    return histories[session_id]    
# chain with history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# in action
with_message_history.invoke(
    {"input": "hi im bob!"},
    config={"configurable": {"sesion_id": "123"}},
)
# AIMessage(content='Hello Bob! How can I assist you today?')
# remembers
with_message_history.invoke(
    {"input": "whats my name?"},
    config={"configurable": {"session_id": "123"}},
)
# AIMessage(content='Your name is Bob. How can I help you today, Bob?')
# New session_id --> does not remember
with_message_history.invoke(
    {"input": "whats my name?"},
    config={"configurable": {"session_id": "456"}},
)
# AIMessage(content='I'm sorry, but I don't have access to your personal information such as your name. How can I assist you today?')

And in JS:

import {RunnableWithMessageHistory} from '@langchain/core/runnables'
// the chain we used before
const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant. Answer all questions to the best of your ability."],
    ["placeholder", "{history}"],
    ["human", "{input}"],
])
const model = new ChatOpenAI()
const chain = prompt.pipe(model)
// keep a separate history for each sessionId
const histories = Object.create(null)
function getMessageHistory(sessionId) {
    if (!(sessionId in histories)) {
        histories[sessionId] = new InMemoryChatMessageHistory()
    }
    return histories[sessionId]
}
const chainWithHistory = new RunnableWithMessageHistory({
    runnable: chain,
    getMessageHistory,
    inputMessagesKey: 'input',
    historyMessagesKey: 'history',
})
// in action
await chainWithHistory.invoke(
    {"input": "hi im bob!"},
    {"configurable": {"sessionId": "123"}},
)
// AIMessage(content='Hello Bob! How can I assist you today?')
// remembers
await chainWithHistory.invoke(
    {"input": "whats my name?"},
    {"configurable": {"sessionId": "123"}},
)
// AIMessage(content='Your name is Bob. How can I help you today, Bob?')
// New sessionId --> does not remember
await chainWithHistory.invoke(
    {"input": "whats my name?"},
    {"configurable": {"sessionId": "456"}},
)
// AIMessage(content='I'm sorry, but I don't have access to your personal information such as your name. How can I assist you today?')
Note

Note that in this case the context was preserved for the same session_id, but once we changed it, a new chat history was started.

How to Modify Chat History

In many cases, the chat history messages aren’t in the best state or format to generate an accurate response from the model. To overcome this problem, we can modify the chat history in a variety of ways.

Trimming messages

LLMs have limited context windows, so the final prompt sent to the model can’t exceed the model’s input token limit. In addition, excessive prompt information can distract the model and lead to hallucinations.

An effective solution to this problem is to limit the number of messages retrieved from chat history and appended to the prompt. In practice, we need only load and store the most recent n chat history messages, as sketched below.
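A minimal custom trimming function, for example, could simply keep the most recent messages (the function name and cutoff below are illustrative):

def trim_history(messages, n=4):
    # keep only the n most recent chat history messages
    return messages[-n:]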

Whilst a custom trimming function like this works, it may not be flexible enough to handle a variety of message-trimming requirements. Fortunately, LangChain provides a built-in trim_messages helper that incorporates various strategies to meet these requirements. For example, the helper enables us to specify how many tokens we want to keep in (or remove from) chat history.

Here’s an example where we keep the last max_tokens worth of messages by setting the strategy parameter to "last":

from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)
from langchain_openai import ChatOpenAI
trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o"),
    include_system=True,
    allow_partial=False,
    start_on="human",
)
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]
trimmer.invoke(messages)

And in JS:

import {
  AIMessage,
  HumanMessage,
  SystemMessage,
  trimMessages,
} from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
const trimmer = trimMessages({
  maxTokens: 65,
  strategy: "last",
  tokenCounter: new ChatOpenAI({ modelName: "gpt-4o" }),
  includeSystem: true,
  allowPartial: false,
  startOn: "human",
});
const messages = [
    new SystemMessage("you're a good assistant"),
    new HumanMessage("hi! I'm bob"),
    new AIMessage("hi!"),
    new HumanMessage("I like vanilla ice cream"),
    new AIMessage("nice"),
    new HumanMessage("whats 2 + 2"),
    new AIMessage("4"),
    new HumanMessage("thanks"),
    new AIMessage("no problem!"),
    new HumanMessage("having fun?"),
    new AIMessage("yes!"),
]
const trimmed = await trimmer.invoke(messages);

And the output:

[SystemMessage(content="you're a good assistant"),
 HumanMessage(content='whats 2 + 2'),
 AIMessage(content='4'),
 HumanMessage(content='thanks'),
 AIMessage(content='no problem!'),
 HumanMessage(content='having fun?'),
 AIMessage(content='yes!')]

Note the following:

  • The parameter strategy controls whether to start from the beginning or the end of the list. Usually you’ll want to prioritize the most recent messages, and cut older messages if they don’t fit, that is, start from the end of the list. For this behavior choose the value last. The other available option is first, which would prioritize the oldest messages and cut more recent messages if they don’t fit.

  • The token_counter is an LLM or chat model, which will be used to count tokens using the tokenizer appropriate to that model. You can also pass a plain function to count something other than model tokens; see the sketch after this list.

  • We can add the parameter include_system=True to ensure that the trimmer keeps the system message.

  • The parameter allow_partial determines whether to cut the last message’s content to fit within the limit. In our example, we set this to false, which completely removes the message that would send the total over the limit.

  • The parameter start_on="human" ensures that the trimmed history never starts with an AIMessage; in other words, we never keep a model response without also keeping the HumanMessage (the question) that prompted it.
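As mentioned for token_counter, you can pass a plain function to count units other than model tokens. For instance, passing the built-in len counts each message as one unit, so the following sketch keeps at most the last four messages (counting the system message):

last_messages = trim_messages(
    messages,
    max_tokens=4,       # interpreted as "4 messages" because of token_counter=len
    strategy="last",
    token_counter=len,  # count each message as one unit
    include_system=True,
    start_on="human",
)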

Now, let’s incorporate the trimmer into a chain wrapped with RunnableWithMessageHistory. To use it in the chain, we need to ensure that the trimmer runs before the messages are passed to our prompt.

from langchain_core.runnables import RunnablePassthrough
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{messages}"),
])
model = ChatOpenAI()
# this makes a "messages" key available to prompt,
# after passing the input messages list through the trimmer 
chain = {"messages": trimmer} | prompt | model
# tracking history
history = InMemoryChatMessageHistory()
with_message_history = RunnableWithMessageHistory(
    chain, lambda session_id: history
)
# using it
with_message_history.invoke(
    [HumanMessage(content="whats my name?")],
    config={"configurable": {"session_id": "abc"}},
)

And in JS:

import {RunnableWithMessageHistory, RunnableMap} from '@langchain/core/runnables'
const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant. Answer all questions to the best of your ability."],
    ["placeholder", "{messages}"],
])
const model = new ChatOpenAI()
// this makes a "messages" key available to prompt,
// after passing the input messages list through the trimmer 
const chain = RunnableMap.from({messages: trimmer}).pipe(prompt).pipe(model)
// tracking history
const history = new InMemoryChatMessageHistory()
const chainWithHistory = new RunnableWithMessageHistory({
    runnable: chain,
    getMessageHistory: () => history,
})
// in action
await chainWithHistory.invoke(
    [new HumanMessage("hi im bob!")],
    {configurable: {sessionId: 'abc'}},
)

Summary memory

Aside from trimming messages, we can utilize the LLM to generate a summary of the conversation and then incorporate this summary into the prompt sent to the model.

Here’s an example:

demo_ephemeral_chat_history = InMemoryChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("Hey there! I'm Nemo.")
demo_ephemeral_chat_history.add_ai_message("Hello!")
demo_ephemeral_chat_history.add_user_message("How are you today?")
demo_ephemeral_chat_history.add_ai_message("Fine thanks!")
demo_ephemeral_chat_history.messages
[HumanMessage(content="Hey there! I'm Nemo."),
 AIMessage(content='Hello!'),
 HumanMessage(content='How are you today?'),
 AIMessage(content='Fine thanks!')]
from langchain_core.prompts import MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability. The provided chat history includes facts about the user you are speaking with.",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
    ]
)
chain = prompt | model
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: demo_ephemeral_chat_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

Next, let’s create a function that will distill previous interactions into a summary. We’ll add this step to the front of the chain:

def summarize_messages(chain_input):
    stored_messages = demo_ephemeral_chat_history.messages
    if len(stored_messages) == 0:
        return False
    summarization_prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="chat_history"),
            (
                "user",
                "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
            ),
        ]
    )
    summarization_chain = summarization_prompt | model
    summary_message = summarization_chain.invoke({"chat_history": stored_messages})
    demo_ephemeral_chat_history.clear()
    demo_ephemeral_chat_history.add_message(summary_message)
    return True
chain_with_summarization = (
    RunnablePassthrough.assign(messages_summarized=summarize_messages)
    | chain_with_message_history
)

Now, let’s invoke the chain and see if it remembers the chat history.

chain_with_summarization.invoke(
    {"input": "What did I say my name was?"},
    {"configurable": {"session_id": "unused"}},
)
### 
AIMessage(content='You introduced yourself as Nemo. How can I assist you today, Nemo?')
###
demo_ephemeral_chat_history.messages
[AIMessage(content='The conversation is between Nemo and an AI. Nemo introduces himself and the AI responds with a greeting. Nemo then asks the AI how it is doing, and the AI responds that it is fine.'),
 HumanMessage(content='What did I say my name was?'),
 AIMessage(content='You introduced yourself as Nemo. How can I assist you today, Nemo?')]

Note that invoking the chain again will generate another summary, this time produced from the initial summary plus the new messages.

In practice, it’s possible to design a hybrid approach that incorporates the summary and trimmer strategies, where a certain number of messages are retained in chat history whilst others are summarized.
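Here’s one way you might sketch such a hybrid on top of the previous example. The helper name, the keep_last cutoff, and the wiring are illustrative assumptions rather than a built-in LangChain feature: older messages are condensed into a summary, while the most recent keep_last messages are kept verbatim.

hybrid_summary_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="chat_history"),
        (
            "user",
            "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
        ),
    ]
)
def summarize_older_messages(chain_input, keep_last=2):
    stored_messages = demo_ephemeral_chat_history.messages
    if len(stored_messages) <= keep_last:
        return False
    # split the history: everything except the last keep_last messages gets summarized
    older, recent = stored_messages[:-keep_last], stored_messages[-keep_last:]
    summary_message = (hybrid_summary_prompt | model).invoke({"chat_history": older})
    # rebuild the history: the summary first, then the untouched recent messages
    demo_ephemeral_chat_history.clear()
    demo_ephemeral_chat_history.add_message(summary_message)
    demo_ephemeral_chat_history.add_messages(recent)
    return True
chain_with_hybrid_memory = (
    RunnablePassthrough.assign(messages_summarized=summarize_older_messages)
    | chain_with_message_history
)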

Filtering messages

As the list of chat history messages grows, it may come to include messages of many different types, produced by different users, sub-chains, and models. LangChain provides a filter_messages helper that makes it easier to filter chat history messages by type, ID, or name.

Here’s an example where we filter for human messages, first in Python:

from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    filter_messages,
)
messages = [
    SystemMessage("you are a good assistant", id="1"),
    HumanMessage("example input", id="2", name="example_user"),
    AIMessage("example output", id="3", name="example_assistant"),
    HumanMessage("real input", id="4", name="bob"),
    AIMessage("real output", id="5", name="alice"),
]
filter_messages(messages, include_types="human")

And in JS:

import {
  HumanMessage,
  SystemMessage,
  AIMessage,
  filterMessages,
} from "@langchain/core/messages";
const messages = [
  new SystemMessage({content: "you are a good assistant", id: "1"}),
  new HumanMessage({content: "example input", id: "2", name: "example_user"}),
  new AIMessage({content: "example output", id: "3", name: "example_assistant"}),
  new HumanMessage({content: "real input", id: "4", name: "bob"}),
  new AIMessage({content: "real output", id: "5", name: "alice"}),
];
filterMessages(messages, { includeTypes: ["human"] });

And the output:

[HumanMessage(content='example input', name='example_user', id='2'),
HumanMessage(content='real input', name='bob', id='4')]

Let’s try a couple more examples: one that excludes messages by name, and one that includes messages by type while excluding a specific ID. First in Python:

filter_messages(messages, exclude_names=["example_user", "example_assistant"])
"""
[SystemMessage(content='you are a good assistant', id='1'),
HumanMessage(content='real input', name='bob', id='4'),
AIMessage(content='real output', name='alice', id='5')]
"""
filter_messages(messages, include_types=[HumanMessage, AIMessage], exclude_ids=["3"])
"""
[HumanMessage(content='example input', name='example_user', id='2'),
 HumanMessage(content='real input', name='bob', id='4'),
 AIMessage(content='real output', name='alice', id='5')]
"""

And in JS:

filterMessages(messages, { excludeNames: ["example_user", "example_assistant"] });
/*
[SystemMessage(content='you are a good assistant', id='1'),
HumanMessage(content='real input', name='bob', id='4'),
AIMessage(content='real output', name='alice', id='5')]
*/
filterMessages(messages, { includeTypes: ["human", "ai"], excludeIds: ["3"] });
/*
[HumanMessage(content='example input', name='example_user', id='2'),
 HumanMessage(content='real input', name='bob', id='4'),
 AIMessage(content='real output', name='alice', id='5')]
*/

The filter_messages helper can also be used imperatively (as above) or declaratively, making it easy to compose with other components in a chain:

model = ChatOpenAI()
filter_ = filter_messages(exclude_names=["example_user", "example_assistant"])
chain = filter_ | model
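Invoking this chain with the full message list runs the filter before the model sees anything; here only the system message and the messages from bob and alice are passed through (a usage sketch; the model's reply will vary):

chain.invoke(messages)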

And in JS:

const model = new ChatOpenAI()
const filter = filterMessages({
  excludeNames: ["example_user", "example_assistant"]
})
const chain = filter.pipe(model)

Merging consecutive messages

Certain models, such as Anthropic chat models, do not support inputs that contain consecutive messages of the same type. LangChain provides a merge_message_runs utility that makes it easy to merge consecutive messages of the same type, first in Python:

from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    merge_message_runs,
)
messages = [
    SystemMessage("you're a good assistant."),
    SystemMessage("you always respond with a joke."),
    HumanMessage([{"type": "text", "text": "i wonder why it's called langchain"}]),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    AIMessage("Why, he's probably chasing after the last cup of coffee in the office!"),
]
merge_message_runs(messages)

And in JS:

import {
  HumanMessage,
  SystemMessage,
  AIMessage,
  mergeMessageRuns,
} from "@langchain/core/messages";
const messages = [
  new SystemMessage("you're a good assistant."),
  new SystemMessage("you always respond with a joke."),
  new HumanMessage({
    content: [{ type: "text", text: "i wonder why it's called langchain" }],
  }),
  new HumanMessage("and who is harrison chasing anyways"),
  new AIMessage(
    'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
  ),
  new AIMessage(
    "Why, he's probably chasing after the last cup of coffee in the office!"
  ),
];
mergeMessageRuns(messages);

And the output:

[SystemMessage(content="you're a good assistant.\nyou always respond with a joke."),
 HumanMessage(content=[{'type': 'text', 'text': "i wonder why it's called langchain"}, 'and who is harrison chasing anyways']),
 AIMessage(content='Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!\nWhy, he\'s probably chasing after the last cup of coffee in the office!')]

Notice that if the contents of one of the messages to merge is a list of content blocks, then the merged message will have a list of content blocks. And if both messages to merge have string contents, then those are concatenated with a newline character.

The merge_message_runs helper can be used imperatively (as above) or declaratively, making it easy to compose with other components in a chain, first in Python:

model = ChatOpenAI()
merger = merge_message_runs()
chain = merger | model

And in JS:

const model = new ChatOpenAI()
const merger = mergeMessageRuns()
const chain = merger.pipe(model)

Chat history with retrieval

So far, the examples we’ve explored incorporate chat history into a simple back-and-forth between the user and the model, without any context from external data. However, as discussed in the previous chapter, retrieving external data to use as context is crucial to ensure that the model’s responses are up-to-date and accurate. Figure 4-4 illustrates a complete application design that incorporates retrieval and chat history.

Figure 4-4. A combination of retrieval and chat history mechanisms to generate a final output.

Here’s a code example that ties together chat history with the retrieval strategy you learned in the previous chapter. Let’s start by loading, splitting, and embedding our data source into a vector store, first in Python:

from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres.vectorstores import PGVector
# Load the document, split it into chunks
raw_documents = TextLoader('./test.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)
# embed each chunk and insert it into the vector store
model = OpenAIEmbeddings()
connection = 'postgresql+psycopg://langchain:langchain@localhost:6024/langchain'
db = PGVector.from_documents(documents, model, connection=connection)
retriever = db.as_retriever()

And in JS:

import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
// Load the document, split it into chunks
const loader = new TextLoader("./test.txt");
const raw_docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const docs = await splitter.splitDocuments(raw_docs)
// embed each chunk and insert it into the vector store
const model = new OpenAIEmbeddings();
const db = await PGVectorStore.fromDocuments(docs, model, {
  postgresConnectionOptions: {
    connectionString: 'postgresql://langchain:langchain@localhost:6024/langchain'
  }
})
const retriever = db.asRetriever()

Chapter 2 has more details on the indexing stage.

Next, let’s define a sub-chain that takes historical chat messages and the latest user question, and reformulates the question if it makes reference to any information in the historical information. We’ll then use this sub-chain inside the final RAG chain, which will, in order,

  1. Rephrase the user’s question given the conversation history (if there is history)

  2. Pass the rephrased question to the retriever (see above) to get the most relevant documents

  3. Pass the original question, chat history and documents to the final prompt to generate an answer.

We’ll use a prompt that includes a placeholder variable under the name chat_history. This allows us to pass in a list of messages to the prompt using the chat_history input key, and these messages will be inserted after the system message and before the human message containing the latest question.

from langchain_openai import ChatOpenAI
from langchain_core.runnables import chain
from langchain_core.prompts import ChatPromptTemplate
def get_msg_content(msg):
    return msg.content
contextualize_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_system_prompt),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
])
contextualize_chain = (
    contextualize_prompt
    | ChatOpenAI()
    | get_msg_content
)
qa_system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)
qa_chain = (
    qa_prompt
    | ChatOpenAI()
    | get_msg_content
) 
@chain
def history_aware_qa(input):
    # rephrase question if needed
    if input.get('chat_history'):
        question = contextualize_chain.invoke(input)
    else:
        question = input['input']
    # get context from retriever
    context = retriever.invoke(question)
    # get answer
    return qa_chain.invoke({
        **input,
        "context": context
    })

And in JS:

import {ChatOpenAI} from '@langchain/openai'
import {RunnableLambda} from '@langchain/core/runnables'
import {ChatPromptTemplate} from '@langchain/core/prompts'
function getMsgContent(msg) {
  return msg.content
}
const contextualizeSystemPrompt = `Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.`
const contextualizePrompt = ChatPromptTemplate.fromMessages([
    ["system", contextualizeSystemPrompt],
    ["placeholder", "{chat_history}"],
    ["human", "{input}"],
])
const contextualizeChain = contextualizePrompt
  .pipe(new ChatOpenAI())
  .pipe(getMsgContent)
const qaSystemPrompt = `You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise.

{context}`
const qaPrompt = ChatPromptTemplate.fromMessages([
    ["system", qaSystemPrompt],
    ["placeholder", "{chat_history}"],
    ["human", "{input}"],
])
const qaChain = qaPrompt
  .pipe(new ChatOpenAI())
  .pipe(getMsgContent)
const historyAwareQa = RunnableLambda.from(async (input) => {
  // rephrase question if needed
  const question = input.chat_history?.length
    ? await contextualizeChain.invoke(input)
    : input.input
  // get context from retriever
  const context = await retriever.invoke(question)
  // get answer
  return qaChain.invoke({ ...input, context })
})

Next, let’s incorporate stateful management of chat history and send the final prompt, including chat history and retrieved context to the model for an output.

from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
chat_history_for_chain = InMemoryChatMessageHistory()
qa_with_history = RunnableWithMessageHistory(
    history_aware_qa,
    lambda _: chat_history_for_chain,
    input_messages_key="input",
    history_messages_key="chat_history",
)
qa_with_history.invoke(
    {"input": "What is Task Decomposition?"},
    config={"configurable": {"session_id": "123"}},
)

And in JS:

import {RunnableWithMessageHistory} from '@langchain/core/runnables'
import {InMemoryChatMessageHistory} from '@langchain/core/chat_history'
const chatHistoryForChain = new InMemoryChatMessageHistory()
const qaWithHistory = new RunnableWithMessageHistory({
    runnable: historyAwareQa,
    getMessageHistory: () => chatHistoryForChain,
    inputMessagesKey: 'input',
    historyMessagesKey: 'chat_history'
})
await qaWithHistory.invoke(
    {input: "What is Task Decomposition?"},
    {configurable: {sessionId: "123"}},
)

Persisting Chat History Long-Term

In most production use cases, it’s necessary to persist the chat history outside the application. Fortunately, LangChain provides a wide variety of memory integrations that store chat messages in external databases. Next, we’ll show an example using Postgres as the backing store, but LangChain offers many other implementations, such as Redis and SQLite.

Postgres

Postgres is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance.

Here’s an example of how to use Postgres to store chat message history.

import uuid
import psycopg
from langchain_postgres import PostgresChatMessageHistory
# connect to Postgres and create the table that will store the messages (run once)
conn = psycopg.connect("postgresql://langchain:langchain@localhost:6024/langchain")
PostgresChatMessageHistory.create_tables(conn, "chat_history")
history = PostgresChatMessageHistory(
    "chat_history",
    str(uuid.uuid4()),  # session_id identifying this conversation
    sync_connection=conn,
)
history.add_user_message("hi!")
history.add_ai_message("whats up?")
history.messages

And in JS:

import {PostgresChatMessageHistory} from '@langchain/community/stores/message/postgres'
import pg from "pg";
const history = new PostgresChatMessageHistory({
   tableName: "langchain_chat_histories",
   sessionId: "example",
   pool: new pg.Pool({
     host: "127.0.0.1",
     port: 6024,
     user: "langchain",
     password: "langchain",
     database: "langchain",
   }),
});
await history.addUserMessage("hi!")
await history.addAIMessage("whats up?")
await history.getMessages()

Summary

This chapter covered the fundamentals of building a simple memory system that enables your AI chatbot to remember its conversations with a user. We discussed how to automate the storage and updating of chat history using LangChain’s built-in modules. We also discussed the importance of modifying chat history and explored various strategies to trim, summarize, filter, and merge chat messages.

Finally, we explored an end-to-end example of incorporating a memory system with retrieval augmented generation to build an AI chatbot that generates outputs based on both prior conversations and retrieved context.

At this point, your AI chatbot should have the ability to “chat” accurately with a user using up-to-date information. This is typically enough for most question-answer chatbot use cases. But what if you want to make your AI chatbot more ‘human-like’, thinking and acting autonomously? This can unlock more advanced features and use cases that can enhance the user experience of your application.

In the next chapter, you’ll learn how to enable your AI chatbot to act like an “agent” that thinks and acts based on your guidance.
