Split the documents into chunks and compute their embeddings
Load the documents from the provided URLs and split them into chunks using the CharacterTextSplitter with a chunk size of 1000 and no overlap:
from langchain.document_loaders import SeleniumURLLoader
from langchain.text_splitter import CharacterTextSplitter

# use the Selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

# split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)
Next, obtain the embeddings using OpenAIEmbeddings and store them in a cloud-based Deep Lake vector database. In a real-world project, one might upload a whole website or course to Deep Lake to search across thousands or millions of documents. Utilizing a cloud-hosted dataset also lets applications running in different locations access the same centralized store without having to deploy a vector database on your own infrastructure.
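The following is a minimal sketch of that step. It assumes the OPENAI_API_KEY and ACTIVELOOP_TOKEN environment variables are set, and the organization ID and dataset name are placeholders to replace with your own:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# compute embeddings with OpenAI's ada-002 embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# placeholder Activeloop path: substitute your own org ID and dataset name
my_activeloop_org_id = "<YOUR_ACTIVELOOP_ORG_ID>"
my_activeloop_dataset_name = "customer_support_docs"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# create the Deep Lake vector store and add the chunked documents;
# embeddings are computed and stored automatically on insert
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)
db.add_documents(docs)

Once add_documents returns, the chunks and their embeddings live in the cloud dataset, ready to be queried by any application with access to the same dataset path.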