Split the documents into chunks and compute their embeddings
Load the documents from the provided URLs and split them into chunks using the CharacterTextSplitter with a chunk size of 1000 and no overlap:
from langchain.document_loaders import SeleniumURLLoader
from langchain.text_splitter import CharacterTextSplitter

# use the Selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

# split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)
Next, obtain the embeddings using OpenAIEmbeddings and store them in a cloud-based Deep Lake vector database. In a real-world project, one might upload a whole website or course to Deep Lake to search across thousands or millions of documents. Utilizing a cloud-hosted dataset also lets applications running in different locations access the same centralized store without having to deploy a vector database on your own infrastructure.
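The following is a minimal sketch of that step. It assumes the OPENAI_API_KEY and ACTIVELOOP_TOKEN environment variables are set, and the organization ID and dataset name are placeholders to replace with your own:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# compute embeddings with OpenAI's ada-002 embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# placeholder Activeloop path: substitute your own org ID and dataset name
my_activeloop_org_id = "<YOUR_ACTIVELOOP_ORG_ID>"
my_activeloop_dataset_name = "customer_support_docs"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# create the Deep Lake vector store and add the chunked documents;
# embeddings are computed and stored automatically on insert
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)
db.add_documents(docs)

Once add_documents returns, the chunks and their embeddings live in the cloud dataset, ready to be queried by any application with access to the same dataset path.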