9. Empowering AI Models: Fine-Tuning RAG Data and Human Feedback

An organization that continually increases the volume of its RAG data will eventually reach the limits of non-parametric data (data the LLM was not pretrained on). At that point, the mass of accumulated RAG data can become extremely challenging to manage, raising issues of storage costs, retrieval resources, and the capacity of the generative AI models themselves. Moreover, a pretrained generative AI model is trained only up to a cutoff date; it has no knowledge of anything that happens afterward. This means a user cannot interact with a chat model about the content of a newspaper edition published after the cutoff date. That is when retrieval has a key role to play ...
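The idea can be sketched with a deliberately minimal retriever: documents published after the model's cutoff cannot be in its parameters, but a similarity search can still surface them at query time and hand them to the model as context. This is a toy illustration only, assuming bag-of-words counts as stand-in "embeddings"; a real pipeline would use a learned embedding model and a vector store.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical documents published after the model's training cutoff:
# the model's parameters cannot contain them, but a retriever can
# still find and inject them into the prompt.
corpus = [
    "The city council approved the new transit budget today",
    "A local bakery won the national bread championship",
    "Engineers released a patch for the payment gateway outage",
]

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

top = retrieve("transit budget news today", corpus)
print(top)
```

The retrieved passage would then be prepended to the user's question as context, which is how RAG lets a chat model answer about events it was never trained on.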
