Chapter 3. Securing Data for the AI Era

Current AI systems draw on both structured and unstructured data to produce useful responses. One effective way to put internal company data to work is retrieval-augmented generation (RAG). RAG combines a retrieval component with a generative model to improve the accuracy of AI responses by grounding them in domain-specific knowledge. The process involves two main steps: retrieving relevant data from internal sources, then using a generative model to produce a contextually accurate output from that data.
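The two steps can be illustrated with a minimal sketch. It assumes a few hypothetical internal documents held in memory and uses bag-of-words cosine similarity as a stand-in for the learned embeddings a production system would use. `DOCUMENTS`, `retrieve`, and `generate` are illustrative names, not part of any particular product, and `generate` only assembles the grounded input that would be sent to a generative model rather than calling one.

```python
import math
import re
from collections import Counter

# Hypothetical internal documents standing in for enterprise data sources.
DOCUMENTS = {
    "hr-policy": "Employees accrue 20 vacation days per year.",
    "it-policy": "Laptops must use full disk encryption.",
    "sales-faq": "Quarterly sales reports are published in the finance portal.",
}


def vectorize(text: str) -> Counter:
    """Turn text into a term-count vector (a toy substitute for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Step 1: return the IDs of the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc_id: cosine(q, vectorize(DOCUMENTS[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def generate(query: str, doc_ids: list[str]) -> str:
    """Step 2 (placeholder): build the grounded input a generative model would receive."""
    context = " ".join(DOCUMENTS[d] for d in doc_ids)
    return f"Context: {context}\nQuestion: {query}"


question = "How many vacation days do employees get?"
print(generate(question, retrieve(question)))
```

Here only the HR document shares terms with the question, so it is the one passed to the generation step; in a real deployment the model, not this placeholder, would turn that context into an answer.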

Why It Matters

Generative AI (GenAI), and RAG-based systems in particular, have transformed how enterprises use their data. AI copilot tools simplify data access, allowing even nontechnical users to query enterprise data without specialized skills such as SQL.

RAG systems offer a cost-effective alternative to training models from scratch or fine-tuning them. They let organizations incorporate proprietary data so that responses are more precise and relevant than those generated by generic models.

The retrieval module obtains information relevant to the user’s question from a vector database, where the organization’s content is stored. The augmentation module then appends the retrieved content to the user’s query. Finally, the generation module uses this augmented input to produce more precise, contextual responses that can cite their sources and reflect the most current information in the underlying data.
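The augmentation and generation modules can be sketched as follows, assuming the retrieval module has already returned passages from the vector database along with a source identifier and a similarity score. The `RetrievedChunk` type, the prompt wording, and the `llm` callable are hypothetical; any real generative model client would be plugged in where the stand-in lambda appears.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrievedChunk:
    source: str   # identifier of the internal document, used for citations
    text: str     # passage returned by the vector database
    score: float  # similarity score assigned by the retrieval module


def augment(query: str, chunks: list[RetrievedChunk], max_chunks: int = 3) -> str:
    """Augmentation module: append the top passages, tagged with numbered sources, to the query."""
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    sources = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(top, start=1))
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Generation module: delegate to whatever generative model the caller supplies."""
    return llm(prompt)


# Hypothetical retrieval results for one query.
chunks = [
    RetrievedChunk("security.md", "Laptops must use full disk encryption.", 0.12),
    RetrievedChunk("handbook.pdf#p4", "Employees accrue 20 vacation days per year.", 0.93),
]
prompt = augment("How many vacation days do employees get?", chunks, max_chunks=1)
print(prompt)

# Stand-in for a real model call; it simply returns the prompt's final line.
answer = generate(prompt, llm=lambda p: p.splitlines()[-1])
```

Numbering the sources in the augmented prompt is one common way to let the model cite where each statement came from; the low-scoring security passage is dropped here because `max_chunks=1`.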

However, the rapid adoption of these systems brings some challenges. Their ease of access democratizes data analysis, but it also increases ...
