Chapter 13. Testing AI Systems
MLOps is a set of best practices for the automated testing, versioning, and monitoring of the ML pipelines and ML assets that power our AI systems. We introduced MLOps in Chapter 1, data validation tests in Chapter 6, and unit testing for transformation functions in Chapter 7. But there is still much more ground to cover. If you are to build a reliable, governed, and maintainable AI system, you need integration tests for each of your ML pipelines, run both during development and before deployment. In this chapter, we will look at how to write feature pipeline tests and model validation tests, and how to test model deployments. We will also look at how to reliably package our ML pipelines with automatic containerization in development, staging, and production environments, and we will present offline testing of agents and LLM workflows with evals.
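As a quick preview of the kind of test this chapter builds toward, the sketch below shows a pytest unit test for a single feature transformation function, assuming a pandas-based pipeline. The function total_spend_per_account is a hypothetical transformation standing in for one of your own; the point is the pattern of running a transformation on a small, hand-crafted input and asserting on the output.

import pandas as pd
import pytest


def total_spend_per_account(transactions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation: aggregate transaction amounts per account."""
    return (
        transactions.groupby("account_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )


def test_total_spend_per_account():
    # Small, hand-crafted input so the expected output is easy to reason about
    transactions = pd.DataFrame(
        {"account_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]}
    )

    features = total_spend_per_account(transactions)

    # One row per account, with the expected aggregate values
    assert len(features) == 2
    assert features.loc[features["account_id"] == 1, "total_amount"].iloc[0] == pytest.approx(30.0)
    assert features.loc[features["account_id"] == 2, "total_amount"].iloc[0] == pytest.approx(5.0)

The same pattern scales up to the integration tests discussed above: instead of calling a single function, you run the whole pipeline against a small sample of input data and assert on the features it produces.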
Testing is key to building a high-quality AI system. Your tests should give you enough confidence that you are willing to deploy to production on a Friday, and even if an upgrade fails, you should be able to easily roll back your changes. In the next chapter we will focus on the operational concerns of MLOps, but in this chapter, we will look at the tests run during development and how to automate offline testing for AI systems.
Offline Testing
The starting point for building reliable AI systems is testing. AI systems require more levels of testing than traditional software systems. Small bugs in data or code can easily ...