8 Schema management

This chapter covers

  • Managing schema changes in a cloud data platform
  • Understanding schema-on-read vs. an active schema-management approach
  • Evaluating when to use schema-as-a-contract vs. a smart-pipeline approach
  • Using Spark to infer schemas in batch mode
  • Implementing a Schema Registry as part of a Metadata layer
  • Using operational metadata to manage schema changes
  • Building resilient data pipelines to manage schema changes automatically
  • Managing schema changes with backward and forward compatibility
  • Managing schema changes through to the data warehouse consumption layer

In this chapter, we will tackle the age-old problem of managing the schema changes introduced into a data system when source data changes, exploring how the ...
