Foundations for Architecting Data Solutions

by Ted Malaska, Jonathan Seidman
September 2018
Beginner to intermediate content level
187 pages
4h 59m
English
O'Reilly Media, Inc.

Chapter 7. Ensuring Data Integrity

When working with open source enterprise data management systems, it’s common to use multiple storage and processing layers in our data architecture, which often means storing the same data in multiple formats to optimize access. This can even mean duplicating data. In the past, duplication might have been viewed as an antipattern because of its expense and complexity, but newer systems and cheap storage make it much more practical.

What doesn’t change is the need to ensure the integrity of the data as it moves through the system, from the data sources to its final storage. By data integrity, we mean ensuring that the data is accurate and consistent throughout our data pipelines. To ensure data integrity, it’s critical that we have a known lineage for all data as it moves through the system.
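One common way to check that data stays accurate and consistent between two stages of a pipeline is to reconcile record counts and checksums. The following is a minimal sketch, assuming records are JSON-serializable dictionaries; the function names (`record_digest`, `dataset_fingerprint`, `verify_stage`) and the sample records are hypothetical and not taken from any particular tool. Each record is hashed in a canonical form, and the sorted digests are hashed together, so the check ignores record order but still catches dropped, duplicated, or altered records.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping, Tuple


def record_digest(record: Mapping[str, Any]) -> str:
    """SHA-256 of one record in canonical JSON form (keys sorted, no extra spaces)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dataset_fingerprint(records: Iterable[Mapping[str, Any]]) -> Tuple[int, str]:
    """Return (record count, order-independent checksum) for a dataset."""
    digests = sorted(record_digest(r) for r in records)
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


def verify_stage(source: Iterable[Mapping[str, Any]],
                 sink: Iterable[Mapping[str, Any]]) -> bool:
    """True if the sink holds exactly the same records as the source."""
    return dataset_fingerprint(source) == dataset_fingerprint(sink)


# Hypothetical example: the sink reordered rows and keys but lost nothing.
source = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 3.0}]
sink = [{"amount": 3.0, "id": 2}, {"id": 1, "amount": 10.5}]
print(verify_stage(source, sink))      # True
print(verify_stage(source, sink[:1]))  # False: a record was dropped
```

Sorting the digests, rather than XOR-ing them, keeps two identical records from cancelling each other out. The canonical form is strict: a value stored as `3` in one stage and `3.0` in another produces different digests, which may or may not be what a given pipeline wants.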

In this chapter, we discuss what it means to ensure data integrity and provide some examples of how to do this as data moves through our system. In this discussion, we consider what we call full fidelity data, which is data that maintains the full context of the source data. This data might be stored in a different format from the source data, but as long as it can be returned to its original state, we consider it full fidelity. We also consider datasets derived from our original source data; for example, data that’s been filtered and aggregated. Regardless of whether the final datasets are full fidelity or derived, ...
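The full fidelity property described above, that data stored in a different format can be returned to its original state, can be tested with a round trip. The following is a minimal sketch, assuming CSV source data with a header row, unique column names, and rows as long as the header; JSON Lines is chosen only as an example target format, and all function names are hypothetical. A lossless conversion keeps values as strings, while a lossy variant that casts values to `float` fails the check because details such as `10.50` are lost.

```python
import csv
import io
import json
from typing import List

Rows = List[List[str]]


def parse_csv(text: str) -> Rows:
    """Parse CSV text into rows; the first row is assumed to be the header."""
    return list(csv.reader(io.StringIO(text)))


def to_jsonl(rows: Rows) -> str:
    """Store rows as JSON Lines: a header line, then one object per record.

    Values stay strings, so nothing about the source values is lost.
    """
    header, *body = rows
    lines = [json.dumps(header)]
    lines += [json.dumps(dict(zip(header, row))) for row in body]
    return "\n".join(lines)  # json.dumps escapes embedded newlines


def to_jsonl_as_numbers(rows: Rows) -> str:
    """A lossy variant: casting values to float drops details like "10.50"."""
    header, *body = rows
    lines = [json.dumps(header)]
    lines += [json.dumps({c: float(v) for c, v in zip(header, row)}) for row in body]
    return "\n".join(lines)


def from_jsonl(text: str) -> Rows:
    """Rebuild rows from the JSON Lines form, using the stored column order."""
    first, *rest = text.split("\n")
    header = json.loads(first)
    rows = [header]
    for line in rest:
        record = json.loads(line)
        rows.append([record[c] for c in header])
    return rows


def is_full_fidelity(original_csv: str, stored: str) -> bool:
    """True if the stored form can be returned to the original rows."""
    return parse_csv(original_csv) == from_jsonl(stored)


source = "id,amount\n1,10.50\n2,3.00\n"
rows = parse_csv(source)
print(is_full_fidelity(source, to_jsonl(rows)))             # True
print(is_full_fidelity(source, to_jsonl_as_numbers(rows)))  # False
```

This check compares parsed rows rather than raw bytes, so it treats two CSV files that differ only in line endings or quoting as equivalent; a stricter definition of "original state" would compare bytes instead.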

ISBN: 9781492038733