Data Governance with AWS

by Kevin Lewis, Jason Berkowitz, Ina Felsheim, Joseph D. Stec
April 2024
Intermediate to advanced
52 pages
English
O'Reilly Media, Inc.

Chapter 2. Curating Your Data

Academics define data curation as “the act of discovering a data source(s) of interest, cleaning and transforming the new data, semantically integrating it with other local data sources, and deduplicating the resulting composite.”1

Chief data officers (CDOs) think of data curation more broadly: as the strategic, systematic process of organizing, managing, and maintaining data so that it remains high-quality, trustworthy, and usable across the enterprise. Well-curated data serves a wide range of business use cases and applications, from basic reporting to advanced machine learning (ML) and AI.

Both groups agree that data curation involves data collection, validation, transformation, storage, preservation, and dissemination. From the practical perspective of the C-suite, however, data curation must go beyond preparing data for individual applications. As data volumes keep growing, the ability to automate curation effectively at scale has become an increasingly critical factor in supporting modern, complex, cross-functional business initiatives.
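To make the academic definition concrete, here is a minimal sketch in plain Python of the clean, transform, integrate, and deduplicate steps applied to a small batch of records. It is illustrative only, not the authors' method or an AWS service: the `Customer` record, the field names (`id`, `email`, `region`), the validation rule, and the choice of email address as the deduplication key are all hypothetical assumptions, and the discovery step is assumed to have already produced `raw_rows`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical curated record shape, assumed for illustration.
    customer_id: str
    email: str
    region: str


def clean_and_transform(raw_rows):
    """Validate required fields, normalize values, and rename fields."""
    cleaned = []
    for row in raw_rows:
        cid = (row.get("id") or "").strip()
        email = (row.get("email") or "").strip().lower()
        if not cid or "@" not in email:
            continue  # assumed rule: drop rows missing an id or a plausible email
        cleaned.append({"customer_id": cid, "email": email})
    return cleaned


def integrate(rows, local_regions):
    """Join new rows with a local source (customer_id -> region)."""
    return [
        Customer(r["customer_id"], r["email"], local_regions.get(r["customer_id"], "unknown"))
        for r in rows
    ]


def deduplicate(customers):
    """Keep the first record per email address, preserving input order."""
    seen = set()
    unique = []
    for c in customers:
        if c.email not in seen:
            seen.add(c.email)
            unique.append(c)
    return unique


def curate(raw_rows, local_regions):
    """Run the steps in order: clean/transform -> integrate -> deduplicate."""
    return deduplicate(integrate(clean_and_transform(raw_rows), local_regions))


# Hypothetical sample data standing in for a newly discovered source.
raw_rows = [
    {"id": " c1 ", "email": "Ana@Example.com"},
    {"id": "c2", "email": "bob@example.com"},
    {"id": "c3", "email": "ANA@example.com "},  # same person as c1 after normalization
    {"id": "", "email": "nobody@example.com"},  # fails validation (no id)
]
local_regions = {"c1": "EU", "c2": "US"}  # hypothetical local data source

curated = curate(raw_rows, local_regions)
for customer in curated:
    print(customer)
```

Each step is a separate function so that it can be tested, logged, and rerun independently. That separation is what makes it practical to automate the same steps at scale, whether they run in a script like this one or in a managed pipeline.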

This chapter explores methods for automating and managing data curation. Let's start by examining what good data curation at scale looks like.

The Value of Curating Data

Effective large-scale data curation forms the cornerstone upon which robust data governance practices are built, enabling organizations to establish trust in their data to meet business needs. Achieving this entails implementing data integration ...

ISBN: 9781098157562