Getting Data Right

Name: Getting Data Right
Author: Shannon Cutt
ISBN: 9781491935316

September 2015

Beginner to intermediate

52 pages

1h 51m

English

O'Reilly Media, Inc.

Introduction
1. The Solution: Data Curation at Scale
    Three Generations of Data Integration Systems
    Five Tenets for Success
    Tenet 1: Data Curation Is Never Done
    Tenet 2: A PhD in AI Can’t be a Requirement for Success
    Tenet 3: Fully Automatic Data Curation Is Not Likely to Be Successful
    Tenet 4: Data Curation Must Fit into the Enterprise Ecosystem
    Tenet 5: A Scheme for “Finding” Data Sources Must Be Present
2. An Alternative Approach to Data Management
    Centralized Planning Approaches
    Common Information
    Information Chaos
    What Is to Be Done?
    Take a Federal Approach to Data Management
    Use All the New Tools at Your Disposal
    Don’t Model, Catalog
    Cataloging Tools
    Keep Everything Simple and Straightforward
    Use an Ecological Approach
3. Pragmatic Challenges in Building Data Cleaning Systems
    Data Cleaning Challenges
    1. Scale
    2. Human in the Loop
    3. Expressing and Discovering Quality Constraints
    4. Heterogeneity and Interaction of Quality Rules
    5. Data and Constraints Decoupling and Interplay
    6. Data Variety
    7. Iterative by Nature, Not Design
    Building Adoptable Data Cleaning Solutions
4. Understanding Data Science: An Emerging Discipline for Data-Intensive Discovery
    Data Science: A New Discovery Paradigm That Will Transform Our World
    Significance of DIA and Data Science
    Illustrious Histories: The Origins of Data Science
    What Could Possibly Go Wrong?
    Do We Understand Data Science?
    Cornerstone of a New Discovery Paradigm
    Data Science: A Perspective
    Understanding Data Science from Practice
    Methodology to Better Understand DIA
    DIA Processes
    Characteristics of Large-Scale DIA Use Cases
    Looking Into a Use Case
    Research for an Emerging Discipline
    Acknowledgment
5. From DevOps to DataOps
    Why It’s Time to Embrace “DataOps” as a New Discipline
    From DevOps to DataOps
    Defining DataOps
    Changing the Fundamental Infrastructure
    DataOps Methodology
    Integrating DataOps into Your Organization
    The Four Processes of DataOps
    Data Engineering
    Data Integration
    Data Quality
    Data Security
    Better Information, Analytics, and Decisions
6. Data Unification Brings Out the Best in Installed Data Management Strategies
    Positioning ETL and MDM
    Extract, Transform, and Load
    Master Data Management
    Clustering to Meet the Rising Data Tide
    Embracing Data Variety with Data Unification
    Data Unification Is Additive
    Data Unification and Master Data Management
    Data Unification and ETL
    Changing Infrastructure
    Probabilistic Approach to Data Unification

Content preview from Getting Data Right

Introduction

Jerry Held

Companies have invested an estimated $3–4 trillion in IT over the last 20-plus years, most of it directed at developing and deploying single-vendor applications to automate and optimize key business processes. And what has been the result of all of this disparate activity? Data silos, schema proliferation, and radical data heterogeneity.

With companies now investing heavily in big data analytics, this entropy is making the job considerably more complex. The complexity is most visible when companies try to ask “simple” questions of data that is spread across many business silos (divisions, geographies, or functions). Questions as simple as “Are we getting the best price for everything we buy?” often go unanswered because top-down, deterministic data unification approaches cannot, on their own, scale to the variety of hundreds, thousands, or tens of thousands of data silos.

The diversity and mutability of enterprise data and semantics should lead chief data officers (CDOs) to explore, as a complement to deterministic systems, a new bottom-up, probabilistic approach that connects data across the organization and exploits big data variety. In managing data, we should look for solutions that find siloed data and connect it into a unified view. “Getting Data Right” means embracing variety and transforming it from a roadblock into ROI. Throughout this report, you’ll learn how to question conventional assumptions and explore alternative approaches to managing big data in ...

Publisher Resources

ISBN: 9781491935361
