
Data Engineering for Beginners

Name: Data Engineering for Beginners
Author: Chisom Nwokwu
ISBN: 9781394325412

November 2025

Beginner

384 pages

9h 45m

English

Wiley

Cover
Table of Contents
Title Page
Foreword
Introduction
  What Does This Book Cover?
  Who Should Read This Book?
CHAPTER 1: Understanding Data
  A Brief History of Data
  Types of Data
  Why Is Data Important?
  Data and Information
  Summary
  Notes
CHAPTER 2: Introduction to Data Engineering
  Data Engineering Explained Using an Oil Refinery Analogy
  An Overview of the Data Engineering Life Cycle
  Navigating Project Requirements, Engaging Stakeholders, and Delivering Business Value
  The Current State of Data Engineering
  The Importance of Data Engineering
  Summary
CHAPTER 3: Database Fundamentals
  Key Concepts of Databases
  Types of Databases
  Choosing Between Relational and NoSQL Databases
  Summary
CHAPTER 4: SQL Fundamentals
  Introduction to SQL
  Comparison Operators
  Understanding Joins
  Lab: Setting Up SQL Server and Running SQL Queries
  Best Practices for Writing Efficient SQL Queries
  Summary
CHAPTER 5: Database Design
  Data Modeling
  Normalization
  Denormalization
  Data Modeling Best Practices
  Database Optimization
  Summary
CHAPTER 6: Data Warehouses, Data Lakes, and Data Lakehouses
  Data Warehouses
  Data Marts
  Data Lakes
  Data Lakehouse
  The Key Differences Between a Database, Data Warehouse, Data Lake, and Data Lakehouse
  Summary
CHAPTER 7: Data Pipelines
  Batch Pipelines
  Stream Pipelines
  Lambda Architecture
  Data Orchestration
  Lab: Building an ETL Pipeline and Automating with Apache Airflow
  Summary
CHAPTER 8: Data Quality
  Bad Data
  Dimensions of Data Quality
  Data Quality Hierarchy
  Summary
CHAPTER 9: Data Security
  What Is Data Security?
  Common Threats to Data Security
  Core Principles of Data Security
  Data Encryption
  Data Masking
  Understanding Network Security
  Access Control
  Secrets Management
  Data Security and Data Privacy
  Summary
CHAPTER 10: Data Governance
  How to Think About Data Governance
  Data Governance Framework
  Policies
  Processes
  Roles in the Data Governance Framework
  Data Management and Data Governance
  Summary
CHAPTER 11: Big Data and Distributed Systems
  The Five V’s of Big Data
  Distributed Systems
  Distributed Data Processing
  Big Data File Types
  Summary
CHAPTER 12: Data Engineering on the Cloud
  Cloud Computing
  Core Cloud Concepts
  Cloud Service Models
  Cloud Management Models
  Cost Optimization
  Summary
CHAPTER 13: Building a Career in Data Engineering
  Types of Data Engineering Roles
  Types of Data Engineers
  Landing Your First Data Engineering Role
  Thinking Like a Data Engineer
  Summary
APPENDIX: Sample Interview Questions
  SQL
  Data Modeling
  Data Pipelines
  Apache Spark
  System Design
Data Engineering Glossary
Index
Copyright
Dedication
Acknowledgments
About the Author
About the Technical Editor
End User License Agreement

Content preview from Data Engineering for Beginners

CHAPTER 9: Data Security

In 2017, Uber Technologies, one of the world’s leading ride-hailing companies, disclosed a massive data breach that had occurred a year earlier, in 2016. Hackers accessed Uber’s AWS cloud storage and stole the personal information of 57 million users and drivers worldwide. The stolen information included names, email addresses, phone numbers, and, in some cases, trip details.

The attackers gained access by obtaining API credentials that had been stored in a private GitHub repository. Using these keys, they accessed Uber’s cloud environment and downloaded the data. The breach stayed out of public view for about a year: rather than disclosing it, the company paid the hackers $100,000 to delete the data, disguising the payment as part of a bug bounty program. The breach raised concerns about the broader state of cybersecurity in the tech industry.

What could they have done differently? The hackers gained access to Uber’s AWS systems because API keys were exposed. This shows that Uber might not have followed adequate security practices for protecting sensitive credentials, and it also suggests that its access controls might have been too weak or set up incorrectly. The intrusion was also not caught while it was happening, which suggests that Uber’s monitoring systems were either insufficient or not appropriately configured, and that no real-time monitoring was set up to quickly detect unauthorized access.
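One common safeguard against exposed API keys is to scan source files for credential-shaped strings before they are committed. The following is a minimal sketch of that idea, not a description of anything Uber used. The function name `find_exposed_keys` and the sample text are hypothetical, and the pattern covers only the widely documented shape of AWS access key IDs (the prefix `AKIA` followed by 16 uppercase letters or digits); dedicated secret scanners check many more formats.

```python
import re

# Assumption: we only look for the documented shape of AWS access key IDs,
# "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_exposed_keys(text):
    """Return (line_number, candidate_key) pairs found in the given text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            findings.append((line_number, match.group()))
    return findings


# Hypothetical config snippet using the placeholder key ID from AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_exposed_keys(sample))
```

A check like this could run as a pre-commit hook or in a build pipeline, so that a flagged file is reviewed before it reaches a shared repository. It complements, rather than replaces, keeping credentials in a secrets manager and rotating them regularly.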

The Uber data breach ...
