CHAPTER 12
Data Engineering on the Cloud
In the early days of computing, computers were very large and expensive. Organizations that could afford them had mainframes housed in dedicated, temperature-controlled rooms, and users interacted with them through terminals. With this setup, every single byte of processing or storage was managed internally. As personal computers and servers became more affordable, many companies transitioned to building their own on-premises (in-house) infrastructure. This meant buying physical servers, installing them in racks, and having a dedicated IT team to manage everything from hardware maintenance to software updates.
However, this setup had several limitations. First, it required heavy capital investment because companies had to predict their future computing needs, which often changed. If they underestimated, they couldn’t handle sudden spikes in traffic. If they overestimated, they wasted money on hardware that wouldn’t be used. Second, maintaining on-premises systems was complex. IT teams had to worry about cooling, power supply, backups, disaster recovery, hardware failures, and security, all while trying to support the business’s evolving data needs.
Later on, a game-changing idea began to emerge, framed by a simple question: what if companies could access computing power over the Internet, without having to buy and maintain the hardware themselves? This idea wasn’t entirely new, because something similar was happening ...