CHAPTER 7
Data Pipelines
The data engineering life cycle consists of various stages, but how does data flow between these stages? It flows through data pipelines. Data pipelines power data systems and are a foundational component of data engineering. A data pipeline moves data between systems, applies transformations to make it analysis-ready, or does both at once.
A data pipeline is a structured system that controls the flow of data from one or more sources to one or more destinations. Along the way, it can perform operations such as cleaning, filtering, joining, or aggregating data. Data pipelines can run in real time or in batches, depending on the needs of the business, and they usually follow a series of stages: collect, ingest, process, store, and serve.
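To make the stages concrete, here is a minimal sketch of a batch pipeline in plain Python. The data, function names, and single-machine design are all hypothetical simplifications; a real pipeline would read from external sources and write to a durable store.

```python
# A toy batch pipeline illustrating the collect -> process -> serve flow.
# All records and names here are made up for illustration.

def collect():
    # Collect: pull raw records from a source (here, an in-memory list
    # standing in for a file, database, or API).
    return [
        {"user": "a", "amount": "10.5"},
        {"user": "b", "amount": "not-a-number"},  # a dirty record
        {"user": "a", "amount": "4.5"},
    ]

def process(raw_records):
    # Clean: drop records whose amount cannot be parsed as a number.
    cleaned = []
    for rec in raw_records:
        try:
            cleaned.append({"user": rec["user"], "amount": float(rec["amount"])})
        except ValueError:
            continue
    # Aggregate: total amount per user.
    totals = {}
    for rec in cleaned:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + rec["amount"]
    return totals

def serve(totals):
    # Serve: hand the result to a downstream consumer
    # (here, simply return it in sorted order).
    return dict(sorted(totals.items()))

result = serve(process(collect()))
print(result)  # {'a': 15.0}
```

Each stage is a separate function with a clear input and output, which is the same decomposition that orchestration tools such as Apache Airflow (covered later in this chapter) formalize as tasks in a DAG.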
IN THIS CHAPTER, YOU WILL LEARN THE FOLLOWING:
- Popular ingestion methods in data engineering
- How batch and streaming pipelines work
- Publish and subscribe patterns in message queues
- Windowing in stream processing
- The Lambda architecture
- Data orchestration and its key components
- Scheduling and automation in data pipelines
- Best practices for designing directed acyclic graphs (DAGs)
- How to build an ETL pipeline and automate with Apache Airflow
At the end of this chapter, you will have a good understanding of various types of data pipeline architectures, their use cases, and the techniques needed to design, build, and manage them effectively.
Batch Pipelines
Imagine a fintech (financial-technology) company that offers loan services. ...