Analytics Engineering with SQL and DBT

Book description

With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. DBT (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL.

Authors Rui Machado from Monstarlab and Helder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.

With this book, you'll learn:

  • What DBT is and how a DBT project is structured
  • How DBT fits into the data engineering and analytics worlds
  • How to collaborate on building data models
  • The main tools and architectures for building useful, functional data models
  • How to fit DBT into data warehousing and laking architecture
  • How to build tests for data transformations

Publisher resources

View/Submit Errata

Table of contents

  1. Preface
    1. Why we wrote this book
    2. Who this book is for
    3. How this book is organized
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
  2. 1. Analytics Engineering
    1. Databases and their impact on Analytics Engineering
    2. Cloud Computing and its impact on Analytics Engineering
    3. The data analytics lifecycle
    4. The new role of analytics engineer
      1. Responsibilities of an Analytics Engineer
    5. Enabling analytics in a data mesh
    6. dbt as a data mesh enabler
    7. The heart of analytics engineering
    8. The legacy way
      1. Using SQL and Procedures for ETL/ELT
      2. Using ETL tools
      3. The dbt revolution
    9. Summary
  3. 2. Data Modelling for Analytics
    1. A brief on data modeling
      1. The conceptual phase of modeling
      2. The logical phase of modeling
      3. The physical phase of modeling
      4. The data normalization process
    2. Dimensional Data Modelling
      1. Modelling with Star Schema
      2. Modelling with Snowflake
      3. Modelling with Data Vault
    3. Monolith data modeling
    4. Building modular data models
      1. Enabling modular data Models with dbt
      2. Testing your data models
      3. Data Documentation
      4. Data model optimization and debugging
    5. Meddalion Architecture Pattern
    6. Summary
  4. 3. SQL for Analytics
    1. The resiliency of SQL
    2. Databases Fundamentals
      1. Types of databases
      2. Database Management System
      3. “Speaking” with a database
    3. Creating and managing your data structures with DDL
    4. Manipulating data with DML
      1. Inserting data with the INSERT
      2. Selecting data with SELECT
      3. Updating data with UPDATE
      4. Deleting data with DELETE
    5. Storing queries as Views
    6. Common Table Expressions - CTE
    7. Window functions
    8. SQL for Distributed Data processing
      1. Data Manipulation with DuckDB
      2. Data Manipulation with Polars
      3. Data Manipulation with FugueSQL
    9. Bonus Training Machine Learning Models with SQL
    10. Summary
  5. 4. Data Transformation with dbt
    1. dbt Design Philosophy
    2. dbt data flow
    3. dbt Cloud
      1. Setting up dbt Cloud with BigQuery and GitHub
      2. dbt Cloud UI
      3. dbt Cloud IDE
    4. Structure of a dbt Project
      1. Jaffle Shop Database
      2. YAML files
      3. Models
      4. Sources
      5. Tests
      6. Analyses
      7. Seeds
      8. Documentation
      9. dbt Commands and Selection Syntax
      10. Jobs and Deploy
    5. Summary
  6. 5. dbt Advanced topics
    1. Models materializations
      1. Tables, views, and ephemeral models
      2. Incremental models
      3. Materialized View
      4. Snapshots
    2. Dynamic SQL with Jinja
    3. Using SQL Macros
    4. dbt Packages
      1. Installing packages
      2. dbt_utils package
      3. Using packages inside macros and models
    5. dbt Semantic Layer
    6. Summary
  7. 6. Building an end-to-end analytics engineering use case
    1. Problem Definition - An Omnichannel Analytics Case
    2. Operational Data Modeling
      1. Conceptual Model
      2. Logical Model
      3. Physical Model
    3. High level Data Architecture
      1. Extracting and loading to staging in Google BigQuery
    4. Analytical Data Modelling
      1. Identify the Business Processes
      2. Identify Facts and Dimensions in Dimensional Data Model
      3. Identify the Attributes for Dimensions
      4. Define the Granularity for Business Facts
    5. Creating our Datawarehouse with dbt
    6. Tests, Documentation, and Deployment with dbt
    7. Data Analytics with SQL
    8. Conclusion
  8. About the Authors

Product information

  • Title: Analytics Engineering with SQL and DBT
  • Author(s): Rui Pedro Machado, Helder Russa
  • Release date: December 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098142384