Databricks Data Engineer Associate Certification Prep in 2 Weeks

Published by O'Reilly Media, Inc.

Content level: Beginner to intermediate

Course outcomes

  • Understand how to use the Databricks Intelligence Platform and its tools
  • Learn how to build ETL pipelines and process data incrementally
  • Discover how to put data pipelines into production
  • Understand and follow best security practices in Databricks

Course description

The Databricks Intelligence Platform is a modern data platform that combines the best aspects of data lakes and data warehouses. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the platform and its capabilities and the right skills to complete essential data engineering tasks on the platform.

Join expert Derar Alhussein to build a strong foundation in all topics covered on the certification exam, including the Databricks Intelligence Platform and its tools and benefits. You’ll learn to build ETL pipelines using Apache Spark SQL and Python in both batch and streaming modes, and you’ll discover how to orchestrate production pipelines and design dashboards while maintaining entity permissions.

NOTE: With today’s registration, you’ll be signed up for all four sessions. Although you can attend any of the sessions individually, we recommend participating in all.

What you’ll learn and how you can apply it

  • Use the Databricks Intelligence Platform and its tools
  • Ingest data from diverse sources using Lakeflow Connect
  • Build ETL pipelines using Lakeflow Declarative Pipelines
  • Deploy production workloads using Lakeflow Jobs and Databricks Asset Bundles
  • Manage data governance and security using Unity Catalog

This live event is for you because...

  • You want to become a Databricks Certified Data Engineer Associate.
  • You’re new to Databricks and want to save time by learning Databricks fundamentals.
  • You’re a data engineer who wants to apply your skills to Databricks.

Prerequisites

  • Have or create a cloud account on Azure, AWS, or GCP (without an account, you’ll use a limited Community Edition of Databricks)
  • Basic SQL knowledge
  • Python programming experience
  • Familiarity with cloud fundamentals

Recommended preparation:

  • Bookmark the course GitHub repository (instructions for cloning the repo in your Databricks workspace will be given in the course)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

### Day 1: Databricks Intelligence Platform

Introduction to Databricks (20 minutes)

  • Presentation: Course overview, What is the Databricks Intelligence Platform?
  • Quiz: a knowledge check question
  • Q&A

Setting up Databricks workspace (40 minutes)

  • Presentation: Get started with Databricks Free Edition, Create a free trial on Azure
  • Hands-on Lab: Create your Databricks workspace
  • Q&A
  • 10-minute break

Exploring Databricks Workspace (20 minutes)

  • Presentation: Navigating workspace, How to import course materials
  • Hands-on Lab: Import course materials from GitHub into your workspace
  • Q&A

Working with Notebooks (40 minutes)

  • Presentation: Creating a cluster, Notebook fundamentals, Notebook debugging
  • Hands-on Lab: Create a cluster and run a notebook
  • Q&A
  • 10-minute break

Delta Lake (60 minutes)

  • Presentation: What is Delta Lake, working with Delta Lake Tables
  • Hands-on Lab: Create Delta Lake Tables
  • Q&A
  • 10-minute break

Advanced Delta Lake Features (60 minutes)

  • Presentation: Time travel, Compacting Small Files and Z-order Indexing, Liquid Clustering, Vacuum
  • Quiz: a knowledge check question
  • Q&A

### Day 2: Data Processing & Transformations

Relational Entities in Unity Catalog (80 minutes)

  • Presentation: Relational entities, Working with databases and tables on Databricks, Setting up tables, Working with Views
  • Hands-on Lab: Create and query relational entities
  • Q&A
  • 10-minute break

Processing Data Files (80 minutes)

  • Presentation: Querying data files, Writing data files to tables
  • Hands-on Lab: Processing data files with Spark SQL
  • Discussion: a knowledge check question
  • Q&A
  • 10-minute break

Advanced Data Processing (80 minutes)

  • Presentation: Advanced Transformations, SQL user-defined function (UDF)
  • Hands-on Lab: Applying advanced transformations
  • Quiz: a knowledge check question
  • Q&A

### Day 3: Developing and Productionizing Data Pipelines

Databricks Lakeflow (60 minutes)

  • Presentation: Lakeflow Connect, Auto Loader
  • Presentation: Data ingestion with Auto Loader
  • Q&A
  • 10-minute break

Lakeflow Declarative Pipelines (60 minutes)

  • Presentation: Multi-hop architecture
  • Hands-on Lab: Create and run an ETL pipeline in SQL and Python
  • Q&A
  • 10-minute break

Lakeflow Jobs (60 minutes)

  • Presentation: Task orchestration with Lakeflow Jobs
  • Hands-on Lab: Create, run, and debug Lakeflow Jobs
  • Quiz: a knowledge check question
  • Q&A
  • 10-minute break

Databricks Asset Bundles (60 minutes)

  • Presentation: Job deployment, local development
  • Hands-on Lab: Setting up and using Databricks Asset Bundles and Databricks Connect
  • Quiz: a knowledge check question
  • Q&A
  • 10-minute break

### Day 4: Data Governance & Quality

Databricks SQL (60 minutes)

  • Hands-on Lab: Data warehousing with DBSQL
  • Quiz: a knowledge check question
  • Q&A
  • 10-minute break

Data Governance (60 minutes)

  • Presentation: Data object privileges, Managing permissions
  • Hands-on Lab: Applying data object privileges in Unity Catalog
  • Quiz: a knowledge check question
  • Q&A
  • 10-minute break

Sharing Data (60 minutes)

  • Presentation: Delta Sharing and Lakehouse Federation
  • Hands-on Lab: Sharing Data with Delta Sharing
  • Q&A
  • 10-minute break

Optimization recommendations (40 minutes)

  • Presentation: Best practices and recommendations for Databricks compute
  • Hands-on Lab: Cluster configuration
  • Quiz: a knowledge check question
  • Q&A

Certification Overview (20 minutes)

  • Presentation: Certification Overview
  • Q&A

Your Instructor

  • Derar Alhussein

    Derar Alhussein is a senior data engineer with a master's degree in data mining and the author of the O’Reilly book Databricks Certified Data Engineer Associate Study Guide. He has over a decade of hands-on experience in software and data projects and currently holds eight certifications from Databricks. Derar is also an experienced instructor who has trained thousands of data engineers, helping them develop their skills and obtain professional certifications. Databricks has recognized Derar as a Databricks MVP for his strong technical skills and ongoing contributions to the data and AI community.

Skill covered

Databricks Data Engineer Associate