Snowflake for Data Engineering Bootcamp
Published by O'Reilly Media, Inc.
Transforming Data and Building Data Pipelines
- Gain a deeper understanding of data engineering features and principles in the Snowflake Data Cloud
- Learn to build complex and scalable data pipelines using Snowflake's unique features
- Load and transform structured and semistructured data in Snowflake
- Work with batch-oriented and continuous data
- Understand how to integrate Snowflake with other cloud-based tools
Join expert Tomas Sobotik to learn data engineering features, concepts, and best practices for the Snowflake Data Cloud platform. You’ll explore basic data ingestion principles and learn how to ingest different types of data into Snowflake, covering batch and continuous data pipelines as well as advanced topics including Snowpark, external functions, and the SQL API. You’ll also learn how to monitor data pipelines, use stored procedures, create user-defined functions, and integrate Snowflake with other cloud-based tools.
NOTE: With today’s registration, you’ll be signed up for both sessions. Although you can attend either of the sessions individually, we recommend participating in both.
What you’ll learn and how you can apply it
- Learn data loading strategies
- Understand data ingestion workflow and semistructured data ingestion
- Explore continuous data pipelines
- Learn how to build and monitor data pipelines
- Use the SQL API and external functions
- Use Snowpark for data transformation
This live event is for you because...
- You’re an experienced data engineer who wants to understand Snowflake capabilities in data engineering and to integrate best practices into your workflow
- You’re a junior data engineer who wants to learn how to use Snowflake Data Cloud features
- You’re a data apps developer who needs to extend your data engineering skills
Prerequisites
- Knowledge of SQL and relational databases
- Knowledge of the Snowflake platform
- For some exercises, knowledge of Python basics (useful but not required)
- Knowledge of basic cloud concepts and familiarity with one of the major cloud providers (we will use AWS for some of the exercises)
Course Setup:
- Sign up for a Snowflake trial account (Enterprise Edition) before the course so that it is available for both weeks
- If possible, select AWS as a cloud provider for your Snowflake account
- Sign up for an AWS trial account or have access to an AWS account with admin privileges
- Follow the instructions to prepare your local Python environment for running the Snowpark API
- Create a GitHub repository to use for the CI/CD exercise with GitHub Actions
Recommended preparation:
- Take Snowflake Fundamentals Bootcamp (live online course with Tomas Sobotik)
Recommended follow-up:
- Read Snowflake: The Definitive Guide (book)
- Take SQL Fundamentals for Data (live online course with Thomas Nield)
- Explore Introducing SQL and Relational Databases (on-demand course)
- Take AWS Cloud Practitioner Bootcamp (live online course with Bill Boulden)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Week 1: Basic Data Ingestion Principles
Snowflake architecture (30 minutes)
- Presentation: Snowflake architecture overview
- Group discussion: Your general architecture knowledge
- Q&A
Intro to data loading strategies (30 minutes)
- Presentation: Batch versus continuous versus real-time processing
- Q&A
- Break
Data ingestion workflow (30 minutes)
- Presentation: Creating a data ingestion workflow in Snowflake
- Hands-on exercises: Create basic objects (stage, storage integration, etc.); ingest flat data into Snowflake
- Q&A
Semistructured data ingestion (35 minutes)
- Presentation: Challenges with semistructured file ingestion
- Hands-on exercises: Ingest semistructured data files; explore the INFER_SCHEMA and GENERATE_COLUMN_DESCRIPTION functions
- Q&A
- Break
SnowSQL (30 minutes)
- Presentation: SnowSQL interface and how to use it
- Hands-on exercises: Install and configure SnowSQL client; use SnowSQL to manage files in internal stages
- Q&A
Views and their differences (25 minutes)
- Presentation: Different types of views in Snowflake
- Hands-on exercises: Create a view and a secure view; create a materialized view
- Q&A
- Break
Continuous data pipelines (35 minutes)
- Presentation: Intro to continuous data pipelines
- Hands-on exercises: Configure Snowpipe; ingest data through Snowpipe
- Q&A
Complex data pipelines with streams and tasks (25 minutes)
- Presentation: Building data pipelines with Snowflake features
- Hands-on exercises: Create a stream; create a task to transform data
- Q&A
- Break
Week 2: Data Transformation Features
Monitoring data pipelines (35 minutes)
- Presentation: How to monitor data pipelines
- Hands-on exercises: Use Snowflake metadata related to data pipeline runs; create a notification integration
- Q&A
Using stored procedures (25 minutes)
- Presentation: Intro to stored procedures
- Hands-on exercise: Create an SQL stored procedure
- Q&A
- Break
Using user-defined functions (20 minutes)
- Presentation: Intro to user-defined functions
- Hands-on exercise: Create an SQL user-defined function
- Q&A
External tables (20 minutes)
- Presentation: Intro to external tables
- Hands-on exercise: Create an external table and refresh it
- Q&A
Data unloading overview (25 minutes)
- Presentation: Summary of data unloading features
- Hands-on exercises: Unload CSV data into an internal stage; unload semistructured data into an external stage (AWS S3)
- Q&A
- Break
Intro to SQL API (20 minutes)
- Presentation: Overview of SQL API
- Demonstration: How SQL API works
- Q&A
Kafka Connector overview (20 minutes)
- Presentation: Intro to Kafka integration with Snowflake
- Demonstration: Processing Kafka messages
- Q&A
External functions (30 minutes)
- Presentation: Working with external functions
- Hands-on exercises: Create the AWS infrastructure for an external function; create the API integration for AWS in Snowflake; create an external function in Snowflake
- Q&A
- Break
Snowpark data transformation (25 minutes)
- Presentation: Using Snowpark for data transformation
- Demonstration: Basic transformation in Python worksheets
- Hands-on exercise: Explore data transformations in Snowpark for Python
- Q&A
Deploying UDFs and stored procedures via Snowpark (20 minutes)
- Presentation: Using Snowpark for deploying UDFs or stored procedures
- Demonstration: Deploying with SnowCLI
- Q&A
Your Instructor
Tomáš Sobotík
Tomas Sobotik is a senior data engineer and Snowflake subject matter expert at Norlys. He’s also a Snowflake Data Superhero and certified Snowflake expert. A technology enthusiast and passionate data developer, he has over 15 years of experience working on BI and data-related projects spanning various industries.