video

Data Platforms: Spark to Snowflake

by Kennedy Behrman

May 2024

Beginner to intermediate

54m

English

Pragmatic AI Labs

Overview

Big Data Processing with Hadoop, Spark, Snowflake and Databricks

Learn to process big data using popular platforms like Hadoop, Spark, Snowflake and Databricks through live coding examples. Learn from O'Reilly author Kennedy Behrman.

This video series covers key concepts and tools for big data processing and storage. It introduces platforms like Hadoop, Spark, Snowflake and Databricks, discussing their architectures and use cases. Through live coding demonstrations in Python and SQL, you'll learn to work with these technologies hands-on.

Lessons Covered Include:

Hadoop ecosystem and MapReduce programming model
Spark architecture, Resilient Distributed Datasets (RDDs), and PySpark DataFrames
Snowflake's hybrid shared-disk/shared-nothing design and 3-layer architecture
Spark SQL module for structured data processing
PySpark examples of filtering, grouping, joining and transforming DataFrames
Snowflake account setup, warehouses, databases, schemas and access control
Using the Snowflake Python Connector to read data, run queries and write data
Key differences between Hadoop, Spark, Snowflake and Databricks
Spark concepts like drivers, executors, jobs, stages, partitions and lazy evaluation
Snowflake virtual warehouses, scaling, auto-suspend and auto-resume
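
As a rough illustration of the MapReduce programming model named in the lessons above, here is a minimal sketch in plain Python. It is not taken from the course and does not use Hadoop. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example, and everything runs in a single process, whereas a real framework distributes these steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["spark and hadoop", "spark and snowflake"]
    print(word_count(sample))
```

The three phases are kept as separate functions to show why the model parallelizes well: each map call and each reduce call depends only on its own input, so the shuffle is the only step that needs to see all the pairs.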

Learning Objectives

Understand the core concepts behind popular big data platforms and how they differ
Gain hands-on experience using PySpark and Snowflake to process and analyze data
Learn to create RDDs and DataFrames in PySpark and perform common data manipulations
Practice architecting Snowflake virtual warehouses and managing access control
Discover how to leverage the Snowflake Python Connector for data interactions
Build an intuition for when to use different big data tools for specific use cases

Publisher Resources

ISBN: 050522024VIDEOPAIML