Data Platforms: Spark to Snowflake

by Kennedy Behrman
May 2024
Beginner to intermediate
54m
English
Pragmatic AI Labs

Overview

Big Data Processing with Hadoop, Spark, Snowflake and Databricks

Learn to process big data on popular platforms such as Hadoop, Spark, Snowflake and Databricks through live coding examples from O'Reilly author Kennedy Behrman.

This video series covers key concepts and tools for big data processing and storage. It introduces platforms like Hadoop, Spark, Snowflake and Databricks, discussing their architectures and use cases. Through live coding demonstrations in Python and SQL, you'll learn to work with these technologies hands-on.

Lessons Covered Include:

  • Hadoop ecosystem and MapReduce programming model
  • Spark architecture, Resilient Distributed Datasets (RDDs), and PySpark DataFrames
  • Snowflake's hybrid shared-disk/shared-nothing design and 3-layer architecture
  • Spark SQL module for structured data processing
  • PySpark examples of filtering, grouping, joining and transforming DataFrames
  • Snowflake account setup, warehouses, databases, schemas and access control
  • Using the Snowflake Python Connector to read data, run queries and write data
  • Key differences between Hadoop, Spark, Snowflake and Databricks
  • Spark concepts like drivers, executors, jobs, stages, partitions and lazy evaluation
  • Snowflake virtual warehouses, scaling, auto-suspend and auto-resume
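
To make the MapReduce programming model from the lesson list concrete, here is a minimal single-process sketch in plain Python. It is not taken from the course and does not use Hadoop. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical, chosen only to show the three stages a framework would distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across machines between map and reduce;
    here it is an in-memory dictionary (an illustrative simplification).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, not data from the course.
sample_lines = ["spark and hadoop", "spark and snowflake"]
counts = word_count(sample_lines)
print(counts)
```

Keeping the three steps as separate functions mirrors how the model divides work: map and reduce are user-supplied logic, while the shuffle in between is normally handled by the platform.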

Learning Objectives

  • Understand the core concepts behind popular big data platforms and how they differ
  • Gain hands-on experience using PySpark and Snowflake to process and analyze data
  • Learn to create RDDs and DataFrames in PySpark and perform common data manipulations
  • Practice architecting Snowflake virtual warehouses and managing access control
  • Discover how to leverage the Snowflake Python Connector for data interactions
  • Build an intuition for when to use different big data tools for specific use cases

Publisher Resources

ISBN: 050522024VIDEOPAIML