Big Data Processing with Hadoop, Spark, Snowflake and Databricks
Learn to process big data using popular platforms like Hadoop, Spark, Snowflake and Databricks through live coding examples
Learn from O'Reilly author Kennedy Behrman
This video series covers key concepts and tools for big data processing and storage. It introduces platforms like Hadoop, Spark, Snowflake and Databricks, discussing their architectures and use cases. Through live coding demonstrations in Python and SQL, you'll learn to work with these technologies hands-on.
Lessons Covered Include:
- Hadoop ecosystem and MapReduce programming model
- Spark architecture, Resilient Distributed Datasets (RDDs), and PySpark DataFrames
- Snowflake's hybrid shared-disk/shared-nothing design and 3-layer architecture
- Spark SQL module for structured data processing
- PySpark examples of filtering, grouping, joining and transforming DataFrames
- Snowflake account setup, warehouses, databases, schemas and access control
- Using the Snowflake Python Connector to read data, run queries and write data
- Key differences between Hadoop, Spark, Snowflake and Databricks
- Spark concepts like drivers, executors, jobs, stages, partitions and lazy evaluation
- Snowflake virtual warehouses, scaling, auto-suspend and auto-resume
Learning Objectives
- Understand the core concepts behind popular big data platforms and how they differ
- Gain hands-on experience using PySpark and Snowflake to process and analyze data
- Learn to create RDDs and DataFrames in PySpark and perform common data manipulations
- Practice architecting Snowflake virtual warehouses and managing access control
- Discover how to leverage the Snowflake Python Connector for data interactions
- Build an intuition for when to use different big data tools for specific use cases
Additional Popular Resources