Overview
In this 8-hour course, you will explore the fundamentals of Apache Spark 3 using Python for data engineering and analytics. Starting with the essentials of PySpark and moving on to building data solutions in Databricks, the course teaches you to process and analyze data at scale.
What I will be able to do after this course
- Understand and utilize Spark's structured and RDD APIs for data transformations and actions.
- Set up and configure your own local PySpark environment for effective Spark development.
- Understand how Spark plans and executes jobs as Directed Acyclic Graphs (DAGs).
- Use Spark SQL and DataFrames to manipulate and query data.
- Develop dashboards and visualizations on Databricks for insightful analytics.
Course Instructor(s)
David Mngadi is an experienced data engineer and instructor, proficient in technologies such as Python and Apache Spark. With a passion for teaching, he builds engaging, hands-on courses that help learners gain confidence in data analysis. David's approachable style keeps the principles clear and practical.
Who is it for?
This course is ideal for Python developers seeking to branch into data engineering and analytics with PySpark. Aspiring data professionals and analysts with foundational programming knowledge will benefit greatly. Data scientists looking to scale their analysis to big data applications are also welcome. Anyone interested in engineering work on distributed systems will find it engaging and rewarding.