Apache Spark is a widely used engine for processing large datasets. This course, designed for learners with basic Python programming experience, introduces big data analysis using Spark 2.0, Python, and the Spark DataFrame API.
The course begins with an overview of Spark 2.0 and Python, then examines DataFrames in detail: using SQL with DataFrames, working with dates and timestamps, performing aggregate operations, and handling missing data. It includes a hands-on data analysis exercise using real stock data. Learners should have Python and Spark installed on their computers before starting the class.