Video description
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. Recently updated with nearly an hour of new footage on DataFrames in Spark 1.3, this video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.
With this workshop, you will:
- Get going with the newest features of Spark 1.3
- Open a Spark shell
- Develop Spark apps for typical use cases
- Use machine-learning algorithms such as Naive Bayes, decision trees, and random forests
- Explore data sets loaded from HDFS or another filesystem
- Work with Spark SQL, Spark Streaming, and Spark’s machine-learning library, MLlib
- Use Maven, SBT, IPython Notebook, and other tooling
- Learn about Spark follow-up courses and certification
Paco Nathan has led innovative data teams building large-scale apps for several years. He’s an expert in distributed systems, machine learning, cloud computing, and functional programming.
Table of contents
- Pre-Flight Check 00:13:08
- Spark Deconstructed 00:14:31
- A Brief History 00:23:28
- Simple Spark Apps 00:25:07
- Spark Essentials 00:35:18
- Spark Examples 00:21:55
- Unifying the Pieces - Spark SQL 00:24:07
- Unifying the Pieces - Spark Streaming 00:14:48
- Unifying the Pieces - MLlib and GraphX 00:20:00
- Unified Workflows Demo 00:22:35
- The Full SDLC 00:04:01
- Developer Certification 00:06:10
- Resources 00:04:44
- Introduction - Why DataFrames? 00:02:28
- ETL to Prepare the Data from Capital Bikeshare 00:02:46
- Create a DataFrame, Explore using SQL 00:02:47
- Data Preparation for Machine Learning Models 00:05:33
- Build a Classifier Using Naive Bayes 00:04:43
- Build a Classifier Using Decision Trees 00:02:26
- Build a Classifier Using Random Forests 00:02:20
- Use a DataFrame to Compare Models 00:04:15
- Parquet as a Best Practice with DataFrames 00:00:58
- How to Store a DataFrame with Parquet 00:03:25
- How to Read a DataFrame Back in From Parquet 00:02:57
- Use SQL to Estimate Route Durations 00:01:41
- Data Preparation for GraphX - Model Route Costs 00:04:43
- Use PageRank to Rank Popular Stations 00:03:14
- Optimize Routes to Columbus Circle 00:03:43
- Compare Results with Google Maps 00:01:58
- Analyze a Popular Tourist Route 00:02:30
- Examples of How to Use DataFrames in Python 00:02:57
- Summary - The New DataFrames Features in Spark 00:01:03
Product information
- Title: Introduction to Apache Spark
- Author(s): Paco Nathan
- Release date: March 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491919729