Video description
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. Recently updated with nearly an hour of new footage on DataFrames in Spark 1.3, this video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.
With this workshop, you will:
- Get going with the newest features of Spark 1.3
- Open a Spark shell
- Develop Spark apps for typical use cases
- Use some machine-learning algorithms
- Explore data sets loaded from HDFS or another filesystem
- Work with Spark SQL, Spark Streaming, and Spark’s machine-learning library, MLlib
- Use Maven, SBT, IPython Notebook, and other tooling
- Learn about Spark follow-up courses and certification
Paco Nathan has led innovative data teams building large-scale apps for several years. He’s an expert in distributed systems, machine learning, cloud computing, and functional programming.
Table of contents
- Pre-Flight Check
- Spark Deconstructed
- A Brief History
- Simple Spark Apps
- Spark Essentials
- Spark Examples
- Unifying the Pieces - Spark SQL
- Unifying the Pieces - Spark Streaming
- Unifying the Pieces - MLlib and GraphX
- Unified Workflows Demo
- The Full SDLC
- Developer Certification
- Resources
- Introduction - Why DataFrames?
- ETL to Prepare the Data from Capital Bikeshare
- Create a DataFrame, Explore using SQL
- Data Preparation for Machine Learning Models
- Build a Classifier Using Naive Bayes
- Build a Classifier Using Decision Trees
- Build a Classifier Using Random Forests
- Use a DataFrame to Compare Models
- Parquet as a Best Practice with DataFrames
- How to Store a DataFrame with Parquet
- How to Read a DataFrame Back in From Parquet
- Use SQL to Estimate Route Durations
- Data Preparation for GraphX - Model Route Costs
- Use PageRank to Rank Popular Stations
- Optimize Routes to Columbus Circle
- Compare Results with Google Maps
- Analyze a Popular Tourist Route
- Examples of How to Use DataFrames in Python
- Summary - The New DataFrames Features in Spark
Product information
- Title: Introduction to Apache Spark
- Author(s): Paco Nathan
- Release date: March 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491919729