Video description
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. Recently updated with nearly an hour of new footage on DataFrames in Spark 1.3, this video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.
With this workshop, you will:
- Get going with the newest features of Spark 1.3
- Open a Spark shell
- Develop Spark apps for typical use cases
- Use some machine-learning algorithms
- Explore data sets loaded from HDFS or another filesystem
- Work with Spark SQL, Spark Streaming, and Spark’s machine-learning library, MLlib
- Use Maven, SBT, IPython Notebook, and other tooling
- Learn about Spark follow-up courses and certification
Paco Nathan has led innovative data teams building large-scale apps for several years. He’s an expert in distributed systems, machine learning, cloud computing, and functional programming.
Table of contents
- Pre-Flight Check
- Spark Deconstructed
- A Brief History
- Simple Spark Apps
- Spark Essentials
- Spark Examples
- Unifying the Pieces - Spark SQL
- Unifying the Pieces - Spark Streaming
- Unifying the Pieces - MLlib and GraphX
- Unified Workflows Demo
- The Full SDLC
- Developer Certification
- Resources
- Introduction - Why DataFrames?
- ETL to Prepare the Data from Capital Bikeshare
- Create a DataFrame, Explore using SQL
- Data Preparation for Machine Learning Models
- Build a Classifier Using Naive Bayes
- Build a Classifier Using Decision Trees
- Build a Classifier Using Random Forests
- Use a DataFrame to Compare Models
- Parquet as a Best Practice with DataFrames
- How to Store a DataFrame with Parquet
- How to Read a DataFrame Back in From Parquet
- Use SQL to Estimate Route Durations
- Data Preparation for GraphX - Model Route Costs
- Use PageRank to Rank Popular Stations
- Optimize Routes to Columbus Circle
- Compare Results with Google Maps
- Analyze a Popular Tourist Route
- Examples of How to Use DataFrames in Python
- Summary - The New DataFrames Features in Spark
Product information
- Title: Introduction to Apache Spark
- Author(s): Paco Nathan
- Release date: March 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491919729