Video description
In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.
"Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide."
Jonathan Sharley, Pandora Media
Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort start, you can download a preconfigured virtual machine and use it to try the book's code.
Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades.
Inside:
- Updated for Spark 2.0
- Real-life case studies
- Spark DevOps with Docker
- Examples in Scala, and online in Java and Python
Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community.
"Must-have! Speed up your learning of Spark as a distributed computing framework."
Robert Ormandi, Yahoo!
"An easy-to-follow, step-by-step guide."
Gaurav Bhardwaj, 3Pillar Global
"An ambitiously comprehensive overview of Spark and its diverse ecosystem."
Jonathan Miller, Optensity
NARRATED BY KYLE JACKSON AND MARK THOMAS
Table of contents
PART 1: FIRST STEPS
- Chapter 1. Introduction to Apache Spark
- Chapter 1. What Spark brings to the table
- Chapter 1. Spark components
- Chapter 1. Spark program flow
- Chapter 1. Setting up the spark-in-action VM
- Chapter 2. Spark fundamentals
- Chapter 2. Using the VM’s Hadoop installation
- Chapter 2. Using Spark shell and writing your first Spark program
- Chapter 2. Basic RDD actions and transformations
- Chapter 2. Using the distinct and flatMap transformations
- Chapter 2. Obtaining RDD’s elements with the sample, take, and takeSample operations
- Chapter 2. Double RDD functions
- Chapter 3. Writing Spark applications
- Chapter 3. Developing the application
- Chapter 3. Running the application from Eclipse
- Chapter 3. Broadcast variables
- Chapter 3. Submitting the application
- Chapter 3. Using spark-submit
- Chapter 4. The Spark API in depth
- Chapter 4. Basic pair RDD functions
- Chapter 4. Using the flatMapValues transformation to add values to keys
- Chapter 4. Understanding data partitioning and reducing data shuffling
- Chapter 4. Understanding and avoiding unnecessary shuffling
- Chapter 4. Repartitioning RDDs
- Chapter 4. Joining, sorting, and grouping data
- Chapter 4. Joining data
- Chapter 4. Sorting data
- Chapter 4. Grouping data
- Chapter 4. Understanding RDD dependencies
- Chapter 4. Using accumulators and broadcast variables to communicate with Spark executors
- Chapter 4. Sending data to executors using broadcast variables
PART 2: MEET THE SPARK FAMILY
- Chapter 5. Sparkling queries with Spark SQL
- Chapter 5. Creating DataFrames from RDDs
- Chapter 5. Creating a DataFrame from an RDD of tuples
- Chapter 5. DataFrame API basics
- Chapter 5. Using SQL functions to perform calculations on data
- Chapter 5. Working with missing values
- Chapter 5. Grouping and joining data
- Chapter 5. Beyond DataFrames: introducing DataSets
- Chapter 5. Table catalog and Hive metastore
- Chapter 5. Executing SQL queries
- Chapter 5. Saving and loading DataFrame data
- Chapter 5. Saving data
- Chapter 5. Catalyst optimizer
- Chapter 6. Ingesting data with Spark Streaming
- Chapter 6. Creating a discretized stream
- Chapter 6. Saving the results to a file
- Chapter 6. Saving the computation state over time
- Chapter 6. Specifying the checkpointing directory
- Chapter 6. Using window operations for time-limited calculations
- Chapter 6. Using external data sources
- Chapter 6. Changing the streaming application to use Kafka
- Chapter 6. Performance of Spark Streaming jobs
- Chapter 6. Structured Streaming
- Chapter 7. Getting smart with MLlib
- Chapter 7. Classification of machine-learning algorithms
- Chapter 7. Linear algebra in Spark
- Chapter 7. Distributed matrices
- Chapter 7. Linear regression
- Chapter 7. Expanding the model to multiple linear regression
- Chapter 7. Analyzing and preparing the data
- Chapter 7. Fitting and using a linear regression model
- Chapter 7. Tweaking the algorithm
- Chapter 7. Plotting residual plots
- Chapter 7. Optimizing linear regression
- Chapter 8. ML: classification and clustering
- Chapter 8. Logistic regression
- Chapter 8. Preparing data to use logistic regression in Spark
- Chapter 8. Training the model
- Chapter 8. Performing k-fold cross-validation
- Chapter 8. Decision trees and random forests
- Chapter 8. Decision trees
- Chapter 8. Random forests
- Chapter 8. Using k-means clustering
- Chapter 8. K-means clustering
- Chapter 8. Summary
- Chapter 9. Connecting the dots with GraphX
- Chapter 9. Transforming graphs
- Chapter 9. Graph algorithms
- Chapter 9. Implementing the A* search algorithm
- Chapter 9. Implementing the A* algorithm
- Chapter 9. Summary
PART 3: SPARK OPS
- Chapter 10. Running Spark
- Chapter 10. Job and resource scheduling
- Chapter 10. Data-locality considerations
- Chapter 10. Configuring Spark
- Chapter 10. Spark web UI
- Chapter 10. Running Spark on the local machine
- Chapter 11. Running on a Spark standalone cluster
- Chapter 11. Starting the standalone cluster
- Chapter 11. Viewing Spark processes
- Chapter 11. Standalone cluster web UI
- Chapter 11. Specifying extra classpath entries and files
- Chapter 11. Spark History Server and event logging
- Chapter 11. Creating an EC2 standalone cluster
- Chapter 11. Using the EC2 cluster
- Chapter 12. Running on YARN and Mesos
- Chapter 12. Resource scheduling in YARN
- Chapter 12. Configuring Spark on YARN
- Chapter 12. Configuring resources for Spark jobs
- Chapter 12. Finding logs on YARN
- Chapter 12. Running Spark on Mesos
- Chapter 12. Installing and configuring Mesos
- Chapter 12. Mesos resource scheduling
- Chapter 12. Running Spark with Docker
PART 4: BRINGING IT TOGETHER
- Chapter 13. Case study: real-time dashboard
- Chapter 13. Running the application
- Chapter 13. Starting the application manually
- Chapter 13. Understanding the source code
- Chapter 13. The StreamingLogAnalyzer project
- Chapter 14. Deep learning on Spark with H2O
- Chapter 14. Using H2O with Spark
- Chapter 14. Performing regression with H2O’s deep learning
- Chapter 14. Building and evaluating a deep-learning model using the Sparkling Water API
- Chapter 14. Performing classification with H2O’s deep learning
Product information
- Title: Spark in Action video edition
- Author(s): Petar Zečević, Marko Bonaći
- Release date: November 2016
- Publisher(s): Manning Publications
- ISBN: None