Spark in Action video edition

Video description

"Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide."
Jonathan Sharley, Pandora Media

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code.

Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades.

Inside:

  • Updated for Spark 2.0
  • Real-life case studies
  • Spark DevOps with Docker
  • Examples in Scala, and online in Java and Python

Made for experienced programmers with some background in big data or machine learning.

Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community.

"Must-have! Speed up your learning of Spark as a distributed computing framework."
Robert Ormandi, Yahoo!

"An easy-to-follow, step-by-step guide."
Gaurav Bhardwaj, 3Pillar Global

"An ambitiously comprehensive overview of Spark and its diverse ecosystem."
Jonathan Miller, Optensity

NARRATED BY KYLE JACKSON AND MARK THOMAS

Table of contents

  1. PART 1: FIRST STEPS
    1. Chapter 1. Introduction to Apache Spark
    2. Chapter 1. What Spark brings to the table
    3. Chapter 1. Spark components
    4. Chapter 1. Spark program flow
    5. Chapter 1. Setting up the spark-in-action VM
    6. Chapter 2. Spark fundamentals
    7. Chapter 2. Using the VM’s Hadoop installation
    8. Chapter 2. Using Spark shell and writing your first Spark program
    9. Chapter 2. Basic RDD actions and transformations
    10. Chapter 2. Using the distinct and flatMap transformations
    11. Chapter 2. Obtaining RDD’s elements with the sample, take, and takeSample operations
    12. Chapter 2. Double RDD functions
    13. Chapter 3. Writing Spark applications
    14. Chapter 3. Developing the application
    15. Chapter 3. Running the application from Eclipse
    16. Chapter 3. Broadcast variables
    17. Chapter 3. Submitting the application
    18. Chapter 3. Using spark-submit
    19. Chapter 4. The Spark API in depth
    20. Chapter 4. Basic pair RDD functions
    21. Chapter 4. Using the flatMapValues transformation to add values to keys
    22. Chapter 4. Understanding data partitioning and reducing data shuffling
    23. Chapter 4. Understanding and avoiding unnecessary shuffling
    24. Chapter 4. Repartitioning RDDs
    25. Chapter 4. Joining, sorting, and grouping data
    26. Chapter 4. Joining data
    27. Chapter 4. Sorting data
    28. Chapter 4. Grouping data
    29. Chapter 4. Understanding RDD dependencies
    30. Chapter 4. Using accumulators and broadcast variables to communicate with Spark executors
    31. Chapter 4. Sending data to executors using broadcast variables
  2. PART 2: MEET THE SPARK FAMILY
    1. Chapter 5. Sparkling queries with Spark SQL
    2. Chapter 5. Creating DataFrames from RDDs
    3. Chapter 5. Creating a DataFrame from an RDD of tuples
    4. Chapter 5. DataFrame API basics
    5. Chapter 5. Using SQL functions to perform calculations on data
    6. Chapter 5. Working with missing values
    7. Chapter 5. Grouping and joining data
    8. Chapter 5. Beyond DataFrames: introducing DataSets
    9. Chapter 5. Table catalog and Hive metastore
    10. Chapter 5. Executing SQL queries
    11. Chapter 5. Saving and loading DataFrame data
    12. Chapter 5. Saving data
    13. Chapter 5. Catalyst optimizer
    14. Chapter 6. Ingesting data with Spark Streaming
    15. Chapter 6. Creating a discretized stream
    16. Chapter 6. Saving the results to a file
    17. Chapter 6. Saving the computation state over time
    18. Chapter 6. Specifying the checkpointing directory
    19. Chapter 6. Using window operations for time-limited calculations
    20. Chapter 6. Using external data sources
    21. Chapter 6. Changing the streaming application to use Kafka
    22. Chapter 6. Performance of Spark Streaming jobs
    23. Chapter 6. Structured Streaming
    24. Chapter 7. Getting smart with MLlib
    25. Chapter 7. Classification of machine-learning algorithms
    26. Chapter 7. Linear algebra in Spark
    27. Chapter 7. Distributed matrices
    28. Chapter 7. Linear regression
    29. Chapter 7. Expanding the model to multiple linear regression
    30. Chapter 7. Analyzing and preparing the data
    31. Chapter 7. Fitting and using a linear regression model
    32. Chapter 7. Tweaking the algorithm
    33. Chapter 7. Plotting residual plots
    34. Chapter 7. Optimizing linear regression
    35. Chapter 8. ML: classification and clustering
    36. Chapter 8. Logistic regression
    37. Chapter 8. Preparing data to use logistic regression in Spark
    38. Chapter 8. Training the model
    39. Chapter 8. Performing k-fold cross-validation
    40. Chapter 8. Decision trees and random forests
    41. Chapter 8. Decision trees
    42. Chapter 8. Random forests
    43. Chapter 8. Using k-means clustering
    44. Chapter 8. K-means clustering
    45. Chapter 8. Summary
    46. Chapter 9. Connecting the dots with GraphX
    47. Chapter 9. Transforming graphs
    48. Chapter 9. Graph algorithms
    49. Chapter 9. Implementing the A* search algorithm
    50. Chapter 9. Implementing the A* algorithm
    51. Chapter 9. Summary
  3. PART 3: SPARK OPS
    1. Chapter 10. Running Spark
    2. Chapter 10. Job and resource scheduling
    3. Chapter 10. Data-locality considerations
    4. Chapter 10. Configuring Spark
    5. Chapter 10. Spark web UI
    6. Chapter 10. Running Spark on the local machine
    7. Chapter 11. Running on a Spark standalone cluster
    8. Chapter 11. Starting the standalone cluster
    9. Chapter 11. Viewing Spark processes
    10. Chapter 11. Standalone cluster web UI
    11. Chapter 11. Specifying extra classpath entries and files
    12. Chapter 11. Spark History Server and event logging
    13. Chapter 11. Creating an EC2 standalone cluster
    14. Chapter 11. Using the EC2 cluster
    15. Chapter 12. Running on YARN and Mesos
    16. Chapter 12. Resource scheduling in YARN
    17. Chapter 12. Configuring Spark on YARN
    18. Chapter 12. Configuring resources for Spark jobs
    19. Chapter 12. Finding logs on YARN
    20. Chapter 12. Running Spark on Mesos
    21. Chapter 12. Installing and configuring Mesos
    22. Chapter 12. Mesos resource scheduling
    23. Chapter 12. Running Spark with Docker
  4. PART 4: BRINGING IT TOGETHER
    1. Chapter 13. Case study: real-time dashboard
    2. Chapter 13. Running the application
    3. Chapter 13. Starting the application manually
    4. Chapter 13. Understanding the source code
    5. Chapter 13. The StreamingLogAnalyzer project
    6. Chapter 14. Deep learning on Spark with H2O
    7. Chapter 14. Using H2O with Spark
    8. Chapter 14. Performing regression with H2O’s deep learning
    9. Chapter 14. Building and evaluating a deep-learning model using the Sparkling Water API
    10. Chapter 14. Performing classification with H2O’s deep learning

Product information

  • Title: Spark in Action video edition
  • Author(s): Petar Zečević, Marko Bonaći
  • Release date: November 2016
  • Publisher(s): Manning Publications