Book description
Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.
Table of contents
- Foreword
- Preface
- 1. Introduction to Data Analysis with Spark
- 2. Downloading Spark and Getting Started
- 3. Programming with RDDs
- 4. Working with Key/Value Pairs
- 5. Loading and Saving Your Data
- 6. Advanced Spark Programming
- 7. Running on a Cluster
- 8. Tuning and Debugging Spark
- 9. Spark SQL
- 10. Spark Streaming
- 11. Machine Learning with MLlib
- Index
Product information
- Title: Learning Spark
- Release date: February 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449358624