Advanced Analytics with PySpark

Book description

The amount of data being generated today is staggering and still growing. Apache Spark has emerged as the de facto tool for analyzing big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, along with other best practices in Spark programming.

Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing.

If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.

  • Familiarize yourself with Spark's programming model and ecosystem
  • Learn general approaches in data science
  • Examine complete implementations that analyze large public datasets
  • Discover which machine learning tools make sense for particular problems
  • Explore code that can be adapted to many uses

Table of contents

  1. Preface
    1. Why Did We Write This Book Now?
    2. How This Book Is Organized
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  2. 1. Analyzing Big Data
    1. Working with Big Data
    2. Introducing Apache Spark and PySpark
      1. Components
      2. PySpark
      3. Ecosystem
    3. Spark 3.0
    4. PySpark Addresses Challenges of Data Science
    5. Where to Go from Here
  3. 2. Introduction to Data Analysis with PySpark
    1. Spark Architecture
    2. Installing PySpark
    3. Setting Up Our Data
    4. Analyzing Data with the DataFrame API
    5. Fast Summary Statistics for DataFrames
    6. Pivoting and Reshaping DataFrames
    7. Joining DataFrames and Selecting Features
    8. Scoring and Model Evaluation
    9. Where to Go from Here
  4. 3. Recommending Music and the Audioscrobbler Dataset
    1. Setting Up the Data
    2. Our Requirements for a Recommender System
      1. Alternating Least Squares Algorithm
    3. Preparing the Data
    4. Building a First Model
    5. Spot Checking Recommendations
    6. Evaluating Recommendation Quality
    7. Computing AUC
    8. Hyperparameter Selection
    9. Making Recommendations
    10. Where to Go from Here
  5. 4. Making Predictions with Decision Trees and Decision Forests
    1. Decision Trees and Forests
    2. Preparing the Data
    3. Our First Decision Tree
    4. Decision Tree Hyperparameters
    5. Tuning Decision Trees
    6. Categorical Features Revisited
    7. Random Forests
    8. Making Predictions
    9. Where to Go from Here
  6. 5. Anomaly Detection with K-means Clustering
    1. K-means Clustering
    2. Identifying Anomalous Network Traffic
      1. KDD Cup 1999 Dataset
    3. A First Take on Clustering
    4. Choosing k
    5. Visualization with SparkR
    6. Feature Normalization
    7. Categorical Variables
    8. Using Labels with Entropy
    9. Clustering in Action
    10. Where to Go from Here
  7. 6. Understanding Wikipedia with LDA and Spark NLP
    1. Latent Dirichlet Allocation
      1. LDA in PySpark
    2. Getting the Data
    3. Spark NLP
      1. Setting Up Your Environment
    4. Parsing the Data
    5. Preparing the Data Using Spark NLP
    6. TF-IDF
    7. Computing the TF-IDFs
    8. Creating Our LDA Model
    9. Where to Go from Here
  8. 7. Geospatial and Temporal Data Analysis on Taxi Trip Data
    1. Preparing the Data
      1. Converting Datetime Strings to Timestamps
      2. Handling Invalid Records
    2. Geospatial Analysis
      1. Intro to GeoJSON
      2. GeoPandas
    3. Sessionization in PySpark
      1. Building Sessions: Secondary Sorts in PySpark
    4. Where to Go from Here
  9. 8. Estimating Financial Risk
    1. Terminology
    2. Methods for Calculating VaR
      1. Variance-Covariance
      2. Historical Simulation
      3. Monte Carlo Simulation
    3. Our Model
    4. Getting the Data
    5. Preparing the Data
    6. Determining the Factor Weights
    7. Sampling
      1. The Multivariate Normal Distribution
    8. Running the Trials
    9. Visualizing the Distribution of Returns
    10. Where to Go from Here
  10. 9. Analyzing Genomics Data and the BDG Project
    1. Decoupling Storage from Modeling
    2. Setting Up ADAM
    3. Introduction to Working with Genomics Data Using ADAM
      1. File Format Conversion with the ADAM CLI
      2. Ingesting Genomics Data Using PySpark and ADAM
    4. Predicting Transcription Factor Binding Sites from ENCODE Data
    5. Where to Go from Here
  11. 10. Image Similarity Detection with Deep Learning and PySpark LSH
    1. PyTorch
      1. Installation
    2. Preparing the Data
      1. Resizing Images Using PyTorch
    3. Deep Learning Model for Vector Representation of Images
      1. Image Embeddings
      2. Import Image Embeddings into PySpark
    4. Image Similarity Search Using PySpark LSH
      1. Nearest Neighbor Search
    5. Where to Go from Here
  12. 11. Managing the Machine Learning Lifecycle with MLflow
    1. Machine Learning Lifecycle
    2. MLflow
    3. Experiment Tracking
    4. Managing and Serving ML Models
    5. Creating and Using MLflow Projects
    6. Where to Go from Here
  13. Index
  14. About the Authors

Product information

  • Title: Advanced Analytics with PySpark
  • Author(s): Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
  • Release date: June 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098103651