Hands-On PySpark for Big Data Analysis

Video Description

Use PySpark to productionize analytics over Big Data and easily crush messy data at scale

About This Video

  • Work with large amounts of data with agility using distributed datasets and in-memory caching
  • Source data from popular storage platforms and formats, including HDFS, Hive, JSON, and S3
  • Deploy Big Data analytics to production using PySpark’s easy-to-use API

In Detail

Data is an incredible asset, especially when there is a lot of it. Exploratory data analysis, business intelligence, and machine learning all depend on processing and analyzing Big Data at scale.

How do you go from working on prototypes on your local machine to handling messy data in production and at scale?

This is a practical, hands-on course that shows you how to use Spark and its Python API to create performant analytics with large-scale data. Don't reinvent the wheel; wow your clients by building robust and responsible applications on Big Data.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Hands-On-Pyspark-for-Big-Data-Analysis