Hands-On PySpark for Big Data Analysis

Video description

Use PySpark to productionize analytics over Big Data and easily crush messy data at scale

About This Video

  • Work with large amounts of data with agility using distributed datasets and in-memory caching
  • Source data from all popular data sources and formats, including HDFS, Hive, JSON, and S3
  • Deploy Big Data analytics to production using PySpark’s easy-to-use API

In Detail

Data is an incredible asset, especially when there is a lot of it. Exploratory data analysis, business intelligence, and machine learning all depend on processing and analyzing Big Data at scale.

How do you go from working on prototypes on your local machine to handling messy data in production and at scale?

This is a practical, hands-on course that shows you how to use Spark and its Python API to create performant analytics with large-scale data. Don't reinvent the wheel, and wow your clients by building robust and responsible applications on Big Data.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Hands-On-Pyspark-for-Big-Data-Analysis

Product information

  • Title: Hands-On PySpark for Big Data Analysis
  • Author(s): Rudy Lai
  • Release date: December 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789530056