Video description
Use PySpark to productionize analytics over Big Data and easily crush messy data at scale
About This Video
- Work with large amounts of data with agility using distributed datasets and in-memory caching
- Source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3
- Deploy Big Data analytics to production using PySpark’s easy-to-use API
In Detail
Data is an incredible asset, especially when there is a lot of it. Exploratory data analysis, business intelligence, and machine learning all depend on processing and analyzing Big Data at scale.
How do you go from working on prototypes on your local machine to handling messy data in production and at scale?
This is a practical, hands-on course that shows you how to use Spark and its Python API to create performant analytics with large-scale data. Don't reinvent the wheel, and wow your clients by building robust and responsible applications on Big Data.
All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Hands-On-Pyspark-for-Big-Data-Analysis
Table of contents
- Chapter 1 : Install PySpark and Setup Your Development Environment
  - The Course Overview 00:03:03
  - Core Concepts in Spark and PySpark 00:09:06
  - Setting Up Spark on Windows and PySpark 00:07:51
  - SparkContext, SparkConf and Spark Shell 00:09:59
- Chapter 2 : Getting Your Big Data into the Spark Environment Using RDDs
  - Loading Data onto Spark RDDs 00:05:02
  - Parallelization with Spark RDDs 00:06:34
  - RDD Operation Basics 00:08:17
- Chapter 3 : Big Data Cleaning and Wrangling with Spark Notebooks
- Chapter 4 : Aggregating and Summarizing Data into Useful Reports
- Chapter 5 : Powerful Exploratory Data Analysis with MLlib
- Chapter 6 : Putting Structure on Your Big Data with SparkSQL
Product information
- Title: Hands-On PySpark for Big Data Analysis
- Author(s):
- Release date: December 2018
- Publisher(s): Packt Publishing
- ISBN: 9781789530056