Essential PySpark for Scalable Data Analytics

by Sreeram Nudurupati
October 2021
Beginner to intermediate
322 pages
7h 27m
English
Packt Publishing

Section 3: Data Analysis

Once clean, integrated data is in the data lake and machine learning models have been trained and built at scale, the final step is to convey actionable insights to business owners in a meaningful way to help them make business decisions. This section covers the business intelligence (BI) and SQL analytics parts of data analytics. It starts with various data visualization techniques using notebooks. It then introduces Spark SQL for performing business analytics at scale and shows techniques for connecting BI and SQL analysis tools to Apache Spark clusters. The section ends with an introduction to the Data Lakehouse paradigm, which bridges the gap between data warehouses and data lakes to provide a single, unified, scalable ...

ISBN: 9781800568877