Essential PySpark for Scalable Data Analytics

by Sreeram Nudurupati
October 2021
Beginner to intermediate
322 pages
7h 27m
English
Packt Publishing

Chapter 12: Spark SQL Primer

In the previous chapter, you learned about data visualization as a powerful and key tool of data analytics. You also learned about various Python visualization libraries that can be used to visualize data in pandas DataFrames. An equally important and ubiquitous skill in any data analytics professional's repertoire is Structured Query Language (SQL). SQL has existed for as long as the field of data analytics itself, and even with the advent of big data, data science, and machine learning (ML), it remains indispensable.

This chapter introduces you to the basics of SQL and looks at how SQL can be applied in a distributed computing setting via Spark SQL. You will learn about the various ...
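As a rough illustration of the kind of basic SQL the chapter refers to, the following minimal sketch runs a simple aggregation query. It is not taken from the book: the `sales` table, its rows, and the `total_by_region` helper are hypothetical, and Python's built-in `sqlite3` module stands in for Spark so that the example runs without a cluster.

```python
import sqlite3

# Hypothetical sample data: (region, amount) rows invented for illustration.
SALES = [
    ("east", 100.0),
    ("west", 250.0),
    ("east", 50.0),
    ("west", 25.0),
    ("north", 75.0),
]

# A plain aggregation query: total sales per region, largest first.
QUERY = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""


def total_by_region(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = total_by_region(SALES)
print(result)
```

With PySpark, a comparable flow would likely build a DataFrame with `spark.createDataFrame(SALES, ["region", "amount"])`, register it with `createOrReplaceTempView("sales")`, and pass the same query string to `spark.sql(QUERY)`, at which point Spark distributes the work across the cluster.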

Publisher Resources

ISBN: 9781800568877