Raju Kumar Mishra and Sundar Rajan RamanPySpark SQL Recipeshttps://doi.org/10.1007/978-1-4842-4335-0_1

1. Introduction to PySpark SQL

Raju Kumar Mishra¹ and Sundar Rajan Raman²

(1)

Bangalore, Karnataka, India

(2)

Chennai, Tamil Nadu, India

The amount of data that’s generated increases every day. Technology advances have facilitated the storage of huge amounts of data. This data deluge has forced users to adopt distributed systems. Distributed systems look for distributed programming, which requires extra care for fault tolerance and efficient algorithms. Distributed systems always look for two things—reliability of the system and availability of all the components.

Apache Hadoop ensures efficient computation ...

Get PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes by Raju Kumar Mishra, Sundar Rajan Raman

1. Introduction to PySpark SQL

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly