4

Spark DataFrames and their Operations

In this chapter, we will learn about a few different APIs in Spark and discuss their features. We will also get started with Spark’s DataFrame operations and look at the data viewing and manipulation techniques that Spark provides, such as filtering rows and adding, renaming, and dropping columns.

We will cover these concepts under the following topics:

  • The Spark DataFrame API
  • Creating DataFrames
  • Viewing DataFrames
  • Manipulating DataFrames
  • Aggregating DataFrames

By the end of this chapter, you will know how to work with PySpark DataFrames. You’ll also discover various data manipulation techniques and see how you can view data after manipulating it.

Getting Started in PySpark

In the previous chapters, we ...
