15. Aggregating your data

This chapter covers

  • Refreshing your knowledge of aggregations
  • Performing basic aggregations
  • Using live data to perform aggregations
  • Building custom aggregations

Aggregating is a way to group data so you can view it at a macro level rather than an atomic, or micro, level. Aggregations are an essential step toward better analytics and, further down the road, toward machine learning and artificial intelligence.
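To make the micro-versus-macro distinction concrete, here is a minimal sketch in plain Python (not Spark). The order data, the column meanings, and the helper name `aggregate_by_customer` are all hypothetical and chosen only for illustration. It groups individual rows by a key and reduces each group to a few summary values.

```python
from collections import defaultdict

# Hypothetical atomic-level data: one (customer, amount) tuple per order line.
orders = [
    ("alice", 20.0),
    ("bob", 15.0),
    ("alice", 5.0),
    ("bob", 10.0),
    ("carol", 30.0),
]


def aggregate_by_customer(rows):
    """Group rows by customer, then compute count, sum, and average per group."""
    groups = defaultdict(list)
    for customer, amount in rows:
        groups[customer].append(amount)
    return {
        customer: {
            "count": len(amounts),
            "sum": sum(amounts),
            "avg": sum(amounts) / len(amounts),
        }
        for customer, amounts in groups.items()
    }


# Macro-level view: one summary record per customer instead of one per order.
summary = aggregate_by_customer(orders)
for customer, stats in sorted(summary.items()):
    print(customer, stats)
```

The same group-then-reduce pattern is what a grouping operation followed by aggregate functions expresses in a dataframe or SQL engine. The sketch only illustrates the idea on a small in-memory list.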

In this chapter, you will start slow, with a small reminder of what aggregations are. Then you’ll perform basic aggregations with Spark. You will be using both Spark SQL and the dataframe API.

Once you go through the basics, you will analyze open data from New York City public schools. You will study attendance, absenteeism, ...
