Data Analysis with Python and PySpark

by Jonathan Rioux
March 2022
Beginner to intermediate
456 pages
13h
English
Manning Publications

6 Multidimensional data frames: Using PySpark with JSON data

This chapter covers

  • Drawing parallels between JSON documents and Python data structures
  • Ingesting JSON data within a data frame
  • Representing hierarchical data in a data frame through complex column types
  • Reducing duplication and reliance on auxiliary tables with a document/hierarchical data model
  • Creating and unpacking data from complex data types
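The first point in the list above, the parallel between JSON documents and Python data structures, can be sketched with Python's standard `json` module. This is a minimal illustration rather than the chapter's own example: the document below and its field names are invented for the purpose, and nothing here touches PySpark yet.

```python
import json

# Hypothetical JSON document, invented for illustration (not taken from the book).
raw = """
{
  "id": 143,
  "name": "Example Show",
  "genres": ["Drama", "Comedy"],
  "network": {"name": "Example Network", "country": "CA"},
  "ended": null,
  "rating": 7.5
}
"""

doc = json.loads(raw)

# JSON object -> dict, array -> list, string -> str,
# number -> int or float, null -> None.
kinds = {key: type(value).__name__ for key, value in doc.items()}
print(kinds)

# Nested values are reached by chaining keys and list indices.
print(doc["network"]["country"])
print(doc["genres"][0])
```

The same nesting (an object inside an object, an array inside an object) is the kind of hierarchy that complex column types are meant to hold once the data sits in a data frame.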

Thus far, we have used PySpark’s data frame to work with textual (chapters 2 and 3) and tabular (chapters 4 and 5) data. The two formats differ considerably, yet both fit seamlessly into the data frame structure. I believe we’re ready to push the abstraction a little further by representing hierarchical information within a data frame. ...

ISBN: 9781617297205