9 Working with Bags and Arrays

This chapter covers

  • Reading, transforming, and analyzing unstructured data using Bags
  • Creating Arrays and DataFrames from Bags
  • Extracting and filtering data from Bags
  • Combining and grouping elements of Bags using fold and reduce functions
  • Using NLTK (Natural Language Toolkit) with Bags for text mining on large text datasets

The majority of this book focuses on using DataFrames for analyzing structured data, but our exploration of Dask would not be complete without mentioning the two other high-level Dask APIs: Bags and Arrays. When your data doesn’t fit neatly into a tabular model, Bags and Arrays offer additional flexibility. DataFrames are limited to two dimensions (rows and columns), but Arrays can ...
