O'Reilly logo

Data Science with Python and Dask by Jesse Daniel

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

9 Working with Bags and Arrays

This chapter covers

  • Reading, transforming, and analyzing unstructured data using Bags
  • Creating Arrays and DataFrames from Bags
  • Extracting and filtering data from Bags
  • Combining and grouping elements of Bags using fold and reduce functions
  • Using NLTK (Natural Language Toolkit) with Bags for text mining on large text datasets

The majority of this book focuses on using DataFrames for analyzing structured data, but our exploration of Dask would not be complete without mentioning the two other high-level Dask APIs: Bags and Arrays. When your data doesn’t fit neatly in a tabular model, Bags and Arrays offer additional flexibility. DataFrames are limited to only two dimensions (rows and columns), but Arrays can ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required