Data Science with Python and Dask

by Jesse Daniel
July 2019
Intermediate to advanced
296 pages
English
Manning Publications

9 Working with Bags and Arrays

This chapter covers

  • Reading, transforming, and analyzing unstructured data using Bags
  • Creating Arrays and DataFrames from Bags
  • Extracting and filtering data from Bags
  • Combining and grouping elements of Bags using fold and reduce functions
  • Using NLTK (Natural Language Toolkit) with Bags for text mining on large text datasets
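The Bag operations listed above (extracting, filtering, folding, and grouping) can be sketched with the `dask.bag` API. This is a minimal, hypothetical example, not code from the chapter. The in-memory `records` list stands in for parsed semi-structured data, and its field names are invented for illustration. It assumes Dask is installed and uses the synchronous scheduler so the whole graph runs in one process.

```python
import operator

import dask.bag as db

# Hypothetical records standing in for semi-structured input
# (for example, JSON lines that have already been parsed into dicts).
records = [
    {"user": "alice", "action": "click", "ms": 120},
    {"user": "bob", "action": "view", "ms": 300},
    {"user": "alice", "action": "view", "ms": 80},
    {"user": "carol", "action": "click", "ms": 200},
    {"user": "bob", "action": "click", "ms": 50},
]

# Distribute the list across two partitions; nothing is computed yet.
bag = db.from_sequence(records, npartitions=2)

# Extract one field from every element with map, then fold the values:
# operator.add runs within each partition starting from 0, and the
# per-partition results are then combined (combine defaults to binop).
total_ms_lazy = bag.map(lambda r: r["ms"]).fold(operator.add, initial=0)

# Filter to click events, then group and reduce with foldby:
# binop folds each element into a per-partition accumulator,
# combine merges the accumulators for the same key across partitions.
clicks_by_user_lazy = bag.filter(lambda r: r["action"] == "click").foldby(
    key=lambda r: r["user"],
    binop=lambda acc, r: acc + r["ms"],
    initial=0,
    combine=operator.add,
    combine_initial=0,
)

# The synchronous scheduler executes the task graph in the current
# process, which keeps this small example easy to run and debug.
total_ms = total_ms_lazy.compute(scheduler="synchronous")
click_ms_by_user = dict(clicks_by_user_lazy.compute(scheduler="synchronous"))

print(total_ms)
print(click_ms_by_user)
```

`foldby` is used here instead of `groupby` followed by a reduction because `groupby` has to shuffle the full data between partitions, whereas `foldby` only moves one small accumulator per key per partition.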

The majority of this book focuses on using DataFrames to analyze structured data, but our exploration of Dask would not be complete without mentioning the two other high-level Dask APIs: Bags and Arrays. When your data doesn’t fit neatly into a tabular model, Bags and Arrays offer additional flexibility. DataFrames are limited to two dimensions (rows and columns), but Arrays can ...
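To make the dimensionality point concrete, here is a hedged sketch using `dask.array`: it wraps a three-dimensional NumPy array, something a two-dimensional DataFrame cannot hold directly, and reduces over one axis. The shape, the chunk sizes, and the days/sensors/readings interpretation are assumptions chosen only for illustration; NumPy and Dask are assumed to be installed.

```python
import numpy as np
import dask.array as da

# Hypothetical 3-D data: 4 days x 3 sensors x 5 readings per day.
data = np.arange(60, dtype=float).reshape(4, 3, 5)

# Wrap it as a Dask Array split into two chunks along the first axis.
arr = da.from_array(data, chunks=(2, 3, 5))
print(arr.ndim, arr.numblocks)

# Average over the readings axis; the result is 2-D (days x sensors).
daily_means = arr.mean(axis=2).compute(scheduler="synchronous")
print(daily_means.shape)
```

Each chunk is an ordinary NumPy array, so the reduction runs chunk by chunk and Dask stitches the partial results back together.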

ISBN: 9781617295607