© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. Sarkar, Productive and Efficient Data Science with Python, https://doi.org/10.1007/978-1-4842-8121-5_10

10. Parallelized Data Science

Tirthajyoti Sarkar
Fremont, CA, USA

In the last chapter, I talked about how data science tasks may encounter a wide variety of dataset sizes, ranging from kilobytes to petabytes. The scale can vary in either the number of samples or the feature dimensionality. To handle complex data analytics and machine learning, data scientists employ a dizzying array of models, and that ecosystem scales up quickly, too.

Handling data and models at scale is a special skill to be acquired. When a data scientist starts ...
