Appendix A. Key System Concepts for Dask Users

We’ve covered a few distributed system concepts briefly as needed in this book, but as you get ready to head out on your own, it’s a good idea to review some of the core concepts that Dask is built on. In this appendix, you will learn more about the key principles used in Dask and how they impact the code you write on top of Dask.

Testing

Testing is an often overlooked part of data science and data engineering. Some of our tools, like SQL and Jupyter notebooks, neither encourage testing nor make it easy, but this does not absolve us of the responsibility to test our code. Data privacy concerns add another challenge: because we don’t want to store user data for testing, we must either put in the effort to create “fake” data or break our code down into testable components that don’t need user data.
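As an illustration of the “fake” data approach, here is a minimal sketch that generates reproducible synthetic records and runs a small function over them. The record layout and the names `make_fake_users` and `mask_email` are hypothetical, chosen only for this example; they are not from the book or from Dask.

```python
import random
import string


def make_fake_users(n, seed=42):
    """Generate n synthetic user records; no real user data is involved."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    users = []
    for user_id in range(n):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        users.append({
            "user_id": user_id,
            # example.com is a domain reserved for documentation and testing
            "email": f"{name}@example.com",
            "age": rng.randint(18, 90),
        })
    return users


def mask_email(record):
    """Hypothetical unit under test: hide the local part of an email."""
    _, domain = record["email"].split("@", 1)
    return {**record, "email": f"***@{domain}"}


fake_users = make_fake_users(5)
masked = [mask_email(user) for user in fake_users]
```

Seeding the generator means every run sees the same records, so a failing test can be reproduced without storing any data on disk.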

Manual Testing

We often perform some kind of manual testing while writing software or data tools. This can include simply running the tool and eyeballing the results to see if they look reasonable. Manual testing is time-consuming and not automatically repeatable, so while it is great during development, it is insufficient for long-lived projects.

Unit Testing

Unit testing refers to testing individual units of code rather than the whole system together. This requires your code to be composed of separate units, such as modules or functions. While this is less common with notebooks, we believe that structuring ...
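A minimal sketch of what a unit test can look like, using Python’s standard `unittest` module. The function `normalize_name` is a hypothetical unit invented for this example. Because it is a plain function with no cluster dependency, it can be tested on its own before being applied to a larger collection.

```python
import unittest


def normalize_name(raw):
    """Hypothetical unit: collapse whitespace and title-case a name."""
    return " ".join(raw.split()).title()


class NormalizeNameTest(unittest.TestCase):
    def test_strips_and_collapses_whitespace(self):
        self.assertEqual(normalize_name("  ada   lovelace "), "Ada Lovelace")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalize_name(""), "")


def run_tests():
    """Run the test case explicitly so this also works inside a notebook."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeNameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

Loading the test case explicitly, rather than calling `unittest.main()`, avoids the interpreter exit that `unittest.main()` triggers by default, which is inconvenient in an interactive session.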
