Software Engineering for Data Scientists

by Catherine Nelson

Released April 2024

Publisher: O'Reilly Media, Inc.

ISBN: 9781098136208

Book description

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and it is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, and clearly explains how to apply the best practices from software engineering to data science.

Examples are provided in Python and draw on popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:

Understand data structures and object-oriented programming
Clearly and skillfully document your code
Package and share your code
Integrate data science code with a larger code base
Write APIs
Create secure code
Apply best practices to common tasks such as testing, error handling, and logging
Work more effectively with software engineers
Write more efficient, maintainable, and robust code in Python
Put your data science projects into production
And more