Software Engineering for Data Scientists

by Catherine Nelson
April 2024
Intermediate to advanced
260 pages
6h 22m
English
O'Reilly Media, Inc.

Chapter 8. Design and Refactoring

In this chapter, I want to move away from thinking about the finer details of each line of code you write and toward the bigger picture: how to design your projects, how to arrange your code, and how to refactor your code when that design changes. I’ll include some ideas for how to organize and standardize the high-level structure of your projects, and I’ll suggest how to break your code into modular, reusable functions.

Good design, whether at the level of a whole project or at the level of individual functions, has a number of benefits for your code. If your project design is somewhat standardized, it removes some of the mental load of switching from one project to another. It’s easier for someone to work on your project if they have seen something similar before. If your code is well designed, it is easier to reuse pieces of it in other projects, and it is easier to add new features.

In my experience as a data scientist, I’ve seen many projects in which all the code is in one giant Jupyter notebook. I’ve created projects like this myself. A Jupyter notebook is a fantastic way to get started on a project, draft your ideas, and try things out. But notebooks can be limiting when your project scales up or becomes more complex. You can see a framework for turning your notebooks into Python scripts in “From Notebooks to Scalable Scripts”.
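As a rough illustration of what moving from notebook cells to a script can look like, here is a minimal sketch. It is not the framework from the section referenced above; the function names (`load_rows`, `extract_column`, `summarize`) and the sample data are hypothetical, chosen only to show exploratory steps split into small, reusable functions with a single entry point.

```python
"""Hypothetical script version of exploratory notebook code."""
import csv
import io
import statistics


def load_rows(csv_file):
    """Read rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def extract_column(rows, column):
    """Convert one column to floats, skipping blank entries."""
    return [float(row[column]) for row in rows if row[column].strip()]


def summarize(values):
    """Return simple summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main():
    # In-memory sample data (an assumption) stands in for a real file
    # so that the sketch runs without any external resources.
    sample = io.StringIO("city,temperature\nA,10.0\nB,\nC,20.0\n")
    rows = load_rows(sample)
    temperatures = extract_column(rows, "temperature")
    print(summarize(temperatures))


if __name__ == "__main__":
    main()
```

Each function does one job and takes its inputs as arguments, so it can be imported into another project or tested on its own, which is harder to do with cells that depend on shared notebook state.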

It’s sometimes difficult in data science to know exactly when to design the structure of your project. You may ...


ISBN: 9781098136192