Chapter 8. Refactoring and Technical Debt Management

Programs must be written for people to read, and only incidentally for machines to execute.

Harold Abelson, Structure and Interpretation of Computer Programs (MIT Press)

Without refactoring, the internal design—the architecture—of software tends to decay. As people change code to achieve short-term goals, often without a full comprehension of the architecture, the code loses its structure [...]. Loss of the structure of code has a cumulative effect. The harder it is to see the design in the code, the harder it is for me to preserve it, and the more rapidly it decays. Regular refactoring helps keep the code in shape.

Martin Fowler, Refactoring: Improving the Design of Existing Code (Addison-Wesley Professional)

As ML practitioners, we know that code can get messy, and usually much more quickly than we expect. Typically, code to train ML models comprises semi-boilerplate code glued together in a long notebook or script, generously peppered with side effects—e.g., print statements, pretty-printed dataframes, data visualizations—and usually without any automated tests.

While this may be fine for notebooks targeted at teaching people about the ML process, in real projects it’s a recipe for an unmaintainable mess, cognitive overload, and friction to the point of halting progress. Poor coding habits and a lack of design make code hard to understand and, consequently, very hard to change. This makes feature development and model improvements ...
