Chapter 6. Project Management with Make

I hope that by now you have come to appreciate that the command line is a very convenient environment for working with data. You may have noticed that, as a consequence of working with the command line, we:

  • Invoke many different commands

  • Work from various directories

  • Develop our own command-line tools

  • Obtain and generate many (intermediate) files

Since this is an exploratory process, our workflow tends to be rather chaotic, which makes it difficult to keep track of what we’ve done. It’s important that our steps can be reproduced, both by us and by others. When you continue with a project from some time ago, chances are that you have forgotten which commands you ran, from which directory, on which files, with which parameters, and in which order. Imagine the challenges of sharing your project with a collaborator.

You can recover some commands by digging through the output of the history command, but this is, of course, not a reliable approach. A somewhat better approach would be to save your commands to a shell script. At least this allows you and your collaborators to reproduce the project. A shell script is, however, also suboptimal, for several reasons:

  • It is difficult to read and to maintain.

  • Dependencies between steps are unclear.

  • Every step gets executed every time, which is inefficient and is also sometimes undesirable.

This is where make really shines. make1 is a command-line tool that allows you to:

  • Formalize ...

Get Data Science at the Command Line, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.