Chapter 6. Project Management with Make

I hope that by now you have come to appreciate that the command line is a very convenient environment for working with data. You may have noticed that, as a consequence of working with the command line, we:

  • Invoke many different commands

  • Work from various directories

  • Develop our own command-line tools

  • Obtain and generate many (intermediate) files

Since this is an exploratory process, our workflow tends to be rather chaotic, which makes it difficult to keep track of what we’ve done. It’s important that our steps can be reproduced, both by us and by others. When you continue with a project from some time ago, chances are that you have forgotten which commands you ran, from which directory, on which files, with which parameters, and in which order. Imagine the challenges of sharing your project with a collaborator.

You can recover some commands by digging through the output of the history command, but this is, of course, not a reliable approach. A somewhat better approach would be to save your commands to a shell script. At least this allows you and your collaborators to reproduce the project. A shell script is, however, also suboptimal, for several reasons:

  • It is difficult to read and to maintain.

  • Dependencies between steps are unclear.

  • Every step gets executed every time, which is inefficient and sometimes undesirable.
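
To make the last point concrete, here is a minimal sketch of what avoiding redundant work looks like when it is done by hand in a shell script. The file names input.txt and sorted.txt and the function build_sorted are hypothetical, chosen only for illustration, and the script works in a temporary directory so that it does not touch your own files.

```shell
#!/usr/bin/env bash
# Hypothetical one-step pipeline; all names here are illustrative assumptions.
set -euo pipefail

# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"
printf 'b\na\nc\n' > input.txt

# Rebuild sorted.txt only if it is missing or older than input.txt.
build_sorted() {
  if [ ! -e sorted.txt ] || [ input.txt -nt sorted.txt ]; then
    sort input.txt > sorted.txt
    echo "rebuilt sorted.txt"
  else
    echo "sorted.txt is up to date"
  fi
}

build_sorted   # first call: sorted.txt does not exist yet, so it is built
build_sorted   # second call: nothing changed, so the step is skipped
```

Writing a check like this for every step, and keeping it correct as the steps change, quickly becomes tedious, which is part of why a plain shell script scales poorly.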

This is where make really shines. make is a command-line tool that allows you to:

  • Formalize ...
