Skip to Main Content
Data Science at the Command Line, 2nd Edition
book

Data Science at the Command Line, 2nd Edition

by Jeroen Janssens
August 2021
Beginner to intermediate content levelBeginner to intermediate
280 pages
6h 12m
English
O'Reilly Media, Inc.
Content preview from Data Science at the Command Line, 2nd Edition

Chapter 6. Project Management with Make

I hope that by now you have come to appreciate that the command line is a very convenient environment for working with data. You may have noticed that, as a consequence of working with the command line, we:

  • Invoke many different commands

  • Work from various directories

  • Develop our own command-line tools

  • Obtain and generate many (intermediate) files

Since this is an exploratory process, our workflow tends to be rather chaotic, which makes it difficult to keep track of what we’ve done. It’s important that our steps can be reproduced, both by us and by others. When you continue with a project from some time ago, chances are that you have forgotten which commands you ran, from which directory, on which files, with which parameters, and in which order. Imagine the challenges of sharing your project with a collaborator.

You can recover some commands by digging through the output of the history command, but this is, of course, not a reliable approach. A somewhat better approach would be to save your commands to a shell script. At least this allows you and your collaborators to reproduce the project. A shell script is, however, also suboptimal, for several reasons:

  • It is difficult to read and to maintain.

  • Dependencies between steps are unclear.

  • Every step gets executed every time, which is inefficient and is also sometimes undesirable.

This is where make really shines. make1 is a command-line tool that allows you to:

  • Formalize ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Data Science Handbook

Python Data Science Handbook

Jake VanderPlas

Publisher Resources

ISBN: 9781492087908Errata PageSupplemental Content