Chapter 2. Getting Started
In this chapter, I’m going to make sure that you have all the prerequisites for doing data science at the command line. The prerequisites are threefold: (1) having the same datasets that I use in this book, (2) having a proper environment with all the command-line tools that I use throughout this book, and (3) understanding the essential concepts that come into play when using the command line.
First, I describe how to download the datasets. Second, I explain how to install the Docker image, which is a virtual environment based on Ubuntu Linux that contains all the necessary command-line tools. Finally, I go over the essential Unix concepts through examples.
By the end of this chapter, you’ll have everything you need to continue with the first step of doing data science, namely obtaining data.
Getting the Data
The datasets I use in this book can be obtained as follows:
- Download the ZIP file from the book’s website.
- Create a new directory. You can give this directory any name you like, but I recommend you stick to lowercase letters, numbers, and maybe a hyphen or an underscore so that the name is easier to work with at the command line (for example, dsatcl2). Remember where this directory is.
- Move the ZIP file to that new directory and unpack it.
- This directory now contains one subdirectory per chapter.
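If you already have a shell available, the steps above can be sketched roughly as follows. This is only an illustration, not the book's prescribed method: the function name unpack_datasets and its two arguments are hypothetical, and it assumes that the ZIP file has already been downloaded and that the unzip tool is installed.

```bash
#!/usr/bin/env bash
# Sketch: unpack the downloaded datasets into a new directory.
# The function name and its arguments are illustrative assumptions.
unpack_datasets() {
  local zip_file="$1"    # path to the ZIP file you downloaded
  local target_dir="$2"  # the new directory, for example dsatcl2

  # Create the new directory (no error if it already exists).
  mkdir -p "$target_dir" || return 1

  # Move the ZIP file into the new directory.
  mv "$zip_file" "$target_dir"/ || return 1

  # Unpack inside the new directory so the subdirectories land there.
  (cd "$target_dir" && unzip -q "$(basename "$zip_file")")
}
```

For example, if the ZIP file were saved as data.zip in your Downloads directory (both of these are assumptions; use the actual name and location of your download), you could run unpack_datasets ~/Downloads/data.zip ~/dsatcl2 and then ls ~/dsatcl2 to check that there is one subdirectory per chapter.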
In the next section, I explain how to install the environment that contains all the command-line tools you need to work with this data.