Errata

Errata for Data Science at the Command Line

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by	Date submitted
PDF, ePub	Page X 2nd paragraph of "What to Expect from This Book" (within Preface)	Preface / What to Expect from This Book, 2nd paragraph: "while others will be replace by better ones" should read "while others will be replaced by better ones"	Jochen Hayek	Aug 19, 2014
	Chapter 5 Section "Common Scrub Operations for Plain Text", subsection "Based on pattern"	Chapter 5, section "Common Scrub Operations for Plain Text", subsection "Based on pattern", states at the very end: «Note that you have to specify the -E option in order to enable regular expressions. Otherwise, grep interprets the pattern as a literal string.» That, I fear, is completely off the mark. grep, without additional arguments, evaluates the pattern provided as a Basic Regular Expression, not as a literal. The -E (--extended-regexp) switch, or invoking it as egrep, merely uses Extended Regular Expressions instead of Basic Regular Expressions. In order to interpret the pattern as a literal string, one needs to use the -F (--fixed-strings) switch, or to invoke grep as fgrep. This is a basic and important notion about the way grep works which ought to be rectified as soon as possible.	Fulvio Scapin	Nov 09, 2014
	1 Executing a Command-Line Tool 6th paragraph	The text reads: "A long command can be broken up with either a backslash (\) or a pipe symbol (|)." I don't think a pipe can be used to "break up" a command the way the backslash does; this may confuse some readers who are not comfortable with Bash.	Andres Lowrie	Jan 30, 2018
PDF	Page 11 line 11	"There are a few command-line tools that require the complete data before they write any data to standard output, like sort and awk" – that is simply untrue for awk – would you please remove awk from that list – awk is a classic Unix filter utility, it certainly does not wait for the end of the input in order to process all of its input.	Jochen Hayek	Apr 08, 2015
PDF	Page 16 Side note near top of page	URL is given as "http://datasciencatthecommandline.com" - the "e" in "science" is missing.	Anonymous	Aug 11, 2014
PDF	Page 16 Example 202	The 'elaborate' Vagrantfile described here is insufficient to test out the examples provided in the book. In Chapter 3, Obtaining Data, the author demonstrates how to use cURL to pull data from the Internet. But I could not access the Internet from my virtual machine using this Vagrantfile. The author does not tell the user how to configure Vagrant to access the Internet. I had to search for an answer. I added the following to my Vagrantfile: vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] This is a major omission. If you feel the need to tell the user how to use the -pwd command, you should certainly ensure that the user has Vagrant configured to follow the examples.	Anonymous	Dec 14, 2014
ePub	Page 23 Section 2.3 InfoBox	Poorly worded sentence. Says: "We will only explain the concepts and tools that are relevant for to doing data science." Should say: "We will only explain the concepts and tools that are relevant to doing data science"	Anonymous	Oct 15, 2019
PDF	Page 36 Infobox	The infobox on page 36, in the subchapter "Converting Microsoft Excel Spreadsheets", describes opening the spreadsheet in LibreOffice Calc as an alternative to in2csv. It may be worth mentioning that LibreOffice has a command-line mode when called with the "--headless" parameter from the terminal. The following command exports all Excel spreadsheets in the current folder to csv files without opening a GUI, so the disadvantages listed in the infobox do not all hold, except for the availability of LibreOffice on remote servers. $ libreoffice --headless --convert-to csv *.xlsx The lightweight Gnumeric spreadsheet program has an even more advanced command-line tool named ssconvert, which can export multiple tables from one spreadsheet file, a feature that LibreOffice and in2csv are currently missing. Use ascending integer for csv file name: $ ssconvert --export-file-per-sheet tables.xlsx table-%n.csv Use table name for csv file name: $ ssconvert --export-file-per-sheet tables.xlsx %s.csv	Benjamin Meier	Aug 20, 2014
PDF	Page 36 4th paragraph	Of the text "contains the unwanted text and even an error message" please remove "and even an error message", as there is no error message shown at all.	Jochen Hayek	Apr 08, 2015
PDF	Page 46 4th paragraph of Step 3: Define Shebang	Shebangs always look like #!..., the "#" is missing for each example.	Jochen Hayek	Apr 08, 2015
PDF	Page 56 First code example	The -e option is needed for the echo command to work as indicated. You have: echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr The command should be: echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr to get the result shown: 2 foo 1 bar	Anonymous	Dec 14, 2014
PDF	Page 56 First and Second line of code	The code printed states: echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr However, I believe for echo to interpret backslash escapes it must have the -e flag, like so: echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr	Joe Lotz	Dec 26, 2014
PDF, ePub	Page 60 United States	Missing chapters 5 on. When will these be available? I thought I was receiving the complete book.	Dennis Barnes	Jul 21, 2014
ePub	Page 61 last code snippet of Step 4	The ePub snippet is $ cat data/ | ./top-words-4.sh According to the paragraph above, it should be $ cat data/finn.txt | ./top-words-4.sh	Sébastien Portebois	Oct 12, 2014
PDF	Page 73 First two examples	The -e option is required for the echo command to work. Instead of echo 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9' | csvcut -c $(seq 1 2 9 | paste -sd,) the command should read echo -e 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9' | csvcut -c $(seq 1 2 9 | paste -sd,) The same goes for the second example on the page.	Butcher Pete	Oct 08, 2015
Printed	Page 105 second command of the page	The original command specified returns an error: <data/immigration.csv csvcut -c Period,Denmark,Belgium,Netherlands,Norway,Sweden|Rio -re 'melt(df, id="Period", variable.name="Country",value.name="Count")'|tee data/immigration-long.csv|head|csvlook returns: Loading required package: tidyr Error: could not find function "melt" Execution halted As melt is indeed included in the reshape2 library, once installed, the fix is to load the library in the command, as: <data/immigration.csv csvcut -c Period,Denmark,Belgium,Netherlands,Norway,Sweden|Rio -re 'library(reshape2);melt(df, id="Period", variable.name="Country",value.name="Count")'|tee data/immigration-long.csv|head|csvlook	Nelson Gaasch	Feb 06, 2016
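
Two of the reports above, the one on grep pattern modes (Chapter 5) and the ones on echo and backslash escapes (pages 56 and 73), can be checked with a short script. What follows is a minimal sketch, not text from the book: the three-line sample input and all variable names are made up for illustration. For the echo reports it swaps in printf, which interprets \n in its format string under any POSIX shell, whereas the behaviour of echo with or without -e differs between shells.

```shell
#!/bin/sh
# Hypothetical three-line sample input; not taken from the book.
sample=$(printf 'a+b\naab\na.b')

# Default mode (Basic Regular Expression): "." matches any single character,
# so the pattern is clearly not treated as a literal string.
bre_dot=$(printf '%s\n' "$sample" | grep 'a.b')

# Default mode (BRE): an unescaped "+" is an ordinary character.
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# -E (Extended Regular Expression): "+" means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# -F (fixed strings): the pattern is literal, so "." matches only a dot.
fixed_dot=$(printf '%s\n' "$sample" | grep -F 'a.b')

# printf instead of echo: \n in the format string is always a newline.
# awk only strips the padding that uniq -c puts in front of the counts.
counts=$(printf 'foo\nbar\nfoo\n' | sort | uniq -c | sort -nr | awk '{print $1, $2}')

printf 'grep a.b (BRE):\n%s\n\n' "$bre_dot"
printf 'grep a+b (BRE):\n%s\n\n' "$bre_plus"
printf 'grep -E a+b:\n%s\n\n' "$ere_plus"
printf 'grep -F a.b:\n%s\n\n' "$fixed_dot"
printf 'word counts:\n%s\n' "$counts"
```

With this sample, plain grep 'a.b' selects all three lines, grep 'a+b' selects only a+b, grep -E 'a+b' selects only aab, and grep -F 'a.b' selects only a.b, which is consistent with the submitter's description of the three modes.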