The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version | Location | Description | Submitted by | Date submitted |
PDF, ePub |
Page X
2nd paragraph of "What to Expect from This Book" (within Preface) |
Preface / What to Expect from This Book, 2nd paragraph:
"while others will be replace by better ones" should read "while others will be replaced by better ones"
|
Jochen Hayek |
Aug 19, 2014 |
|
Chapter 5
Section "Common Scrub Operations for Plain Text", subsection "Based on pattern" |
Chapter 5, section "Common Scrub Operations for Plain Text", subsection "Based on pattern", states at the very end:
«Note that you have to specify the -E option in order to enable regular expressions. Otherwise, grep interprets the pattern as a literal string.»
That, I fear, is completely off the mark.
grep, without additional arguments, evaluates the pattern provided as a Basic Regular Expression, not as a literal.
The -E ( --extended-regexp ) switch, or invoking it as egrep, merely uses Extended Regular Expressions instead of Basic Regular Expressions.
In order to interpret the pattern as a literal string, one needs to use the -F (--fixed-strings) switch, or to invoke grep as fgrep.
This is a basic and important notion about the way grep works which ought to be rectified as soon as possible.
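The three modes described above can be compared side by side. This is a minimal sketch, assuming a POSIX-style grep (GNU or BSD); the sample lines and variable names are made up for illustration and do not come from the book.

```shell
# Four sample lines: two contain regex metacharacters as plain text.
input=$(printf 'a.c\nabc\nab+c\nabbc\n')

# Default mode is a basic regular expression (BRE): "." matches any character,
# so the pattern is clearly not treated as a literal string.
bre_dot=$(printf '%s\n' "$input" | grep 'a.c')

# -F treats the pattern as a fixed string: only the literal "a.c" matches.
fixed_dot=$(printf '%s\n' "$input" | grep -F 'a.c')

# In a BRE, an unescaped "+" is an ordinary character.
bre_plus=$(printf '%s\n' "$input" | grep 'ab+c')

# With -E (extended regular expression), "+" means "one or more".
ere_plus=$(printf '%s\n' "$input" | grep -E 'ab+c')

printf 'default a.c : %s\n' "$(printf '%s' "$bre_dot" | tr '\n' ' ')"
printf -- '-F a.c      : %s\n' "$fixed_dot"
printf 'default ab+c: %s\n' "$bre_plus"
printf -- '-E ab+c     : %s\n' "$(printf '%s' "$ere_plus" | tr '\n' ' ')"
```

The default run matches both `a.c` and `abc`, which a literal match would not do; only `-F` restricts the match to the literal text.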
|
Fulvio Scapin |
Nov 09, 2014 |
|
1
Executing a Command-Line Tool 6th paragraph |
The text reads:
> A long command can be broken up with either a backslash (\) or a pipe symbol (|) .
I don't think a `pipe` can be used to "break up" a command the way the backslash does; this may confuse some readers who are not comfortable with Bash.
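For readers weighing this report, the two behaviours can be tried directly. This is a small sketch, assuming Bash or another POSIX shell; the pipeline and variable names are illustrative only. A backslash at the end of a line joins it to the next line, while a line that ends in a pipe symbol is simply an incomplete pipeline, so the shell keeps reading.

```shell
# Trailing backslash: the newline is removed and the two lines are joined.
with_backslash=$(printf 'b\na\nb\n' | sort \
    | uniq -c)

# Trailing pipe: the pipeline is incomplete, so the shell reads the next line.
with_pipe=$(printf 'b\na\nb\n' | sort |
    uniq -c)

printf '%s\n' "$with_backslash"
printf '%s\n' "$with_pipe"
```

Both forms run the same pipeline. The difference is that a backslash can split a command almost anywhere outside single quotes, whereas the pipe form only works at the boundary between two commands.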
|
Andres Lowrie |
Jan 30, 2018 |
PDF |
Page 11
line 11 |
"There are a few command-line tools that require the complete data before they write any data to standard output, like sort and awk" – that is simply untrue for awk – would you please remove awk from that list – awk is a classic Unix filter utility, it certainly does not wait for the end of the input in order to process all of its input.
|
Jochen Hayek |
Apr 08, 2015 |
PDF |
Page 16
Side note near top of page |
URL is given as "http://datasciencatthecommandline.com" - the "e" in "science" is missing.
|
Anonymous |
Aug 11, 2014 |
PDF |
Page 16
Example 202 |
The 'elaborate' Vagrantfile described here is insufficient to test out the examples provided in the book.
In Chapter 3, "Obtaining Data", the author demonstrates how to use cURL to pull data from the Internet. But I could not access the Internet from my virtual machine using this Vagrantfile.
The author does not tell the user how to configure Vagrant to access the Internet. I had to search for an answer.
I added the following to my Vagrantfile:
vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
This is a major omission. If you feel the need to tell the user how to use the -pwd command, you should certainly ensure that the user has Vagrant configured to follow the examples.
|
Anonymous |
Dec 14, 2014 |
ePub |
Page 23
Section 2.3 InfoBox |
Poorly worded sentence. Says: "We will only explain the concepts and tools that are relevant for to doing data science."
Should say: "We will only explain the concepts and tools that are relevant to doing data science"
|
Anonymous |
Oct 15, 2019 |
PDF |
Page 36
Infobox |
In the infobox on page 36 in subchapter "Converting Microsoft Excel Spreadsheets" an alternate solution for *in2csv* is described as opening the spreadsheet in LibreOffice Calc. Maybe it is worth mentioning that LibreOffice has a command-line mode when called with the "--headless" parameter from the terminal.
The following command will export all Excel spreadsheets in the current folder to csv files without opening a GUI. So the listed disadvantages of the alternate solution in the infobox are not all true, except for the availability of LibreOffice on remote servers.
$ libreoffice --headless --convert-to csv *.xlsx
The lightweight Gnumeric spreadsheet program has an even more advanced command-line tool named *ssconvert*, which is able to export multiple tables from one spreadsheet file, a feature that LibreOffice and *in2csv* are currently missing.
Use ascending integer for csv file name:
$ ssconvert --export-file-per-sheet tables.xlsx table-%n.csv
Use table name for csv file name:
$ ssconvert --export-file-per-sheet tables.xlsx %s.csv
|
Benjamin Meier |
Aug 20, 2014 |
PDF |
Page 36
4th paragraph |
Of the text "contains the unwanted text and even an error message" please remove "and even an error message", as there is no error message shown at all.
|
Jochen Hayek |
Apr 08, 2015 |
PDF |
Page 46
4th paragraph of Step 3: Define Shebang |
Shebangs always start with #!; the "#" is missing from each example.
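As a quick illustration of the point, the sketch below writes a throwaway script whose first line is a complete shebang and then runs it. The script contents, the `/bin/sh` interpreter, and the variable names are assumptions chosen for the example, not taken from the book.

```shell
# Create a temporary script; the quoted 'EOF' keeps the body verbatim.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF
chmod +x "$script"

# The first two bytes of the file must be "#!" for the kernel to honour it.
first_two=$(head -c 2 "$script")
output=$("$script")
rm -f "$script"

printf 'first two bytes: %s\n' "$first_two"
printf 'script output  : %s\n' "$output"
```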
|
Jochen Hayek |
Apr 08, 2015 |
PDF |
Page 56
First code example |
The -e option is needed for the echo command to work as indicated.
You have:
echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
The command should be:
echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
to get the result shown:
2 foo
1 bar
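Whether `echo` interprets `\n` depends on the shell and its settings, so a variant that avoids the question may be useful. This is a minimal sketch that uses `printf` instead, which interprets backslash escapes in its format string in every POSIX shell; the variable name is illustrative.

```shell
# printf expands \n itself, so no -e flag is needed.
counts=$(printf 'foo\nbar\nfoo\n' | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"
```

This prints a count of 2 for foo followed by a count of 1 for bar; the amount of leading whitespace before each count varies between `uniq` implementations.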
|
Anonymous |
Dec 14, 2014 |
PDF |
Page 56
First and Second line of code |
The code printed states:
echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
However, I believe that for echo to interpret backslash escapes it needs the -e flag, like so:
echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
|
Joe Lotz |
Dec 26, 2014 |
PDF, ePub |
Page 60
United States |
Missing chapters 5 on. When will these be available? I thought I was receiving the complete book.
|
Dennis Barnes |
Jul 21, 2014 |
ePub |
Page 61
last code snippet of Step 4 |
The ePub snippet is
$ cat data/ | ./top-words-4.sh
According to the paragraph above, it should be
$ cat data/finn.txt | ./top-words-4.sh
|
Sébastien Portebois |
Oct 12, 2014 |
PDF |
Page 73
First two examples |
The -e option is required for the echo command to work.
Instead of
echo 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9' | csvcut -c $(seq 1 2 9 | paste -sd,)
the command should read
echo -e 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9' | csvcut -c $(seq 1 2 9 | paste -sd,)
The same goes for the second example on the page.
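The column list in this example can be tried without csvkit. In the sketch below, `printf` supplies the two input lines and plain `cut` stands in for `csvcut`; that substitution is an assumption made only so the example runs anywhere, and `cut` does not understand quoted CSV fields the way `csvcut` does.

```shell
# seq 1 2 9 prints 1 3 5 7 9 on separate lines; paste joins them with commas.
# The explicit "-" tells paste to read standard input on BSD systems as well.
columns=$(seq 1 2 9 | paste -sd, -)

# Select the odd-numbered columns from a two-line CSV sample.
selected=$(printf 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9\n' | cut -d, -f "$columns")

printf 'columns : %s\n' "$columns"
printf '%s\n' "$selected"
```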
|
Butcher Pete |
Oct 08, 2015 |
Printed |
Page 105
second command of the page |
The original command specified returns an error,
<data/immigration.csv csvcut -c Period,Denmark,Belgium,Netherlands,Norway,Sweden|Rio -re 'melt(df, id="Period", variable.name="Country",value.name="Count")'|tee data/immigration-long.csv|head|csvlook
returning
Loading required package: tidyr
Error: could not find function "melt"
Execution halted
As melt is indeed included in the reshape2 library, once installed, the fix is to load the library in the command, as:
<data/immigration.csv csvcut -c Period,Denmark,Belgium,Netherlands,Norway,Sweden|Rio -re 'library(reshape2);melt(df, id="Period", variable.name="Country",value.name="Count")'|tee data/immigration-long.csv|head|csvlook
|
Nelson Gaasch |
Feb 06, 2016 |