Errata for Baseball Hacks

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Other Digital Version
Code runs with errors
Script given on 199

I am getting the following error upon running the get_data.pl script.


C:\baseball>get_data.pl KAN
Can't call method "rows" on an undefined value at C:\baseball\get_data.pl line 24.

C:\baseball>

Here is the code I am using:

#!/usr/bin/perl

# PERL MODULES TO USE
use LWP::Simple;
use HTML::TableExtract;

# WHAT TEAM TO PULL?
$TeamID = $ARGV[0];

# CREATE FILE TO PLACE OUTPUT INTO
$outfile = 'data.txt';
open OUT,">$outfile" or die "can't open file $outfile for output!\n";

# GRAB HTML OF ESPN WEBPAGE FOR GIVEN TEAM
$URL = "http://sports.espn.go.com/mlb/teams/batting?team=" . $TeamID;
$html = get($URL);

# PARSE HTML INTO NEW TABLEEXTRACT OBJECT
$te = new HTML::TableExtract();
$te->parse($html);

# WE'RE INTERESTED IN THE 2ND HTML TABLE IN THE PAGE
$ts = $te->table_state(0,1);
@rows = $ts->rows;

# HOW MANY HTML TABLE ROWS?
$N = scalar(@rows);

# PRINT OUT THE COLUMN HEADERS
print OUT "TEAM|" . join("|", @{$rows[1]}) . "\n";

# FOR REST OF ROWS, PIPE-DELIMIT DATA PLUS A LINEFEED
for $i (2 .. $N-4) {
    print OUT "$TeamID|";
    print OUT join("|", @{$rows[$i]});
    print OUT "\n";
}

# CLOSE OUTPUT FILE
close OUT;

Matt Hoffman  Jun 15, 2009 
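
Line 24 of the script quoted in the report above is `@rows = $ts->rows;`, so the message means that `$te->table_state(0,1)` returned an undefined value: the parser found no table at that depth and position, for example because the page could not be fetched or its layout differs from what the script expects. A minimal sketch of a guard that reports this in readable terms; `rows_or_die` and the `DemoTable` stand-in are hypothetical and not part of the book's code:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): fail with a readable message
# instead of 'Can't call method "rows" on an undefined value'.
sub rows_or_die {
    my ($table, $what) = @_;
    die "No matching table found in $what; the page may be unavailable "
      . "or its layout may have changed\n"
        unless defined $table;
    return $table->rows;
}

# Stand-in object for demonstration only. In get_data.pl the first argument
# would be the value returned by $te->table_state(0,1).
{
    package DemoTable;
    sub new  { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
    sub rows { my ($self) = @_; return @{ $self->{rows} } }
}

# A table that exists: the rows come back as usual.
my @rows = rows_or_die(DemoTable->new(['NAME', 'AB'], ['Player', '4']), 'demo page');
print scalar(@rows), " rows\n";

# A missing table: the helper dies with an explanation that eval can catch.
my $ok = eval { rows_or_die(undef, 'demo page'); 1 };
print "caught: $@" unless $ok;
```

In the script itself the call would become `@rows = rows_or_die($te->table_state(0,1), $URL);`. Checking that `$html` is defined right after `get($URL)` would distinguish a failed download from a changed page layout.
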
Printed Page 24
Bottom of the page

In the baserunning info '.1-H;2-H;3-H', the order seems to be reversed.

Instead, it should be '.3-H;2-H;1-H'. This would match the REGEXP example
on p.113

Anonymous   
Printed Page 38
At Bat

The definition of At Bat on page 38 of Baseball Hacks is missing the fact that a sacrifice bunt (as distinguished from a sacrifice fly) does not result in an at bat. (Rule 10.02(a)(1))

Anonymous  Sep 29, 2014 
Printed Page 51
Step 5

Before running "show tables;", the reader must first select the database:
mysql> USE bbdatabank; <cr>

Anonymous   
Printed Page 65
3

The line of code is given as:

perl -e "print \"hello world!\n\";"

This code works for ActivePerl, but not for Cygwin, which is advocated on the following
page. Cygwin apparently wants the exclamation point to be escaped. The following code
will work for both versions:

perl -e "print \"hello world\!\n\";"

Anonymous   
Printed Page 65
bottom of page

The HTML-TableContentParser module requires a thousand-dollar subscription to ActiveState Business Edition?

Chris Zimmer  Aug 21, 2014 
Printed Page 85
SQL INSERT command

The last four values of the INSERT command are inserting 3-letter team codes into the
table. In the prototype table above, those codes are defined as being two letters
(CHAR(2) datatype). Also, in the printed table those teams are defined as CO, HO, AT
and PH. When the INSERT command is run, it will simply truncate those team names,
causing no real harm to the example. However, MySQL will return four warnings.

To clean up the code, simply remove the last letter from the team column on those
last 4 values. That will remove the warnings and make the command match the table
definition exactly.

Anonymous   
Printed Page 94
Subqueries in Access

A couple mistakes here:

In the first SELECT statement:
In Access 2003, at least, the expression (H + 2B + 2 * 3B + 3 * HR) must be written
as (H + [2B] + 2 * [3B] + 3 * HR) which for consistency might as well be ([H] + [2B]
+ 2 * [3B] + 3 * [HR]).
Next, the field AB appears twice in the list; it only needs to be there once.

In the second SELECT statement:
The nickname t is used in the WHERE clause, but is not defined. The clause 'from
slugging_inner_query' should read 'from slugging_inner_query t'.

Finally, a semicolon ends the second SELECT statement, but not the first. This
doesn't matter as Access will simply add in the semicolon anyway. However, for
consistency, these should match.

Anonymous   
Printed Page 106
bullet 2

The command, "cat GL2003.TXT >> GL2003.HDR.TXT" returns an error, unknown command. Perhaps there is new coding for this command since the book was published?

Marianne Pelletier  Jun 02, 2013 
Printed Page 109
line 20 of 'rosters.pl' code

On line 20 of the rosters.pl code (which also appears in the hack_22_rosters.pl code
that can be downloaded from the web site), the regular expression for verifying the
Retrosheet ID is incorrect. The code reads:

if (/[a-z]{5}\d{3}/) {

Retrosheet IDs can include a dash when a player's last name is shorter than four
characters (like Derrek Lee, lee-d002). The line should read:

if (/[a-z-]{5}\d{3}/) {

The dash in the first character class will match players like Lee or Jason Bay, etc.

Anonymous   
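
The difference between the two patterns in the rosters.pl report above can be checked in isolation. A small sketch, assuming the ID appears somewhere in the input line; `has_retro_id` is a hypothetical helper, `lee-d002` is the example from the report, and the other two strings are made up for illustration:

```perl
use strict;
use warnings;

# Returns 1 if the text contains a Retrosheet-style ID: five characters
# drawn from a-z or "-", followed by three digits. Returns 0 otherwise.
sub has_retro_id {
    my ($text) = @_;
    return $text =~ /[a-z-]{5}\d{3}/ ? 1 : 0;
}

# "lee-d002" comes from the report; "abcde001" and "lee-d02" are invented.
for my $id ('lee-d002', 'abcde001', 'lee-d02') {
    my $old = $id =~ /[a-z]{5}\d{3}/ ? 'match' : 'no match';
    my $new = has_retro_id($id)      ? 'match' : 'no match';
    printf "%-10s original: %-9s corrected: %s\n", $id, $old, $new;
}
```

The original pattern rejects `lee-d002` because the dash interrupts the run of five letters. The corrected pattern accepts it and still rejects a string with only two trailing digits.
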
Printed Page 109
rosters.pl

In line 5 of the code,

print "retroID,lastName,firstName,bats,throws,team,pos\n";

should read:

print "year,retroID,lastName,firstName,bats,throws,team,pos\n";

Otherwise, all of the columns will be shifted off by one (and position information
will not make it into the database).

Anonymous   
Printed Page 109
Paragraph starting "Notice that..."

It says that you called the output files pbp.csv and pbp2k.csv, but the script
outputs pbp1960-1992.csv and pbp2000-2004.csv. It's trivial to rename the files,
except that the name pbp.csv is used to dump the program output (which is not really
needed) as stated in the previous paragraph. So, in order to avoid confusion, either
the "debug" output of the script should be called something else, or the script
should just output the files with the names that will be used later.

Anonymous   
Printed Page 109
translate.pl code

The changes to the code in translate.pl for Windows (listed in the confirmed errata
list) are not enough to get this script working on Windows XP. I had to make the
following changes to get this code to work on Windows. (Note this fix uses the free
7-zip archive tool):

Change from this:
print `cat all_hdr.txt > $outfile`;
print `cat all_hdr.txt > $outfile2k`;

to this:
print `type all_hdr.txt > $outfile`;
print `type all_hdr.txt > $outfile2k`;

Change from this:
print `unzip -qq -o $archive`;

to this:
print `7z x -y $archive`;

Change from this:
print `./BEVENT.EXE -y $century$year -f 0-96 $file >> $outfile`;

to this:
print `BEVENT.EXE -y $century$year -f 0-96 $file >> $outfile`;

Change from this:
print `./BEVENT.EXE -y $century$year -f 0-96 $file >> $outfile2k`;

to this:
print `BEVENT.EXE -y $century$year -f 0-96 $file >> $outfile2k`;

Anonymous   
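
The substitutions in the translate.pl report above could be collected in one place so that a single script runs on both platforms. A minimal sketch, assuming a native Windows Perl such as ActivePerl (where `$^O` is `MSWin32`) and that the `7z` command is on the PATH; `shell_commands` is a hypothetical helper, not part of the book's script:

```perl
use strict;
use warnings;

# Hypothetical helper: choose the external commands by platform name.
# The Windows values are the ones listed in the report above; the others
# are the commands from the original script.
sub shell_commands {
    my ($os) = @_;
    my $is_windows = $os eq 'MSWin32';
    return {
        concat => $is_windows ? 'type'       : 'cat',
        unzip  => $is_windows ? '7z x -y'    : 'unzip -qq -o',
        bevent => $is_windows ? 'BEVENT.EXE' : './BEVENT.EXE',
    };
}

# $^O holds the name of the operating system this Perl was built for.
my $cmd = shell_commands($^O);

# Print the command lines rather than running them, so the sketch is safe
# to try without the Retrosheet files present.
print "$cmd->{concat} all_hdr.txt\n";
print "$cmd->{unzip} ARCHIVE\n";
print "$cmd->{bevent} -y YEAR -f 0-96 FILE\n";
```

Under Cygwin, `$^O` is `cygwin`, so the Unix-style commands from the original script would be selected.
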
Printed Page 110
Creating a play-by-play database and tables

In the console commands to the MySQL server, there is a GRANT ALL command on the
database pbp. However, the pbp database is never created. Readers can follow earlier
examples to do this, but the code shown on the page will not run as expected unless
this database has already been created.

Anonymous   
Printed Page 134
middle of code

the line
my $fday = length($mday) == 1 ? '0' . $mon : $mday;
should read
my $fday = length($mday) == 1 ? '0' . $mday : $mday;

to format the 'day of the month' portion of the URL that will fetch data from mlb.com

Anonymous  Jul 31, 2009 
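
As an alternative to the corrected ternary in the report above, the zero-padding can be done with `sprintf`, which leaves no second variable to mistype. A short sketch; `two_digit` is a hypothetical helper name, not taken from the book:

```perl
use strict;
use warnings;

# Format a day or month number as exactly two digits, padding with a
# leading zero when the value has only one digit.
sub two_digit {
    my ($n) = @_;
    return sprintf('%02d', $n);
}

print two_digit(7), "\n";     # prints 07
print two_digit(31), "\n";    # prints 31
```

With this helper the line would read `my $fday = two_digit($mday);`.
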
Printed Page 141
Running the Hack

When running this on Windows with ActivePerl, the DBD-mysql module package was not
yet installed.

To run the hack, you will need to load the Perl Package Manager (as in Hack #12),
then type "install DBD-mysql". This downloads the DBI and DBD-mysql packages,
which allow the Perl script to run and connect to the database.

(The DBD-mysql package should probably be included in Hack #12 on page 66 in the list
of hidden packages. I do not know whether users of other operating systems will also
need to download it separately.)

Anonymous   
Printed Page 141
last paragraph

'You should run this script exactly once per day...' implies running
the bootstrap script, load_db.pl, once a day as opposed to running the
update script, update_db.pl, once a day.

Anonymous   
Printed Page 146
last paragraph

This is not a mistake, but a technical update.

For the 2006 season, GameDay has replaced the players.txt file with the players.xml
file. Both files were available from Opening Day through 4/25/06, but as of 4/26/06
only the XML file exists.

Anonymous   
Printed Page 147
1st code comment, middle of the page

Contents of pbp can no longer be fetched. Pitch-by-pitch data has not been available
as of the 2006 All-Star game, 7/11/06. Cannot find pitch location information stored
anywhere, except for final batter of a game.

Anonymous   
Printed Page 159
SQL Select Query

All references to a field named playerID should instead be to idxLahman.

Anonymous   
Printed Page 160
Step 3

"Drop Row Fields Here" should read "Drop Category Fields Here" (at least in Excel
2003).

Also the field "playerID" is actually the field "idxLahman".

Anonymous   
Printed Page 184-185
Graphs and exposition

The shapes of the graphs match the 2003 data, but the team labels are incorrect. So
while the description of the teams in the exposition is accurate, the graphs do not
match. Also, the graphs use abbreviations such as CHC and LAD, while the exposition
uses CHN and LAN (as in the database). Correct graphs generated from the current
database will fix the problems.

Anonymous   
PDF Page 257
Bottom of page

In the formula...

`DER =(IPOuts– HRA – SOA) /(HA + E + I1POuts – HRA – SOA)`

...presumably `I1POuts` should also be `IPOuts` (as it is in the 'same' formula on the next page, P.258)

Mark Beveridge  Mar 12, 2020 
Printed Page 341
code sample

In R, the last line of the following code sample causes an error message stating "Error: object 'H99' not found". I triple-checked and it seems I am following the code example:



batting1999 <- subset(batting, batting$yearID == 1999 & batting$AB > 249, select = c("H", "AB", "playerID"))
batting2000 <- subset(batting, batting$yearID == 2000 & batting$AB > 249, select = c("H", "AB", "playerID"))
batting2001 <- subset(batting, batting$yearID == 2001 & batting$AB > 249, select = c("H", "AB", "playerID"))
batting2002 <- subset(batting, batting$yearID == 2002 & batting$AB > 249, select = c("H", "AB", "playerID"))

names(batting1999) <- c("H99", "AB99", "playerID")
names(batting2000) <- c("H00", "AB00", "playerID")
names(batting2001) <- c("H01", "AB01", "playerID")
names(batting2002) <- c("H02", "AB02", "playerID")

batting.test <- merge(merge(merge(batting1999, batting2000, by="playerID"), batting2001, by="playerID"), batting2002, by="playerID")

batting.test$sample.AVG <- (H99 + H00 + H01)/(AB99 + AB00 + AB01)

Anonymous  Jul 07, 2011