Baseball Hacks

Name: Baseball Hacks
Author: Joseph Adler
ISBN: 9780596009427

by Joseph Adler

January 2006

Beginner

467 pages

14h 21m

English

O'Reilly Media, Inc.

Baseball Hacks
A Note Regarding Supplemental Files
Credits
About the Author
Contributors
Acknowledgments
Preface
Why Baseball Hacks?
How to Use This Book
How This Book Is Organized

Conventions Used in This Book
Using Code Examples
How to Contact Us
Got a Hack?
Safari® Enabled
1. Basics of Baseball
Hacks 1–7: Introduction
Baseball 101
Score a Baseball Game
Traditional Scoring · Record starting players’ names. · Record plays during the game. · Record substitutions. · Record other information. · Hacking the Hack · See Also
Make a Box Score from a Score Sheet
The Official Rules for Scoring · Calculating a Box Score from a Score Sheet · Step 1: Draw columns for names, at bats, runs, hits, and anything else. · Step 2: Copy the batters’ names. · Step 3: Count statistics for each batter. · Step 4: Count statistics for each pitcher. · Step 5: Prove the box score. · Hacking the Hack
Keep Score, Project Scoresheet–Style
The Contents of a Play Code · Play code structure. · Fielding. · Type of play. · Description. · Base running. · Example play codes. · Pitch codes.
Follow Pitches During a Game
Following the Pitching Strategy · Set up a pitch outside. · Follow breaking balls with a fastball. · Follow fastballs with a breaking ball. · Always throw the same impossible-to-hit pitch. · Move the player off the plate. · Identifying Pitches · Step 1: Watch the umpire for location. · Step 2: Watch the catcher for location. · Step 3: Look at pitch speeds to determine pitch type. · Step 4: Watch what the ball does at the plate. · Step 5: Watch the pitcher react to the catcher’s signals. · Step 6: Watch the catcher’s signals.
Follow the Game Online
Player Statistics · Independent Commentary (Including Blogs) · Hacking the Hack
Add Baseball Searches to Firefox
Adding Search Engines to Firefox · Running the Hack · Hacking the Hack
Find Images of Stadiums
Better Pictures and Distances with Google Earth
2. Baseball Games from Past Years
Hacks 8–23: Introduction
Get and Install MySQL
Installation on Windows · Step 0: Buy and install a software or hardware firewall. · Step 1: Download and unpack the installer. · Step 2: Run the Installation wizard. · Step 3: Run the Configuration wizard. · Testing the Installation · Hacking the Hack · See Also
Get an Access Database of Player and Team Statistics
A Player and Team Statistics Database for Microsoft Access · Step 1: Download the file. · Step 2: Decompress and save the file. · Step 3: Open the database file. · Step 4: Test the database. · The Contents of the Database
Get a MySQL Database of Player and Team Statistics
Step 1: Download the File · Step 2: Decompress the File · Step 3: Create the Database · Step 4: Import the Database · Step 5: Check That Everything Is There · The Contents of the Database · Hacking the Hack · Annual updates. · Getting baseball statistics as text files.
Make Your Own Stats Book
Write the Queries · Step 1: Create “batters who played in 2004” query. · Step 2: Create “fielding by games” query. · Step 3: Create “fielding by most frequent position” query. · Step 4: Create “team names” query. · Step 5: Create “batting plus” query. · Build the Report · Hacking the Hack
Get Perl
Getting and Installing Perl · Install the Perl Modules Required in This Book · Hacking the Hack · Step 1: Download the Cygwin installer. · Step 2: Run the Cygwin installer. · Step 3: Configure Cygwin.
Learn Perl
The Basics · Statements. · Variables. · Datatypes. · Control structures. · Comments. · An Example Program · Some Not-so-Basic Basics · Pattern matching through regular expressions. · Subroutines. · Modules and packages. · Editors · Hacking the Hack
Get Historical Play-by-Play Data
Retrosheet Event Files · The Code · Running the Hack · See Also
Make Box Scores or Database Tables from Play-by-Play Data with Retrosheet Tools
Running the Tools · Preprocessing event files with BEVENT. · Chadwick · See Also
Use SQL to Explore Game Data
Talking to Your Database · Tables · Queries · Joins. · Aggregates. · Subqueries. · Saving results in tables. · Deleting tables. · Running Scripts · Getting More Information and Help
Use Microsoft Access to Run SQL Queries
SQL Queries in Access · Changing SQL Queries to Graphical Queries · Subqueries in Access
Get a GUI for MySQL
Other Tools · MySQL Administrator. · Tora and Toad. · Aqua Data Studio.
Move Data from a Database to Excel
Select the Right Data for a Spreadsheet · Running the Hack · Moving data from Access to Excel. · Moving data from MySQL Query Browser to Excel. · Moving data from MySQL to Excel. · Hacking the Hack
Load Baseball Data into MySQL
The Code · Running the Hack · Hacking the Hack · See Also
Load Retrosheet Game Logs
The Code
Make a Historical Play-by-Play Database
The Code · Fetching the data. · Transforming the data. · Creating a database import statement. · Creating a play-by-play database and tables.
Use Regular Expressions to Identify Events
Hacking the Hack · See Also
3. Stats from the Current Season
Hacks 24–29: Introduction
Use Microsoft Excel Web Queries to Get Stats
Web Queries · Web Query Example: Up-to-Date Park Factors · Step 1: Find the data. · Step 2: Running the queries. · Step 3: Name stuff. · Step 4: Create a results table. · Hacking the Hack
Spider Baseball Sites for Data
The Code · Running the Hack · Hacking the Hack · See Also
Discover How Live Score Applications Work
Use Your Router’s Content Filtering Feature · Use a Proxy Server · Packet Filters
Keep Your Stats Database Up-to-Date
The Code · Create the box score database. · The update script. · The bootstrapping script. · The helping code. · Running the Hack · Hacking the Hack
Get Recent Play-by-Play Data
The Code · The spider script. · The parser script. · Running the Hack · Hacking the Hack · See Also
Find Data on Hit Locations
The Code · Running the Hack · Hacking the Hack
4. Visualize Baseball Statistics
Hacks 30–39: Introduction
Plot Histograms in Excel
The Code · Hacking the Hack
Get R and R Packages
Analyze Baseball with R
Calculations in R · Assignment in R · Arrays · Data Frames · Comments · Functions · Graphics in R · Hacking the Hack · See Also
Access Databases Directly from Excel or R
Use ODBC in R · Use ODBC in Excel · Hacking the Hack
Load Text Files into R
The Code · See Also
Compare Teams and Players with Lattices
The Code · Running the Hack
Compare Teams Using Chernoff Faces
The Code · Run the Hack · Hacking the Hack
Plot Spray Charts
Step 1: Load the file into a data frame. · Step 2: Set up the axes and the diamond. · Step 3: Plot matchups. · Batter Spray Diagrams · Hexagonal Binning · Step 1: Get the hexbin package. · Step 2: Load the hexbin package. · Step 3: Plot the graph. · Hacking the Hack
Chart Team Stats in Real Time
The Code · Running the Hack · Hacking the Hack · Create a batch file to plow through several teams. · Still more automation. · Don’t export individual PNG images.
Slice and Dice Teams with Cubes
Prerequisites · The Code · Step 1: Define local cube contents. · Step 2: Create the local cube. · Step 3: Create a local web application to interact with the cube. · Running the Hack · Hacking the Hack
5. Formulas
Hacks 40–59: Introduction
How I Chose the Formulas · Summary Statistics for the Formulas · Using Formulas for Fantasy Baseball · Who Came Up with These Things?
Measure Batting with Batting Average
Sample Code · Batting average formula. · Running the Hack · Summary statistics. · Top 10. · Distribution. · Box plot.
Measure Batting with On-Base Percentage
Sample Code · Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Batting with SLG
The Formula · Sample Code · Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Batting with OPS
Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Power with ISO
Sample Code · Summary statistics. · Top 10. · Distribution and box plot.
Measure Batting with Runs Created
The Formula · Sample Code · Summary statistics. · Top 10. · Histogram and box plot.
Measure Batting with Linear Weights
Estimating the Weights from an Expected Runs Matrix · Palmer’s Formula · Sample Code · Running the Hack · Summary statistics. · Top 10. · Histogram and box plot. · Hacking the Hack · Estimate weights with linear regression.
Measure Pitching with ERA
The Code · Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Pitching with WHIP
Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Pitching with Linear Weights
Sample Code · Running the Hack · Summary statistics. · Top 10. · Distribution and box plot.
Measure Defense with Defensive Efficiency
The Formula · Sample Code · Summary statistics. · Distribution in last 10 years (1994–2003). · Distribution and box plot.
Measure Pitching with DIPS
The Formula · Sample Code · Last year (2003). · Last 10 years (1994–2003). · Last 50 years (1955–2003). · Lucky and Unlucky Players
Measure Base Running Through EqBR
Equivalent Batter Runs · The Code · Summary statistics. · Top 10. · Histogram. · Hacking the Hack
Measure Fielding with Fielding Percentage
The Formula · Sample Code
Measure Fielding with Range Factor
Sample Code · Summary statistics. · Top 10. · Histogram. · Box plot. · Hacking the Hack
Measure Fielding with Linear Weights
The Formula · Calculating Fielding Runs · Step 1: Calculate league totals. · Step 2: Calculate team totals. · Step 3: Calculate expected values for each player. · Step 4: Calculate fielding runs for each player. · Sample Code · Summary statistics. · Descriptive statistics. · Top 10. · Distribution and box plot. · See Also
Measure Park Effects
Approaches · Requirements for a good park factor. · Methodology · Sample Code · Using Park Factors · Park factors for run-based measurements. · Park factors for other statistics. · Hacking the Hack · Enhancements. · Compare individual offensive stats (singles, doubles, triples, home runs, etc.). · Fancy statistical approaches. · See Also
Calculate Fan Save Value
Saves · Saves are subjective. · Fan Save Value · How does a fan save value compare to the standard save value? · Sample Code · Using the Fan Save Value Formula
Calculate Save Value
The Formula · Using the Save Value · Sample Code · Using the Save Value Formula
Calculate Holds and Decent Holds for Relief Pitchers
What Is a Hold? · Analysis of reliever statistics. · Analysis of the hold statistic. · The Code · Decent Hold
6. Sabermetric Thinking
Hacks 60–71: Introduction
Thinking About Baseball
Calculate Expected Runs
The Code · Running the Hack · Hacking the Hack · What actually happened when teams bunted? · Is bunting ever a good strategy? · See Also
Calculate an Expected Hits Matrix
The Code · Running the Hack · Hacking the Hack · Strikeouts and the count. · Extra base hits and the count. · Balls in play and the count.
Look for Evidence of Platoon Effects
Average Platoon Effects · Switch Hitters · Hacking the Hack
Significant Number of At Bats
Find the Distribution of At Bats · Statistical Significance · OBP, AVG, and accuracy. · Testing the hypothesis.
Find “Clutch” Players
Identify Clutch Players · Measure Player Performance in Clutch Situations · Compare Players · Understanding the Results · The Code · Top players in clutch situations. · Significant clutch performances.
Calculate Expected Number of Wins
The Pythagorean Wins Formula · The code. · The Pythagenport Formula · The Back-of-the-Envelope Method
Measure Hits by Pitch Count
The Code · Running the Hack · Hacking the Hack
OBP, SLG, and Scoring Runs
The Data and the Code · The Results
Measure Skill Versus Luck
The Code
Odds of the Best Team Winning the World Series
Top 10 Bargain Outfielders
The Code · Prerequisites. · Define SQL query for getting raw data. · Import data into R. · Running the Hack · Identify common attributes. · Look at correlations. · Identify possible explanations for correlations. · Assign attribute scores. · Group players based on similarity. · Attach group membership to data set. · Transform salary variable. · Create linear regression model. · Compare predicted versus actual salaries. · Identify most-underpaid players. · Identify most-overpaid players. · Hacking the Hack · Vary the numbers of factors and/or number of clusters. · Look at different positions. · Look at different years. · Include more variables.
Fitting Game Scores to a Strength Model
The Data and the Code · Strength of Schedule · Conclusion
7. The Bullpen
Hacks 72–75: Introduction
Start or Join a Fantasy League
The Basics · Methods of ranking teams. · Methods of picking teams. · Find or form a fantasy league. · See Also
Draft Your Fantasy Team
The Basics · Pick a closer. · Pick an RBI man. · Focus on AVG, not OBP. · Draft Tips from an Expert · Be patient with your money. · Perform the “mustard toss”. · Find a bargain. · Target your favorites. · Know thine enemy.
Make a Scoreboard Widget
The Code · Running the Hack · Hacking the Hack · See Also
Analyze Other Sports
Possessions · Clock Strategy · See Also
A. Where to Learn More Stuff
Baseball Books
Baseball Web Sites
Statistics and Data Mining Books
Databases and Computer Languages
B. Abbreviations
Index
About the Author
Colophon
Copyright

Content preview from Baseball Hacks

Spider Baseball Sites for Data

Sometimes the only way to get the data you want is to pull it directly from the source.

While I was writing this book, I came across the following request on the Retrosheet mailing list:

	I’m going to be doing the Fans’ Scouting Report for a third
	year, but this time, I want to do it during the year.
	I’m looking to get the following information for 2005
	for all players, as of the all-star break:
	Team,playerID,player name,pos,innings
	Anyone who can help, please send me a note offlist.
	(playerid being whatever your data source is).
	Thanks, Tom

Basically, Tom needed to pull just a subset of data from the MLB.com site. Grabbing data from web pages so that you can reuse it for other purposes is a common task—so much so that it has its own name: spidering. Spidering allows you to write programs that read a web page and pull out just the parts you want, while throwing out the rest.
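To make the idea of "pulling out just the parts you want" concrete, here is a minimal sketch in Python (the book's own scripts are in Perl, so this is an illustration, not the author's code). It assumes the wanted data sits in an HTML table and turns each table row into a CSV line in the column order Tom asked for. The page text, team codes, player IDs, names, and innings below are invented placeholders, not real MLB.com markup or statistics, and `TableRowParser` and `page_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser

# Invented placeholder page: NOT real MLB.com markup or real player data.
SAMPLE_PAGE = """
<html><body>
<h1>Fielding</h1>
<p>Navigation, ads, and other text we do not need.</p>
<table>
<tr><th>Team</th><th>playerID</th><th>player name</th><th>pos</th><th>innings</th></tr>
<tr><td>AAA</td><td>p001</td><td>Player One</td><td>SS</td><td>100.0</td></tr>
<tr><td>BBB</td><td>p002</td><td>Player Two</td><td>CF</td><td>95.1</td></tr>
</table>
</body></html>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every table cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell is thrown away.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def page_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


print(page_to_csv(SAMPLE_PAGE))
```

A real spider would first download the page and would have to cope with whatever markup the site actually uses; the sketch works on an inline string so that it runs without network access.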

Web pages are written in a language called HyperText Markup Language (HTML). They contain different tags that explain to your computer how to format the page. Here is a short sample file that shows how this works:

	<html>
	<head>
	<title>Baseball Sites</title>
	</head>
	<body>
	<h1> Baseball Web Sites </h1>
	This book describes many different baseball web sites. Here are a
	few of my favorites:<br>
	<a href="http://www.baseball1.com">The Baseball Archive</a><br>
	<a href="http://www.retrosheet.org">Retrosheet</a><br>
	<a href="http://www.mlb.com">MLB.com</a><br>
	</body>
	</html>

The <html> tags tell the ...
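As a rough illustration of how a program can read those tags, the following Python sketch feeds the sample file above to the standard library's `html.parser` and keeps only the links, discarding everything else. It is written in Python rather than the Perl used in the book, and the names `LinkParser` and `extract_links` are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser

# The sample page from the text, reproduced as a string.
SAMPLE = """<html>
<head>
<title>Baseball Sites</title>
</head>
<body>
<h1> Baseball Web Sites </h1>
This book describes many different baseball web sites. Here are a
few of my favorites:<br>
<a href="http://www.baseball1.com">The Baseball Archive</a><br>
<a href="http://www.retrosheet.org">Retrosheet</a><br>
<a href="http://www.mlb.com">MLB.com</a><br>
</body>
</html>"""


class LinkParser(HTMLParser):
    """Record (link text, URL) for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the <a> being read, or None
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside a link is kept; the rest is ignored.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html_text):
    """Return a list of (text, url) pairs found in html_text."""
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


for text, url in extract_links(SAMPLE):
    print(f"{text}\t{url}")
```

The parser calls one method per opening tag, closing tag, and run of text, so the program only has to remember whether it is currently inside a tag it cares about.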

Publisher Resources

ISBN: 0596009429
