Read the hit location files from MLB.com.
Starting in 2004, MLB.com began sharing information on Gameday about every ball put into play: the batter, the pitcher, the spot from which the ball was fielded, the type of hit or out (single, double, triple, home run, groundout, flyout), the inning, and the coordinates where the ball landed. You can learn how to get files from MLB.com Gameday in “Get Recent Play-by-Play Data” [Hack #28].
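If you want to hold one of these ball-in-play records in memory, a small typed record with one field per item in that list works well. This is a minimal sketch: the class name `BallInPlay`, the field names, the integer player IDs, and the outcome spellings (`"Single"`, `"Groundout"`, and so on) are all assumptions, not MLB.com's actual attribute names or encodings.

```python
from dataclasses import dataclass

# Hypothetical set of outcome strings that count as hits (assumed spellings).
HIT_TYPES = {"Single", "Double", "Triple", "Home Run"}


@dataclass(frozen=True)
class BallInPlay:
    """One ball put into play, with the fields Gameday reports.

    Field names and types are hypothetical; the real files may
    name and encode them differently.
    """
    batter: int      # assumed numeric player ID
    pitcher: int     # assumed numeric player ID
    fielded_by: str  # spot the ball was fielded from, e.g. "SS" (assumed encoding)
    outcome: str     # type of hit or out, e.g. "Single" or "Groundout" (assumed)
    inning: int
    x: float         # horizontal grid coordinate
    y: float         # vertical grid coordinate

    def is_hit(self) -> bool:
        """True for singles, doubles, triples, and home runs."""
        return self.outcome in HIT_TYPES
```

Making the record frozen keeps a parsed play from being modified accidentally once it has been read from a file.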
This isn’t complete data about matchups between batters and pitchers (it doesn’t tell you about strikeouts or walks), but the information it does include is very cool. It tells you where hitters tend to hit balls and where balls are hit against pitchers, and it lets you calculate ground ball/fly ball ratios. Basically, it’s a big list of the (X,Y) coordinates of where each ball landed inside a grid whose top-left corner is (0,0) and whose lower-right corner is (250,250), as shown in Figure 3-10. You might want to use these coordinates to create a spray chart [Hack #37] that shows where players tend to hit against a pitcher, or to develop your own statistics.
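Because the origin is in the top-left corner, the Y values grow as you move down the field, which is upside down for most plotting tools. The sketch below flips the Y axis, bins points into a coarse grid as a starting point for a spray chart, and computes a groundout/flyout ratio. The 5×5 grid, the function names, and the outcome strings `"Groundout"` and `"Flyout"` are assumptions for illustration. The ratio counts outs only, because a single or double in this data doesn’t say whether the ball was on the ground or in the air.

```python
from collections import Counter

GRID_MAX = 250  # coordinates run from (0,0) at top-left to (250,250) at bottom-right


def to_plot_coords(x, y):
    """Flip the y axis so larger values point up, as most plotting tools expect."""
    return x, GRID_MAX - y


def spray_counts(points, cells=5):
    """Count balls in play per cell of a cells-by-cells grid.

    Keys are (row, col) in the original orientation: row 0 is the top
    of the image. Points on the far edge (250) go into the last cell.
    """
    size = GRID_MAX / cells
    counts = Counter()
    for x, y in points:
        col = min(int(x // size), cells - 1)
        row = min(int(y // size), cells - 1)
        counts[(row, col)] += 1
    return counts


def ground_fly_ratio(outcomes):
    """Groundouts divided by flyouts; None when there are no flyouts.

    The outcome spellings are assumptions about the data.
    """
    c = Counter(outcomes)
    flies = c["Flyout"]
    return c["Groundout"] / flies if flies else None
```

For example, `spray_counts([(0, 0), (249, 249), (250, 250)])` puts one ball in cell `(0, 0)` and two in cell `(4, 4)`.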
This hack explains the file format and shows you a simple script to import this data into a database.
Like most of the other scripts in this book, this script loops through a directory of files, loading each one individually, reformatting the contents, and saving the output to a comma-delimited text file. Because this is a short example, ...
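The loop described in the preceding paragraph can be sketched in Python as follows. This is not the book’s script. It assumes each downloaded file is XML in which every ball in play is an element carrying attributes named `batter`, `pitcher`, `des`, `inning`, `x`, and `y`. Those names, the `*.xml` filename pattern, and the function names are hypothetical, so check a real file before relying on them.

```python
import csv
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed attribute names for one ball in play; adjust to match the real files.
FIELDS = ["batter", "pitcher", "des", "inning", "x", "y"]


def rows_from_file(path):
    """Yield one CSV row per XML element that has both x and y attributes."""
    tree = ET.parse(path)
    for elem in tree.iter():
        if "x" in elem.attrib and "y" in elem.attrib:
            yield [path.name] + [elem.get(f, "") for f in FIELDS]


def convert_directory(in_dir, out_file):
    """Load each .xml file in in_dir and append its rows to one comma-delimited file."""
    with open(out_file, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file"] + FIELDS)
        for path in sorted(Path(in_dir).glob("*.xml")):
            try:
                writer.writerows(rows_from_file(path))
            except ET.ParseError as err:
                # Skip malformed or truncated downloads instead of aborting the run.
                print(f"skipping {path}: {err}", file=sys.stderr)
```

A call such as `convert_directory("hits", "hits.csv")` produces a single file that a database’s bulk loader can import. The parser only needs the `x` and `y` attributes to recognize a play, so it still works if the real element name differs from what you expect.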