Working with Tabular Data (csv)

If you need to represent a data table in plain-text format, CSV (comma-separated values) files are about as simple as you can get. These files can easily be processed by almost any programming language, and Ruby is no exception. The csv standard library is fast for pure Ruby, internationalized, and downright pleasant to work with.

In the simplest cases, it’d be hard to make things easier. For example, say you had a CSV file (payments.csv) that looked like this:

name,payment
Gregory Brown,100
Joe Comfort,150
Jon Juraschka,200
Gregory Brown,75
Jon Juraschka,250
Jia Wu,25
Gregory Brown,50
Jia Wu,75

If you want to just slurp this into an array of arrays, it can’t be easier:

>> require "csv"
=> true
>> CSV.read("payments.csv")
=> [["name", "payment"], ["Gregory Brown", "100"], ["Joe Comfort", "150"],
    ["Jon Juraschka", "200"], ["Gregory Brown", "75"], ["Jon Juraschka", "250"],
    ["Jia Wu", "25"], ["Gregory Brown", "50"], ["Jia Wu", "75"]]
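Note that every field comes back as a String, and the header line is just the first row. As a minimal sketch, here is one way to split the header off and convert the payments to integers. It uses CSV.parse on an inline string (which returns the same array-of-arrays shape as CSV.read) so that it runs without the file; the variable names are only illustrative.

```ruby
require "csv"

# Inline copy of a few payments.csv rows, so the sketch runs without the file.
raw = <<~ROWS
  name,payment
  Gregory Brown,100
  Joe Comfort,150
  Jia Wu,25
ROWS

# Separate the header row from the data rows.
header, *rows = CSV.parse(raw)

# Every field arrives as a String, so convert the payments explicitly.
payments = rows.map { |name, payment| [name, Integer(payment)] }
total = payments.sum { |_, amount| amount }
```

Using Integer() rather than to_i means a malformed payment raises an error instead of silently becoming 0.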

Of course, slurping files isn’t a good idea if you want to handle only a subset of data, but csv makes row-by-row handling easy. Here’s an example of how you’d capture only my records:

>> data = []
=> []
>> CSV.foreach("payments.csv") { |row| data << row if row[0] == "Gregory Brown" }
=> nil
>> data
=> [["Gregory Brown", "100"], ["Gregory Brown", "75"], ["Gregory Brown", "50"]]
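The same row-by-row approach works when you only need an aggregate and never want to keep the rows themselves. The following is a sketch under a few assumptions: it writes a small stand-in for payments.csv to a temporary file so that it is self-contained, and gregory_total is a name made up for the illustration.

```ruby
require "csv"
require "tempfile"

# Write a small stand-in for payments.csv so the sketch is self-contained.
file = Tempfile.new(["payments", ".csv"])
file.write(<<~ROWS)
  name,payment
  Gregory Brown,100
  Joe Comfort,150
  Gregory Brown,75
  Jia Wu,25
  Gregory Brown,50
ROWS
file.close

# Keep a running total; only one row is handled at a time.
gregory_total = 0
CSV.foreach(file.path) do |row|
  # The header row never matches the name, so it is skipped naturally.
  gregory_total += Integer(row[1]) if row[0] == "Gregory Brown"
end

file.unlink
```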

A common convention is to have the first row of a CSV file represent header data. csv can give you nicer accessors in this case:

>> data
=> []
>> CSV.foreach("payments.csv", ...
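As a minimal sketch of header-based access, the snippet below assumes the library’s headers: true option. It is an illustration on inline data, not a completion of the example above, and the totals hash is a hypothetical name.

```ruby
require "csv"

raw = <<~ROWS
  name,payment
  Gregory Brown,100
  Jon Juraschka,200
  Gregory Brown,75
  Jia Wu,25
ROWS

# With headers: true, the first line is consumed as the header row and
# each remaining row is a CSV::Row that can be indexed by column name.
totals = Hash.new(0)
CSV.parse(raw, headers: true) do |row|
  totals[row["name"]] += Integer(row["payment"])
end
```

Indexing by column name keeps the code working if the columns are later reordered, which positional access like row[0] does not.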
