Text-Processing Strategies
Ruby makes basic I/O operations dead simple, but this doesn’t mean it’s a bad idea to pick up and apply some general approaches to text processing. Here we’ll talk about two techniques that most programmers doing file processing will want to know about, and you’ll see what they look like in Ruby.
Advanced Line Processing
The case study for this chapter showed the most common use of File.foreach(), but there is more to be said about this approach. This section will highlight a couple of tricks worth knowing about when doing line-by-line processing.
Using Enumerator
The following example extracts and sums the totals found in a file that has entries similar to these:
some lines of text
total: 12
other lines of text
total: 16
more text
total: 3
The following code shows how to do this without loading the whole file into memory:
sum = 0
File.foreach("data.txt") { |line| sum += line[/total: (\d+)/,1].to_f }
Here, we are using File.foreach as a direct iterator, and building up our sum as we go. However, because foreach() returns an Enumerator when it is called without a block, we can actually write this in a cleaner way without sacrificing efficiency:
enum = File.foreach("data.txt")
sum = enum.inject(0) { |s,r| s + r[/total: (\d+)/,1].to_f }
The primary difference between the two approaches is that when you use File.foreach directly with a block, you are simply iterating line by line over the file, whereas the Enumerator lets you process your data with Enumerable methods such as inject.
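As a rough, self-contained sketch of the Enumerator approach, the snippet below writes a small sample file and sums its totals line by line. The helper name total_from_file and the exact layout of the sample data are assumptions made for illustration; only File.foreach, inject, and grep come from Ruby itself.

```ruby
require "tempfile"

# Sum every "total: N" entry in the file at +path+, reading one line at a time.
# total_from_file is a hypothetical helper name used only for this sketch.
def total_from_file(path)
  # Without a block, File.foreach returns an Enumerator over the file's lines.
  File.foreach(path).inject(0) do |sum, line|
    # Lines with no "total:" entry yield nil here, and nil.to_f is 0.0.
    sum + line[/total: (\d+)/, 1].to_f
  end
end

# Sample data modeled on the entries shown earlier (the layout is assumed).
sample = Tempfile.new("totals")
sample.write("some lines of text\ntotal: 12\nother lines of text\ntotal: 16\nmore text\ntotal: 3\n")
sample.close

puts total_from_file(sample.path)                        # prints 31.0

# Other Enumerable methods work on the same Enumerator, for example grep:
puts File.foreach(sample.path).grep(/total: \d+/).size   # prints 3
```

Because the Enumerator pulls lines from the file as inject asks for them, the running sum never requires the whole file to be in memory. Note that grep, by contrast, collects its matches into an array, so it is best reserved for cases where the matching lines are few.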
When we work ...