Lexical Parsing with Regular Expressions (strscan)

Although Ruby’s String class provides many powerful features that rely on regular expressions, it can be cumbersome to build any sort of parser with them. Most operations that you can do directly on strings work on the whole string at once, providing MatchData that can be used to index into the original content. This is great when a single pattern fits the bill, but when you want to consume some text in chunks, switching up strategies as needed along the way, things get a little hairier. This is where the strscan library comes in.

When you require strscan, it provides a class called StringScanner. The point of this object is that it keeps track of where you are in the string as you consume parts of it via regex patterns. To make this concrete, we can take a look at the example used in the RDoc:

require 'strscan'

s = StringScanner.new('This is an example string')
s.eos?               # -> false

p s.scan(/\w+/)      # -> "This"
p s.scan(/\w+/)      # -> nil
p s.scan(/\s+/)      # -> " "
p s.scan(/\s+/)      # -> nil
p s.scan(/\w+/)      # -> "is"
s.eos?               # -> false

p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "an"
p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "example"
p s.scan(/\s+/)      # -> " "
p s.scan(/\w+/)      # -> "string"
s.eos?               # -> true

p s.scan(/\s+/)      # -> nil
p s.scan(/\w+/)      # -> nil

This simple example shows that the scan position advances only when a match is made; a failed match returns nil and leaves the position where it was. Once the end of the string is reached, there is nothing left to match. Although ...
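To see how this position tracking lets you switch patterns from chunk to chunk, here is a minimal sketch of a tokenizer for simple arithmetic expressions. The tokenize method, its token names, and its error message are hypothetical illustrations rather than anything from the RDoc; only StringScanner and its scan, skip, getch, pos, and eos? methods come from strscan itself.

```ruby
require 'strscan'

# Hypothetical helper (an illustration, not part of strscan): splits a
# simple arithmetic expression into [type, text] pairs, choosing which
# pattern to try at each step.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      # Whitespace is consumed but produces no token.
      next
    elsif (text = scanner.scan(/\d+/))
      tokens << [:number, text]
    elsif (text = scanner.scan(/[-+*\/()]/))
      tokens << [:operator, text]
    else
      # Nothing matched, so the position has not moved. Report it.
      raise ArgumentError,
            "unexpected #{scanner.getch.inspect} at position #{scanner.pos - 1}"
    end
  end

  tokens
end

scanner = StringScanner.new('12 + 34')
scanner.scan(/[a-z]+/)  # nil; the position is still 0
scanner.scan(/\d+/)     # "12"; the position is now 2

tokenize('12 + (3 * 4)')
# [[:number, "12"], [:operator, "+"], [:operator, "("], [:number, "3"],
#  [:operator, "*"], [:number, "4"], [:operator, ")"]]
```

Because a failed scan returns nil without moving the position, the if/elsif chain can simply try each pattern in turn until one of them consumes some input.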
