Lexical Parsing with Regular Expressions (strscan)
Although Ruby’s String object provides many powerful features that rely on regular expressions, it can be cumbersome to build any sort of parser with them. Most operations that you can do directly on strings work on the whole string at once, providing MatchData that can be used to index into the original content. This is great when a single pattern fits the bill, but when you want to consume some text in chunks, switching up strategies as needed along the way, things get a little more hairy. This is where the strscan library comes in.
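To make the whole-string behavior concrete, here is a small sketch that uses only core String and MatchData methods. The sample text and variable names are illustrative and not taken from the book; the point is only that the caller has to carry the position forward by hand.

```ruby
text = 'This is an example string'

# String#match searches the string and returns a MatchData object.
m = text.match(/\w+/)
first_word  = m[0]                     # the matched text
first_range = m.begin(0)...m.end(0)    # where it sits in the original string

# To continue past the first word, we must track the offset ourselves
# and pass it back in as the starting position for the next search.
pos = m.end(0)
m2 = text.match(/\w+/, pos)
second_word  = m2[0]
second_start = m2.begin(0)             # offset is relative to the whole string
```

This works, but every step needs its own bookkeeping, which becomes tedious once several patterns are in play.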
When you require strscan, it provides a class called StringScanner. The main benefit of this object is that it keeps track of where you are in the string as you consume parts of it via regex patterns. Just to clear up what this means, we can take a look at the example used in the RDoc:
    require 'strscan'   # loads the StringScanner class

    s = StringScanner.new('This is an example string')
    s.eos?               # -> false

    p s.scan(/\w+/)      # -> "This"
    p s.scan(/\w+/)      # -> nil
    p s.scan(/\s+/)      # -> " "
    p s.scan(/\s+/)      # -> nil
    p s.scan(/\w+/)      # -> "is"
    s.eos?               # -> false

    p s.scan(/\s+/)      # -> " "
    p s.scan(/\w+/)      # -> "an"
    p s.scan(/\s+/)      # -> " "
    p s.scan(/\w+/)      # -> "example"
    p s.scan(/\s+/)      # -> " "
    p s.scan(/\w+/)      # -> "string"
    s.eos?               # -> true

    p s.scan(/\s+/)      # -> nil
    p s.scan(/\w+/)      # -> nil
From this simple example, it’s clear that the index is advanced only when a match is made. Once the end of the string is reached, there is nothing left to match. Although ...
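The same position-tracking behavior can be used to consume text in chunks while switching patterns along the way. The sketch below is illustrative rather than taken from the book: the `tokenize` helper and its token names are hypothetical, while `scan`, `getch`, `eos?`, and `pos` are standard StringScanner methods.

```ruby
require 'strscan'

# Hypothetical helper: split input into [type, text, position] triples.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []

  until scanner.eos?
    start = scanner.pos              # offset before attempting any match

    if scanner.scan(/\s+/)
      next                           # whitespace was consumed; nothing to record
    elsif (text = scanner.scan(/\d+/))
      tokens << [:number, text, start]
    elsif (text = scanner.scan(/\w+/))
      tokens << [:word, text, start]
    else
      # No pattern matched here, so take one character to guarantee progress.
      tokens << [:other, scanner.getch, start]
    end
  end

  tokens
end

tokens = tokenize('add 42 + x')
# tokens == [[:word, "add", 0], [:number, "42", 4], [:other, "+", 7], [:word, "x", 9]]
```

Because a failed `scan` returns nil without moving the position, each branch can simply try the next pattern. The order of the branches matters in this sketch: `\w` also matches digits, so the number pattern is tried before the word pattern.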