Ruby by Example by Kevin C. Baird

#32 Extracting Text from XML (xml_text_extractor.rb)

Counting occurrences of tags is fine, but XML is designed to hold text wrapped in tags, which provide some organization beyond what’s available from the content alone. That said, sometimes having just the text content is handy. When I was preparing a document using DocBook, I wanted to run a spell checker on it. There are XML-aware spell checkers, but another approach is to run a text extractor on the XML and pass its output to a spell checker that expects plain text. xml_text_extractor.rb is just such a script.
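The extraction idea can be sketched in a few lines. This is not the book's listing (which is cut off below); it is a minimal illustration assuming the REXML library, which ships with Ruby (as a bundled gem since Ruby 3.0). The method names collect_text and extract_text are hypothetical. The sketch walks the parsed tree depth-first, keeps the value of every text node, and drops whitespace-only nodes such as indentation between tags.

```ruby
# Minimal sketch, not the book's listing. Assumes the REXML library, which
# ships with Ruby (as a bundled gem since Ruby 3.0).
require 'rexml/document'

# Hypothetical helper: depth-first walk that appends the value of every text
# node (entities such as &amp; already expanded) to +out+. CDATA sections are
# included because REXML::CData is a subclass of REXML::Text; comments and
# processing instructions are skipped.
def collect_text(node, out = [])
  node.each_child do |child|
    case child
    when REXML::Text    then out << child.value
    when REXML::Element then collect_text(child, out)
    end
  end
  out
end

# Hypothetical helper: parse +xml+ and return its text content, one
# non-blank text node per line, ready for a plain-text spell checker.
def extract_text(xml)
  collect_text(REXML::Document.new(xml))
    .map(&:strip)
    .reject(&:empty?)
    .join("\n")
end
```

To run it as a script, one could add `puts extract_text(ARGF.read)`, which reads the files named on the command line (or standard input) and prints the extracted text. Splitting inline markup such as `<b>` onto separate lines is harmless for spell checking, since the checker only cares about individual words.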

The Code

    #!/usr/bin/env ruby
    # xml_text_extractor.rb

    CHOMP_TAG = lambda { |tag| tag.to_s.chomp } # ❶

    =begin rdoc
    This script uses the Rexml parser, ...
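The listing is truncated here, so how the script applies CHOMP_TAG isn't shown. In isolation, the lambda converts its argument to a String and removes one trailing line ending, if present. The sketch below (the sample document and variable names are illustrative, not from the book) shows what that means for a few kinds of input, including the nil that REXML's Element#text returns for an element with no text.

```ruby
require 'rexml/document'

# The lambda from the listing: convert the argument to a String
# (nil.to_s is "") and strip a single trailing line ending, if any.
CHOMP_TAG = lambda { |tag| tag.to_s.chomp }

# Illustrative document: one element with text, one empty element.
doc   = REXML::Document.new("<doc><para>Hello</para><empty/></doc>")
para  = doc.root.elements['para']
empty = doc.root.elements['empty']

CHOMP_TAG.call("spelling\n")   # => "spelling"
CHOMP_TAG.call(para.text)      # => "Hello"
CHOMP_TAG.call(empty.text)     # => "" (Element#text returned nil)
CHOMP_TAG.call(para)           # => "<para>Hello</para>" (to_s serializes the element)
```

Because `to_s` turns nil into an empty string, a lambda like this can be applied to text that may be missing without a separate nil check.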
