Developing Feeds with RSS and Atom

by Ben Hammersley
April 2005
Content level: Intermediate to advanced
270 pages
English
O'Reilly Media, Inc.

Using Regular Expressions

Using regular expressions to parse feeds may seem a little brutish, but it has two advantages. First, it largely sidesteps the differences between the feed standards. Second, installation is much easier: it requires no XML parsing modules, nor any of their dependencies.
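As a minimal sketch of the first point, the same pattern can pull the channel title out of both an RSS 2.0 and an RSS 1.0 document without knowing which one it has been handed. The two sample feeds and the `feed_title` helper are made up for illustration and come from no particular library.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <title> element,
# or an empty string if the feed has none.
sub feed_title {
    my ($xml) = @_;
    my ($title) = $xml =~ m{<title>(.*?)</title>}s;
    return defined $title ? $title : '';
}

# Two made-up feeds in different formats.
my $rss20 = <<'XML';
<rss version="2.0">
  <channel>
    <title>Example 2.0 Feed</title>
    <link>http://example.org/</link>
  </channel>
</rss>
XML

my $rss10 = <<'XML';
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example 1.0 Feed</title>
    <link>http://example.org/</link>
  </channel>
</rdf:RDF>
XML

# The same pattern is applied to both formats.
print feed_title($_), "\n" for $rss20, $rss10;
```

The pattern matches only a bare `<title>` tag, so a title element that carries attributes would be missed. That is the price of ignoring the XML structure.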

Regular expressions, however, aren’t pretty. Consider Example 8-7, which is a section from Rael Dornfest’s lightweight RSS aggregator, Blagg.

Example 8-7. A section of code from Blagg
# Feed's title and link
my($f_title, $f_link) = ($rss =~ m#<title>(.*?)</title>.*?<link>(.*?)</link>#ms);

# RSS items' title, link, and description
while ( $rss =~ m{<item(?!s).*?>.*?(?:<title>(.*?)</title>.*?)?(?:<link>(.*?)</link>.*?)?(?:<description>(.*?)</description>.*?)?</item>}mgis ) {
     my($i_title, $i_link, $i_desc, $i_fn) = ($1||'', $2||'', $3||'', undef);

     # Unescape &amp; &lt; &gt; to produce useful HTML
     my %unescape = ('&lt;'=>'<', '&gt;'=>'>', '&amp;'=>'&', '&quot;'=>'"');
     my $unescape_re = join '|' => keys %unescape;
     $i_title && $i_title =~ s/($unescape_re)/$unescape{$1}/g;
     $i_desc && $i_desc =~ s/($unescape_re)/$unescape{$1}/g;

     # If no title, use the first 50 non-markup characters of the description
     unless ($i_title) {
          $i_title = $i_desc;
          $i_title =~ s/<.*?>//msg;
          $i_title = substr($i_title, 0, 50);
     }
     next unless $i_title;

While this looks pretty nasty, it is actually an efficient way of stripping the data out of the RSS file, even if it is potentially much harder to extend. If ...
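To see the pattern at work, here is a minimal, self-contained sketch that runs the same item regular expression over a made-up feed. The sample data, the `parse_items` wrapper, and the list of hashes it returns are assumptions for illustration; they are not part of Blagg.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the item pattern from Example 8-7.
# Returns a list of hash references with title, link, and desc keys.
sub parse_items {
    my ($rss) = @_;
    my %unescape = ('&lt;' => '<', '&gt;' => '>', '&amp;' => '&', '&quot;' => '"');
    my $unescape_re = join '|', keys %unescape;
    my @items;

    while ( $rss =~ m{<item(?!s).*?>.*?(?:<title>(.*?)</title>.*?)?(?:<link>(.*?)</link>.*?)?(?:<description>(.*?)</description>.*?)?</item>}mgis ) {
        my ($title, $link, $desc) = ($1 || '', $2 || '', $3 || '');

        # Unescape the four entities in both the title and the description.
        s/($unescape_re)/$unescape{$1}/g for $title, $desc;

        # If no title, use the first 50 non-markup characters of the description.
        if ($title eq '') {
            ($title = $desc) =~ s/<.*?>//msg;
            $title = substr($title, 0, 50);
        }
        next if $title eq '';

        push @items, { title => $title, link => $link, desc => $desc };
    }
    return @items;
}

# A made-up feed: the second item has no title or link.
my $rss = <<'XML';
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <item>
      <title>Fish &amp; Chips</title>
      <link>http://example.org/1</link>
      <description>First &lt;b&gt;entry&lt;/b&gt;</description>
    </item>
    <item>
      <description>&lt;p&gt;No title here&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
XML

for my $item (parse_items($rss)) {
    print "$item->{title} <$item->{link}>\n";
}
```

The second item has no title, so its title is taken from the description text with the markup stripped. The pattern also assumes that title, link, and description appear in that order. Capturing one more element, such as a publication date, means adding another optional group to an already long expression, which is where the approach becomes harder to extend.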


ISBN: 0596008813