Developing Feeds with RSS and Atom

Name: Developing Feeds with RSS and Atom
Author: Ben Hammersley
ISBN: 9780596008819

by Ben Hammersley

April 2005

Intermediate to advanced

270 pages

7h 13m

English

O'Reilly Media, Inc.

A Note Regarding Supplemental Files
Preface
Audience
Assumptions This Book Makes
How This Book Is Organized
Conventions Used in This Book
Using Code Examples
Safari Enabled
Comments and Questions
Acknowledgments

1. Introduction
1.1. What Are RSS and Atom for?
1.2. A Short History of RSS and Atom
1.2.1. HotSauce: MCF and RDF
1.2.2. Channel Definition Format
1.2.3. RSS First Appears
1.2.4. The Standards Evolve
1.2.5. The First Fork
1.2.6. The Second Fork
1.2.7. Pie, Echo, Necho, Atom
1.2.8. Today’s Scene
1.3. Why Syndicate Your Content?
1.4. Legal Implications
1.4.1. If You Are Scraped
2. Using Feeds
2.1. Web-Based Applications
2.1.1. Bloglines
2.1.2. Kinja
2.1.3. Rocketinfo RSS Reader
2.2. Desktop Applications
2.2.1. NetNewsWire
2.2.2. FeedDemon
2.2.3. NewsMonster
2.3. Other Cunning Techniques
2.3.1. Mobile Devices
2.3.2. Email Clients
2.3.3. Feed-Based Search Engines
2.4. Finding Feeds to Read
3. Feeds Without Programming
3.1. From Email
3.2. From a Search Engine
3.2.1. Google
3.2.2. Google News
3.2.3. Yahoo!
3.3. From Online Stores
4. RSS 2.0
4.1. Bringing Things Up to Date
4.2. The Basic Structure
4.2.1. Required Channel Subelements
4.2.2. Optional Channel Subelements
4.2.3. item Elements
4.2.4. The Simplest Possible RSS 2.0 Feed
4.3. Producing RSS 2.0 with Blogging Tools
4.4. Introducing Modules
4.4.1. blogChannel Module
4.4.2. Creative Commons Module
4.4.3. Simple Semantic Resolution Module
4.4.4. Trackback Module
4.4.5. ICBM Module
4.4.6. Yahoo!’s Media RSS Module
4.5. Creating RSS 2.0 Feeds
4.5.1. Creating RSS with Perl Using XML::RSS
4.5.1.1. guid, Permalink or not
4.5.1.2. Module support under XML::RSS
4.5.2. Creating RSS 2.0 with PHP
4.5.2.1. Caching and saving
4.5.2.2. Dates
4.5.2.3. Namespaced modules
4.5.3. Creating RSS 2.0 with Ruby
4.5.4. Serving RSS 2.0
5. RSS 1.0
5.1. Metadata in RSS 2.0
5.1.1. Using URIs in RSS
5.2. Resource Description Framework
5.2.1. Resources, PropertyTypes, and Properties
5.2.2. Nodes and Arcs
5.2.3. Fitting RDF to RSS
5.3. RDF in XML
5.3.1. The Root Element
5.3.2. <element rdf:about="URI OF ELEMENT">
5.3.3. <element rdf:resource="URI" />
5.3.4. RDF Containers
5.3.4.1. rdf:Bag
5.3.4.2. rdf:Seq
5.3.4.3. rdf:Alt
5.4. Introducing RSS 1.0
5.4.1. Walking Through an RSS 1.0 Document
5.5. The Specification in Detail
5.5.1. The Basic Structure
5.5.2. The Root Element
5.5.3. <channel rdf:about=""> (a Subelement of rdf:RDF)
5.5.3.1. Required subelements of channel
5.5.4. <image rdf:resource=""> (a Subelement of rdf:RDF)
5.5.5. <textinput rdf:about=""> (a Subelement of rdf:RDF)
5.5.6. <item rdf:about=""> (a Subelement of rdf:RDF)
5.5.7. The Simplest Possible RSS 1.0 Feed
5.6. Creating RSS 1.0 Feeds
5.6.1. Creating RSS 1.0 with Perl
5.6.2. Producing RSS 1.0 with PHP
6. RSS 1.0 Modules
6.1. Module Status
6.2. Support for Modules in Common Applications
6.3. Other RSS 1.0 Modules
7. The Atom Syndication Format
7.1. Introducing Atom
7.1.1. The Structure of an Atom Feed
7.1.1.1. The Atom entry
7.1.1.2. Combining entries to make a feed
7.1.2. The Reusable Syntax of Constructs
7.2. The Atom Entry Document in Detail
7.2.1. The Elements of Atom Entry
7.2.2. The Atom Feed Document in Detail
7.2.3. The Simplest Possible Thing That Will Actually Work
7.3. Producing Atom Feeds
7.3.1. Validating Atom Feeds
8. Parsing and Using Feeds
8.1. Important Issues
8.1.1. Converting Atom to RSS
8.2. JavaScript Display Parsers
8.2.1. RSS XPress
8.2.2. Other Examples to Try
8.3. Parsing for Programming
8.3.1. PHP: MagpieRSS
8.3.1.1. Using MagpieRSS
8.3.2. Python: The Universal Feed Parser
8.3.2.1. A complete aggregator in 40 lines
8.3.3. Perl: XML::Simple
8.3.3.1. Parsing RSS as simply as possible
8.4. Using Regular Expressions
8.5. Using XSLT
8.6. Client-Side Inclusion
8.7. Server-Side Inclusion
8.7.1. Enabling Server-Side Includes Within Apache 1.3.x
8.7.2. Server-Side Includes with Microsoft IIS
9. Feeds in the Wild
9.1. Once You Have Created Your Simple RSS Feed
9.1.1. Publish a Link
9.1.2. Enabling Autodiscovery
9.1.3. Serving a Feed Correctly
9.1.3.1. MIME types
9.1.3.2. HTTP 1.1
9.1.3.2.1. Compression
9.1.3.2.2. Conditional GET
9.1.3.3. RSScache.com
9.1.4. Registering with Aggregators
9.1.5. Metadata for the Main Page
9.1.6. Counting Hits and Clickthroughs
9.2. Publish and Subscribe
9.2.1. Publish and Subscribe Within RSS 2.0
9.2.2. Publish and Subscribe with RSS 1.0
9.3. Rolling Your Own: LinkPimp PubSub
9.4. LinkpimpClient.pl
9.4.1. LinkpimpListener.pl
10. Unconventional Feeds
10.1. Apache Logfiles
10.1.1. Walking Through the Code
10.1.2. The Entire Listing
10.2. Code TODOs to RSS
10.2.1. Walking Through the Code
10.2.2. The Entire Listing
10.3. Daily Doonesbury
10.3.1. Walking Through the Code
10.3.2. The Entire Listing
10.4. Amazon.com Wishlist to RSS
10.4.1. Walking Through the Code
10.4.2. The Entire Listing
10.5. FedEx Parcel Tracker
10.5.1. Walking Through the Code
10.5.2. The Entire Listing
10.6. Google to RSS with SOAP
10.6.1. Walking Through the Code
10.6.2. The Entire Listing
10.7. Last-Modified Files
10.7.1. Walking Through the Code
10.7.2. The Entire Listing
10.8. Installed Perl Modules
10.8.1. Walking Through the Code
10.8.2. The Entire Listing
10.9. The W3C Validator to RSS
10.9.1. Walking Through the Code
10.9.2. The Entire Listing
10.10. Game Statistics to Excel
10.11. Feeds by SMS
10.12. Podcasting Weather Forecasts
10.12.1. How to Use It
10.12.2. The Code Itself
10.13. Having Amazon Produce Its Own RSS Feeds
10.14. Cross-Poster for Movable Type
10.14.1. Walking Through the Code
10.14.2. The Entire Listing
11. Developing New Modules
11.1. Namespaces and Modules Within RSS 2.0 and Atom
11.1.1. Differences from RSS 1.0
11.2. Case Study: mod_Book
11.2.1. What Do We Know?
11.2.2. Can We Express This Data Already?
11.2.3. Putting the New Elements to Work with RSS 2.0
11.2.4. Putting the New Elements to Work with RSS 1.0
11.2.5. Documentation
11.3. Extending Your Desktop Reader
11.4. Introducing AmphetaDesk
11.4.1. Installing AmphetaDesk
11.4.2. index.html
A. The XML You Need for RSS
A.1. What Is XML?
A.2. Anatomy of an XML Document
A.2.1. Elements and Attributes
A.2.2. Name Syntax
A.2.3. Well-Formedness
A.2.4. Comments
A.2.5. Entity References
A.2.6. Character References
A.2.7. Character Encodings
A.2.7.1. Unicode encoding schemes
A.2.7.2. Other character encodings
A.2.8. Validity
A.2.8.1. Document type definitions (DTDs)
A.2.9. Putting It Together
A.2.10. XML Namespaces
A.3. Tools for Processing XML
A.3.1. Selecting a Parser
A.3.2. XSLT Processors
B. Useful Sites and Software
B.1. Uber Resources
B.2. Specification Documents
B.3. Mailing Lists
B.4. Validators
B.5. Desktop Readers
Index
About the Author
Colophon
Copyright

Content preview from Developing Feeds with RSS and Atom

Using Regular Expressions

Using regular expressions to parse feeds may seem a little brutish, but it has two advantages. First, it sidesteps the differences between the feed standards: a pattern matches the elements it names, whichever format wraps them. Second, it is much easier to install, since it requires no XML parsing modules or any of their dependencies.
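
To make the first advantage concrete, here is a minimal sketch in Python, using only the standard `re` module. Python is used purely for illustration, and the function name `feed_title_and_link` and both sample feeds are hypothetical, not taken from the book. The one pattern pulls the channel title and link out of an RSS 2.0 fragment and an RSS 1.0 fragment alike, because it never inspects the root element or the namespaces.

```python
import re

# Match the first <title> and the first <link> that follows it,
# whichever feed format wraps them. re.S lets "." span newlines.
FEED_RE = re.compile(r"<title>(.*?)</title>.*?<link>(.*?)</link>", re.S)


def feed_title_and_link(xml_text):
    """Return (title, link) for the feed, or (None, None) if not found."""
    m = FEED_RE.search(xml_text)
    return (m.group(1), m.group(2)) if m else (None, None)


# Hypothetical sample feeds, trimmed to the channel-level elements.
RSS_20 = """<rss version="2.0"><channel>
<title>Example Feed</title>
<link>http://example.org/</link>
</channel></rss>"""

RSS_10 = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<channel rdf:about="http://example.org/rss">
<title>Example Feed</title>
<link>http://example.org/</link>
</channel></rdf:RDF>"""

if __name__ == "__main__":
    print(feed_title_and_link(RSS_20))
    print(feed_title_and_link(RSS_10))
```

The sketch assumes plain `<title>` and `<link>` elements with no attributes. A feed that writes them differently (for example, a link carried in an `href` attribute) would need its own pattern, which is the kind of extension cost that regex parsing trades for its simplicity.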

Regular expressions, however, aren’t pretty. Consider Example 8-7, which is a section from Rael Dornfest’s lightweight RSS aggregator, Blagg.

Example 8-7. A section of code from Blagg

# Feed's title and link
my($f_title, $f_link) = ($rss =~ m#<title>(.*?)</title>.*?<link>(.*?)</link>#ms);

# RSS items' title, link, and description
while ( $rss =~ m{<item(?!s).*?>.*?(?:<title>(.*?)</title>.*?)?(?:<link>(.*?)</link>.*?)?(?:<description>(.*?)</description>.*?)?</item>}mgis ) {
     my($i_title, $i_link, $i_desc, $i_fn) = ($1||'', $2||'', $3||'', undef);

     # Unescape &amp; &lt; &gt; to produce useful HTML
     my %unescape = ('&lt;'=>'<', '&gt;'=>'>', '&amp;'=>'&', '&quot;'=>'"');
     my $unescape_re = join '|' => keys %unescape;
     $i_title && $i_title =~ s/($unescape_re)/$unescape{$1}/g;
     $i_desc && $i_desc =~ s/($unescape_re)/$unescape{$1}/g;

     # If no title, use the first 50 non-markup characters of the description
     unless ($i_title) {
          $i_title = $i_desc;
          $i_title =~ s/<.*?>//msg;
          $i_title = substr($i_title, 0, 50);
     }
     next unless $i_title;

While this looks pretty nasty, it is actually an efficient way of stripping the data out of the RSS file, even if it is potentially much harder to extend. If ...
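
For readers more at home in Python, the same steps (match each item, unescape the entities, fall back to the description when the title is missing) can be sketched roughly as follows. This is an illustrative port under stated assumptions, not code from Blagg or from the book: `parse_items`, `unescape`, and the sample feed are hypothetical names, and only the standard `re` module is used.

```python
import re

# Same shape as the item pattern in the Perl excerpt: optional title,
# link, and description, in that order, inside each <item>.
# (?!s) keeps the pattern from matching the RSS 1.0 <items> element.
ITEM_RE = re.compile(
    r"<item(?!s).*?>.*?"
    r"(?:<title>(.*?)</title>.*?)?"
    r"(?:<link>(.*?)</link>.*?)?"
    r"(?:<description>(.*?)</description>.*?)?"
    r"</item>",
    re.I | re.S,
)

UNESCAPE = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"'}
UNESCAPE_RE = re.compile("|".join(map(re.escape, UNESCAPE)))


def unescape(text):
    """Replace the four escaped entities in a single pass."""
    return UNESCAPE_RE.sub(lambda m: UNESCAPE[m.group(0)], text)


def parse_items(xml_text):
    """Return a list of dicts with title, link, and description."""
    items = []
    for m in ITEM_RE.finditer(xml_text):
        # Groups that did not take part in the match are None.
        title, link, desc = (g or "" for g in m.groups())
        title, desc = unescape(title), unescape(desc)
        if not title:
            # No title: first 50 non-markup characters of the description.
            title = re.sub(r"<.*?>", "", desc, flags=re.S)[:50]
        if not title:
            continue  # nothing usable in this item
        items.append({"title": title, "link": link, "description": desc})
    return items


# Hypothetical sample feed: one complete item, one with no title.
SAMPLE = """<rss version="2.0"><channel>
<title>Example Feed</title>
<link>http://example.org/</link>
<item>
<title>Ben &amp; Jerry</title>
<link>http://example.org/1</link>
<description>First post</description>
</item>
<item>
<description>&lt;p&gt;No title here&lt;/p&gt;</description>
</item>
</channel></rss>"""

if __name__ == "__main__":
    for item in parse_items(SAMPLE):
        print(item)
```

The pattern expects the three elements in a fixed order, so a feed that puts the link before the title would lose fields. Supporting that means rewriting the expression rather than adding a line, which illustrates why this style is harder to extend.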

Publisher Resources

ISBN: 0596008813
