Developing Feeds with RSS and Atom

By Ben Hammersley


Table of Contents

Chapter 1: Introduction
"Data! Data! Data!" he cried impatiently.
Sir Arthur Conan Doyle, The Adventure of the Copper Beeches
In this chapter, I'll first talk about what RSS and Atom are for and then take a look at a little of their history. We then move on to the business cases for syndicating your own content and a discussion of the philosophy behind content syndication. The chapter finishes with a brief discussion of the legal issues surrounding the provision and use of syndication feeds.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Are RSS and Atom for?
The original, and still the most common, use for RSS and Atom is to provide a content syndication feed: a consistent, machine-readable file that allows web sites to share their content with other applications in a standard way. Originally, as shown in the next section, this was used to share data among web sites, but now it's most commonly used between a site and a desktop application called a reader.
Feeds can be anything from just headlines and links to stories to the entire content of the site, stripped of its layout and with metadata liberally applied. Content syndication allows users to experience a site on multiple devices and be notified of updates over a variety of services. It can range from a simple list of links sent from site to site to the beginnings of the Semantic Web.
However, feeds are starting to be used as content in their own right: people are building services that only output to a feed and don't actually have a "real" site at all. In later chapters of this book, we'll look at the cool things you can do with this, and build some of our own.
A Short History of RSS and Atom
In the Developer's Bars of the world—those dark, sordid places filled with grizzled coders and their clans—a special corner is always reserved for the developers of content-syndication standards. There, weeping into their beer, you'll find the veterans of a long and difficult process. Most likely, they will have the Thousand Yard Stare of those who have seen more than they should. The standards you will read about in this book were not born fresh and innocent, of a streamlined process overseen by the Wise and Good. Rather, the following chapters have been dragged into the world and tempered through brawls, knife fights, and the occasional riot. What has survived, it is hoped, is hardy enough to prosper for the foreseeable future.
To fully understand these wayward children, and to get the most out of them, it is necessary to understand the motivations behind the different standards and how they evolved into what they are today.
The deepest, darkest origins of the current versions of RSS began in 1995 with the work of Ramanathan V. Guha. Known to most simply by his surname, Guha developed a system called the Meta Content Framework (MCF). Rooted in the work of knowledge-representation systems such as CycL, KRL, and KIF, MCF's aim was to describe objects, their attributes, and the relationships between them.
MCF was an experimental research project funded by Apple, so it was pleasing for management that a great application came out of it: ProjectX, later renamed HotSauce. By late 1996, a few hundred sites were creating MCF files that described themselves, and Apple HotSauce allowed users to browse around these MCF representations in 3D. Documentation still exists on the Web for MCF and HotSauce. See http://www.eclectica-systems.co.uk/complex/hotsauce.php and Example 1-1 for more.
Example 1-1. An example of MCF
begin-headers:
MCFVersion: 0.95
name: "Eclectica"
end-headers:

unit: "tagging.mco" 
name: "Tagging and Acrobat Integration" 
default_genl_x: -109
default_genl_y: -65
typeOf: #"SubjectCategory"

unit: "http://www.nplum.demon.co.uk/temptin/temptin.htm" 
name: "TemptIn Information Management Template" 
genls_pos: ["tagging.mco" -85 -137]

unit: "http://www.nplum.demon.co.uk/temptin/tryout.htm" 
name: "Download Try-out Version" 
genls_pos: ["tagging.mco" -235 120]
Why Syndicate Your Content?
The advantages of using other people's feeds are obvious, but what about supplying your own? There are at least nine reasons to do so:
  • It increases traffic to your site.
  • It builds brand awareness for your site.
  • It can help with search engine rankings.
  • It helps cement relationships within a community of sites.
  • It improves the site/user relationship.
  • With additional technologies, it allows others to give additional features to your service—update-notification via instant messaging, for example.
  • It makes the Internet an altogether richer place, pushing semantic technology along and encouraging reuse. Good things happen when you share your data.
  • It gives you a good excuse to play with some cool stuff.
  • By reducing the amount of screen-scraping of your site, it saves wasted bandwidth.
There you are: social, spiritual, and mercenary reasons to provide a feed for your site.
Legal Implications
The copyright implications for RSS feeds are quite simple. There are two choices for feed publishers, and these reflect on the user.
First, the publisher can decide that the feed must be licensed in some way. In this case, only authorized users can use the feed. It is good manners on the part of the publisher to make it as obvious as possible that this is the case—by providing a copyright notice in an XML comment, at least, and preferably by making it difficult for unauthorized users to get to the feed. Password protection is a reasonable minimum. Registering a pay-only feed with aggregators or allowing Google to see the feed is asking for trouble.
Second, and most commonly, the publisher can decide that the RSS feed is entirely free to use. In this case, it is only polite for the publishers of public RSS feeds to consider the feed entirely in the public domain: free to be used by anyone, for anything. This might sound a little radical to the average company vice president, but remember: there is nothing in the RSS feed that wasn't, in some way, in the actual source information in the first place. It is rather futile to get upset that someone might not be using your headlines in the company-approved font, or committing a similar infraction; it's somewhat against the spirit of the exercise.
Screen-scraping a site to create a feed, by writing a script to read the site-specific layout, is a different matter. It has already been legally found, in U.S. courts at least (in the Ticketmaster versus Tickets.com case of October 1999 to March 2000), that linking to a page wasn't in itself a breach of copyright. And you can argue, perhaps less convincingly, that reproducing headlines and excerpts from a site comes under fair-use guidelines for review purposes. However, it is extremely bad form to continue scraping a site if the site owner asks you to stop. Instead, try to evangelize RSS to the site owner and get him to start a proper feed.
Nevertheless, for private use, screen-scraping is a useful technique. In later chapters you'll see how running screen-scraping scripts on your local machine can produce extremely useful feed-based applications. Because these are entirely self-contained, there's no legal issue at all.
Chapter 2: Using Feeds
I took a speed-reading course and read War and Peace in twenty minutes. It involves Russia.
Woody Allen
Before we get into the tricky business of producing, parsing, scripting, and extending our own RSS and Atom feeds, it makes sense to look at how they are consumed. In this chapter, therefore, we shall look at the various reader applications currently available for your pleasure.
Web-Based Applications
The earliest, and still perhaps the most common, method of reading syndication feeds, the web-based application is a convenient way to stay up to date wherever you find yourself. It's especially good if you use more than one computer. In this section, when I talk about web-based applications, I mean applications hosted elsewhere, by other people. Applications that use your browser as the interface and sit on your local machine are in the next section.
Bloglines (http://www.bloglines.com) may not have been the first web-based aggregator, but it is certainly the most popular today (see Figure 2-1). It's free to use and very slick, offering email subscriptions, services for webloggers, and an interesting Application Programming Interface.
Figure 2-1: Bloglines.com
Kinja (http://www.kinja.com; see Figure 2-2) is slightly different from most RSS and Atom applications in that it doesn't mention either standard anywhere. It is specifically designed to require no knowledge of the rest of this book, and it's free and tremendously easy to use. It's also, in my opinion, marvelously good looking. It has fewer features than Bloglines, however, especially for bloggers.
Figure 2-2: Kinja.com
Another competitor in this space, Rocketinfo's RSS Reader (see Figure 2-3) is a free advertorial application for the Rocketinfo range of enterprise titles. It's also not as fully featured as Bloglines, but it does have a three-pane interface many people prefer.
Figure 2-3: reader.rocketinfo.com
Desktop Applications
If you prefer to run a dedicated application to read your RSS, you have lots of options.
Because of its beauty and utility, the leading feed-reading application on Apple OS X, NetNewsWire (http://ranchero.com/netnewswire/; see Figure 2-4) caused a stir when it was first released. Version 2 is even better and is my personal favorite. It's not free, but you can try out a 30-day demo.
Figure 2-4: NetNewsWire in action
The most popular feed application on Windows, FeedDemon (see Figure 2-5) is an accomplished three-pane display newsreader. It's not free, but there is a trial version. It even has a built-in web browser.
Figure 2-5: FeedDemon in action
Never has an application been so fittingly named. NewsMonster (http://www.newsmonster.org/) is an enormous application. It's cross-platform and runs on Windows, Mac OS X, and Linux, off the back of Mozilla 1.0 or better. It's a truly ambitious piece of work with a lot of features you won't find anywhere else—for example, reputation networks, where users can recommend feeds to each other, and so on. It's well worth a look (see Figure 2-6).
Figure 2-6: NewsMonster in action inside Mozilla 1.7 on OS X
Other Cunning Techniques
The PC isn't the only way to access a feed. Due to the lightweight XML nature of RSS and Atom, many other devices and conduits can use the formats to deliver information.
PDAs, mobile phones, and the incessant merging of the two can't escape the power of RSS and Atom:
  • PocketPC (http://www.happyjackroad.com/AtomicDB/pocketpc/pocketRSS/pocketRSS.asp)
  • Hand/RSS (http://standalone.com/palmos/hand_rss/) is a nice RSS feed reader for Palm devices. It isn't free, but it comes with a 30-day trial.
  • mobilerss (http://www.mobilerss.net) isn't an application per se but a service for turning RSS feeds into HTML simple enough to read on any mobile device's browser. It's built on the MagpieRSS parser shown in Chapter 8.
  • The FeedBurner Mobile Feed Reader (http://www.feedburner.com/fb/a/mfr) comes from the same people who provide the FeedBurner service detailed in Chapter 9. It should run on any of the latest mobile devices compatible with the J2ME MIDP 2.0/CLDC1.0 platform.
If you'd rather get your RSS through your already convenient email software, you're not alone. A number of tools will make this easy:
  • IzyNews (http://izynews.com/de/default.aspx?) sets up RSS feeds as unread messages in an IMAP directory. It requires some server-side setup, but it's perfect for a corporate environment with locked-down desktop machines.
  • NewsGator Outlook Edition (http://www.newsgator.com/outlook.aspx) is an RSS reader extension for Microsoft Outlook. Many people swear by it, and it features synchronization with an online version for when you're away from your main machine.
Finding Feeds to Read
Identifying sites that make feeds available can be tricky. There is no standard place to publish a feed, nor is there any particular filename or path to look for one. Of course, there are various methods for sites to identify their feeds, but none are universal. Nevertheless, if you can't see an explicit link to a feed, here are a few things you can try:
  • Look for the traditional feed icon, the white writing on an orange background, usually reading "XML" (see Figure 2-7). There are variants on this theme, but they're all recognizable.
    Figure 2-7: The garden variety eggplant
  • View Source on the site's main page. If you see a line within the head section of the code that reads:
    <link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.example.org/rss.xml"/>
    the href part is the URL you want. This is called an Auto-Discovery link and is discussed in Chapter 9.
  • You can try the most common URLs. Look for index.xml, index.rdf, rss.xml, rss.php, index.rss, or index.atom; usually, one of these will work.
  • Look up the site in Syndic8 (http://www.syndic8.com). This directory, also covered in Chapter 9, has over 200,000 feeds listed.
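Two of the checks in the list above, reading the Auto-Discovery link and trying the common filenames, are easy to automate. What follows is only a rough sketch using Python's standard library. The names FeedLinkFinder, discover_feeds, and guess_feed_urls are made up for this illustration, and accepting application/atom+xml alongside application/rss+xml is an assumption rather than something stated above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds here; the Atom type is an assumption of this sketch.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# The common filenames suggested in the list above.
COMMON_NAMES = ["index.xml", "index.rdf", "rss.xml", "rss.php", "index.rss", "index.atom"]


class FeedLinkFinder(HTMLParser):
    """Collects href values from <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        kind = (attr.get("type") or "").lower()
        if "alternate" in rel and kind in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def discover_feeds(html, page_url):
    """Return the feed URLs a page advertises, resolved against page_url."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return [urljoin(page_url, href) for href in finder.feeds]


def guess_feed_urls(site_url):
    """Return candidate feed URLs built from the common filenames."""
    base = site_url if site_url.endswith("/") else site_url + "/"
    return [urljoin(base, name) for name in COMMON_NAMES]


page = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'title="RSS" href="http://www.example.org/rss.xml"/></head></html>'
)
print(discover_feeds(page, "http://www.example.org/"))
print(guess_feed_urls("http://www.example.org"))
```

The guesses are only candidates: each one still has to be fetched and checked before you can be sure it is a feed.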
If the site you're most keen on doesn't have a feed available, it helps to ask it for one. A lot of site authors just don't know how welcome it would be. Ask!
Chapter 3: Feeds Without Programming
I do not take a single newspaper, nor read one a month, and I feel myself infinitely the happier for it.
Thomas Jefferson
Now that you're set up with your own aggregator or reader application, and before we get into the horrible business of the standards themselves, it's a good idea to start creating your own personal feeds. Feeds are much more than just the latest news and articles from regular web sites. As you will see in Chapter 10, you can push all sorts of data through them. Chapter 10, however, contains a lot of code you will need to run yourself. In this chapter, we'll use other people's services to produce some interesting and useful feeds.
From Email
You can use a feed to display all your announcement-only mailing lists; you can also use it as a disposable email address when you register with web sites and the like. This frees your inbox and protects your real email address from being sold to spammers.
There are two services that do this, and both are very reliable: MailBucket (http://www.mailbucket.org/) and Dodgeit (http://www.dodgeit.com/).
Both operate in the same way. You send mail to xxx@mailbucket.org or xxx@dodgeit.com, where xxx is your own chosen identity. There's no sign up, so you need to check that your chosen identity isn't already taken. This highlights one issue: your mail isn't private, so don't use it for things you don't want others to see. (You could use an incredibly unguessable identity to make such risks very unlikely.)
Once the mail starts to arrive in your inbox, it will look like Figure 3-1.
Figure 3-1: The Dodgeit.com inbox on the Web
You can then subscribe to the feed at either http://www.mailbucket.org/xxx.xml or http://www.dodgeit.com/run/rss?mailbox=xxx. You will then see something like Figure 3-2 in your reader application.
Figure 3-2: The Dodgeit.com inbox inside NetNewsWire
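If you want to follow the advice about picking an unguessable identity, something like the following sketch may help. It generates a random mailbox name and builds the two feed URLs in the patterns given above. The helper names random_mailbox and feed_urls are invented for this example, and the choice of 16 random bytes is arbitrary.

```python
import secrets
from urllib.parse import quote


def random_mailbox(nbytes=16):
    """Return a hard-to-guess mailbox name made of hexadecimal digits."""
    return secrets.token_hex(nbytes)


def feed_urls(mailbox):
    """Build the feed URLs for a mailbox, following the patterns in the text."""
    safe = quote(mailbox, safe="")
    return {
        "mailbucket": f"http://www.mailbucket.org/{safe}.xml",
        "dodgeit": f"http://www.dodgeit.com/run/rss?mailbox={safe}",
    }


mailbox = random_mailbox()
print(mailbox + "@dodgeit.com")
print(feed_urls(mailbox))
```

A long random name only makes the mailbox hard to find; the mail in it is still not private.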
Personally speaking, I think these services are the cat's pajamas. There are many mailing lists that don't require the ability to reply, or to which you might not actually want to contribute, or whose traffic is so great, you might not want your email application to keep firing off new mail alerts. These services are perfect for that.
Gmail, Google's beta email product, also produces a feed of your inbox, but because it's in beta, it's hard to say if the feed will still be there by the time you read this.
From a Search Engine
The current popular search engines have many great features, but they don't usually provide any form of feed for search results. Such a feed is extremely useful for people trying to keep track of specific search topics (for example, their name).
I personally host a Google-to-RSS service, which you are free to use. It's at http://www.benhammersley.com/tools/google_to_rss.html. To use it, simply add your search request to the end of the URL http://www.benhammersley.com/tools/googlerss.cgi?q=. For example:
http://www.benhammersley.com/tools/googlerss.cgi?q=ben%20hammersley
Now, subscribe to that URL in your newsreader.
Note that I'm running this service from my own Google API key, which has a limit of 1,000 queries a day. If you'd like to help out, you can get your own key from http://www.google.com/apis/ and use it with your own queries. Add it to the URL with a &k=123456789 parameter, like this:
http://www.benhammersley.com/tools/googlerss.cgi?q=ben%20hammersley&k=123456789
The source code for this service is discussed in Chapter 10.
Google News searches can also be turned into feeds via a service hosted by Julian Bond, found at http://www.voidstar.com/gnews2rss.php.
That page has a form to help generate the feed's URL, or you can make it up yourself with this pattern:
http://www.voidstar.com/gnews2rss.php?num=number_of_items&q=your_query
Note that this Google News service is for personal aggregators only and not for redisplay on another web site. The source code for this service is in Chapter 10.
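Both URL patterns above need the search terms to be URL-encoded, with a space becoming %20 as in the examples. Here is a small sketch of how you might build them in Python. The function names are hypothetical, the default of 10 items is an arbitrary choice, and the only parameters used are the q, k, and num ones shown in the text.

```python
from urllib.parse import quote, urlencode

GOOGLE_RSS = "http://www.benhammersley.com/tools/googlerss.cgi"
GNEWS_RSS = "http://www.voidstar.com/gnews2rss.php"


def google_feed_url(query, key=None):
    """Build a Google-to-RSS URL, optionally adding your own API key."""
    params = {"q": query}
    if key is not None:
        params["k"] = key
    # quote_via=quote encodes spaces as %20 rather than "+".
    return GOOGLE_RSS + "?" + urlencode(params, quote_via=quote)


def google_news_feed_url(query, num=10):
    """Build a Google News feed URL following the num/q pattern."""
    return GNEWS_RSS + "?" + urlencode({"num": num, "q": query}, quote_via=quote)


print(google_feed_url("ben hammersley"))
print(google_feed_url("ben hammersley", key="123456789"))
print(google_news_feed_url("conkers", num=15))
```

The first two printed URLs match the examples given earlier in this section.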
Despite its tiresomely exclaiming name, Yahoo! goes one better than Google in that it provides feeds of its News Search results as standard. For example, go to http://search.news.yahoo.com/search/news/?c=&p=Conkers for news of the greatest autumnal sport, and look for the standard orange XML logo. It's impossible for me to give you a shortcut URL structure, however, because Yahoo! employs redirects.
From Online Stores
E-commerce sites can make very good use of feeds. Customers wanting to subscribe to, say, an individual artist's discography can be alerted the instant that a new title is available. More of this sort of thing is dealt with in Chapter 10, but in the meantime, the following sites are available now:
Amazon.com
The Internet's biggest retailer has started to publish feeds itself, but they're not as configurable as those produced by the Lockergnome Amazon RSS Feed Generator, http://channels.lockergnome.com/rss/resources/amazon.phtml. It's a simple checkbox and submit page, and worth playing with, although it only supports searching Amazon.com at the moment.
There is also an Amazon.co.jp-to-RSS service at http://723.to/azrssmake.php.
iTunes Music Store
Apple's iTunes Music Store is extremely well-enabled for feeds. You can subscribe to feeds of new releases, top songs and albums, featured tracks, and so on for any combination of musical genres. Do this by visiting its RSS Generator at the marvellously memorable http://phobos.apple.com/WebObjects/MZSearch.woa/wa/MRSS/rssGenerator (see Figure 3-3).
Figure 3-3: The iTunes Music Store RSS Generator
Chapter 4: RSS 2.0
A facility for quotation covers the absence of original thought.
Dorothy L. Sayers, Lord Peter Wimsey in Gaudy Night
This chapter describes the RSS 2.0 specification in detail, how it works, and how it is created. It also explores RSS 2.0 predecessors—the largely compatible 0.91 and 0.92 specifications—and how they relate and can be converted to the latest standard.
Bringing Things Up to Date
RSS 2.0 has a long history. As was shown in Chapter 1, it's based on a succession of specifications: RSS 0.91, 0.92, 0.93, and 0.94. Because of this history and because of a lack of any adequate documentation for many of these standards, there is a massive gulf between the quality of the document you can produce and the quality of what you might have to parse. In other words, many people are doing it wrong.
This confusion forces this chapter to address two different issues. The first is how to create a perfectly specification-compliant feed, and the second is how to deal with feeds produced by those with less exacting standards.
This decision brings us to another one: what to do about the older versions that led to 2.0? The answer is this: although many people are still learning to produce 0.91, 0.92, et al., we will not. You'll learn how to parse them, but from now on, as far as the simple strain of syndication feeds goes, we'll be creating only 2.0 feeds.
With that decided, steel yourself, visit the official specification document for RSS 2.0 at http://blogs.law.harvard.edu/tech/rss, and let's get on with it.
The Basic Structure
The top level of an RSS 2.0 document is the rss version="2.0" element. This is followed by a single channel element. The channel element contains the entire feed contents and all associated metadata.
There are 3 required and 16 optional subelements of channel within RSS 2.0. Here are the required subelements:
title
The name of the feed. In most cases, this is the same name as the associated web site or service.
<title>RSS and Atom</title>
link
A URL pointing to the associated resource, usually a web site. The link must use an IANA-registered URI scheme, such as http://, https://, news://, or ftp://, though it isn't necessary for an application developer to support all of these by default. The most common by a large margin is http://. For example:
<link>http://www.benhammersley.com</link>
description
Some words to describe your channel.
<description>This is a nice RSS 2.0 feed of an even nicer weblog</description>
Although it isn't explicitly stated in the specification, it is highly recommended that you do not put anything other than plain text in the channel/title or channel/description elements. There are some existing feeds with HTML within those elements, but these cause a considerable amount of wailing, and at least a small amount of gnashing of teeth. Do not do it. Use plain text only in these elements. The following sidebar, "Including HTML Within title or description," gives a fuller account of this, but in my opinion it's a bad idea.
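To see how the pieces nest, here is a rough sketch that builds the skeleton described in this section, an rss element holding one channel with its three required subelements, using Python's xml.etree.ElementTree. The function name build_channel is invented for this example, and a real feed would go on to add item elements.

```python
import xml.etree.ElementTree as ET


def build_channel(title, link, description):
    """Return an <rss version="2.0"> element with one minimal <channel>."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three required subelements, in the order used in this section.
    for name, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, name).text = text
    return rss


feed = build_channel(
    "RSS and Atom",
    "http://www.benhammersley.com",
    "This is a nice RSS 2.0 feed of an even nicer weblog",
)
print(ET.tostring(feed, encoding="unicode"))
```

Because the values are set as element text, characters such as & and < are escaped on output, which fits the advice to keep title and description as plain text.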
Producing RSS 2.0 with Blogging Tools
The vast majority of RSS 2.0 feeds are produced by weblogging tools that use templates. The most popular of these is Movable Type, written by Ben and Mena Trott, which is freely available for personal use at http://www.movabletype.org. In order to discuss a few important implementation points, Example 4-3 shows a template for Movable Type that produces an RSS 2.0 feed.
Example 4-3. A Movable Type template for producing RSS 2.0
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title><$MTBlogName$></title>
<link><$MTBlogURL$></link>
<description><$MTBlogDescription$></description>
<language>en-gb</language>
<copyright>All content Public Domain</copyright> 
<managingEditor>ben@benhammersley.com</managingEditor> 
<webMaster>ben@benhammersley.com</webMaster>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<category domain="http://www.dmoz.org">Reference/Libraries/Library_and_Information_Science/Technical_Services/Cataloguing/Metadata/RDF/Applications/RSS/</category>
<generator>Movable Type/2.5</generator>
<lastBuildDate><$MTDate format="%a, %d %b %Y %I:%M:00 GMT"$></lastBuildDate>
<ttl>60</ttl>
   
<MTEntries lastn="15">
<item>
<title><$MTEntryTitle encode_html="1"$></title>
<description><$MTEntryExcerpt encode_html="1"$></description>
<link><$MTEntryLink$></link>
<comments><$MTEntryLink$></comments>
<author><$MTEntryAuthorEmail$></author>
<pubDate><$MTEntryDate format="%a, %d %b %Y %I:%M:00 GMT"$></pubDate>
<guid isPermaLink="false">GUID:<$MTEntryLink$><$MTEntryDate format="%a%d%b%Y%I:%M"$></guid>
</item>
</MTEntries>
</channel>
</rss>
The vast majority of this template is standard Movable Type fare. Taken from one of my own blogs, it uses the <$MT$> tags to insert information directly from the Movable Type database into the feed. So far, so simple.
Two things are worth close examination. First, the date format:
<pubDate><$MTEntryDate format="%a, %d %b %Y %I:%M:00 GMT"$></pubDate>
Care must be taken to ensure that the format of the contents of the date fields are correctly formed. RSS 2.0 feeds require their dates to be written to comply with RFC 822—for example:
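A correctly formed date looks like Sat, 03 Apr 2004 15:00:00 GMT. Outside a templating system, the same format can be produced in code. The following is a minimal Python sketch, not part of the Movable Type template; the function name rfc822_date and the sample timestamp are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_date(moment: datetime) -> str:
    """Format a timezone-aware datetime for use in <pubDate> or <lastBuildDate>."""
    # Convert to UTC first: usegmt=True only accepts UTC datetimes and
    # writes the zone as the literal "GMT". A naive datetime would be
    # interpreted as local time by astimezone(), so pass aware values.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# Hypothetical publication time, chosen only for this example.
published = datetime(2004, 4, 3, 15, 0, 0, tzinfo=timezone.utc)
print(rfc822_date(published))
```

The standard library is used here so that day and month names always come out in English, whatever the locale of the machine generating the feed.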
Introducing Modules
Modules are additional sets of elements, giving the feed a greater range of expression: they allow the specification to be extended without actually being changed, which is a very clever trick. You can make your own module to match any data you might wish to syndicate. Admittedly, most aggregators will ignore it, but your own applications can take advantage of it. And, happily, the most popular modules are increasingly being supported by the latest aggregators as a matter of course.
Modules in RSS, both Versions 2.0 and 1.0, are created with a system known as XML Namespaces. Namespaces are the XML solution to the classic language problem of one word meaning two things in different contexts. Take "Windows," for example. In the context of houses, "windows" are holes in the wall through which we can look. In the context of computers, "Windows" is a trademark of the Microsoft Corporation and refers to its range of operating systems. The context within which the name has a particular meaning is called its namespace.
In XML, you can distinguish between the two meanings by assigning a namespace and placing the namespace's name in front of the element name, separated by a colon, like this:
<computing:windows>This is an operating system</computing:windows>

<building:windows>This is a hole in a wall</building:windows>
Namespaces solve two problems. First, they allow you to distinguish between different meanings for words that are spelled the same way, which means you can use words more than once for different meanings. Second, they allow you to group together words that are related to each other; for example, using a computer to look through an XML document for all elements with a certain namespace is easy.
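To make the second point concrete, here is a minimal Python sketch that picks out every element belonging to one namespace. The two namespace URIs, the glossary wrapper element, and the function name elements_in_namespace are all invented for this illustration; a real document would declare whatever URIs its vocabularies define.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, made up for this example.
COMPUTING = "http://example.org/ns/computing"
BUILDING = "http://example.org/ns/building"

DOCUMENT = f"""
<glossary xmlns:computing="{COMPUTING}" xmlns:building="{BUILDING}">
  <computing:windows>This is an operating system</computing:windows>
  <building:windows>This is a hole in a wall</building:windows>
</glossary>
"""


def elements_in_namespace(root, namespace_uri):
    """Return every element under root whose name is in namespace_uri."""
    # ElementTree expands prefixed names to the form "{uri}localname".
    prefix = "{" + namespace_uri + "}"
    return [element for element in root.iter() if element.tag.startswith(prefix)]


root = ET.fromstring(DOCUMENT)
for element in elements_in_namespace(root, COMPUTING):
    print(element.tag, "->", element.text)
```

Note that the parser matches on the namespace URI, not on the prefix: two documents can use different prefixes for the same namespace and still be treated identically.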
Both RSS 1.0 and 2.0 use namespaces to allow for modularization. This modularization means that developers can add new features to RSS documents without changing the core specification.
Modularization has great advantages over the older RSS 0.9x method for including new elements. For starters, anyone can create a module: there are no standards issues or any need for approval, aside from making sure that the namespace URI you use has not been used before. It also means both RSS 1.0 and 2.0 are potentially far more powerful than RSS 0.9x ever was.
Creating RSS 2.0 Feeds
RSS 0.91 and 0.92 feeds are created in the same way; the additional elements found in 0.92 are well-handled by the existing RSS tools.
Of course, you can always hand-code your RSS feed. Doing so certainly gets you on top of the standard, but it's neither convenient, quick, nor recommended. Ordinarily, feeds are created by a small program in one of the scripting languages: Perl, PHP, Python, etc. Many CMSs already create RSS feeds automatically, but you may want to create a feed in another context. Hey, you might even write your own CMS!
There are various ways to create a feed, all of which are used in real life:
XML transformation
Running a transformation on an XML master document converts the relevant parts into RSS. This technique is used in Apache AxKit-based systems, for example.
Templates
You can substitute values within an RSS feed template. This technique is used within most weblogging platforms, for example.
An RSS-specific module or class within a scripting language
This method is used within hundreds of little ad hoc scripts across the Net, for example.
We'll look at all three methods, but let's start with the third, using an RSS-specific module. In this case, it's Perl's XML::RSS.
The XML::RSS module is one of the key tools in the Perl RSS world. It is built on top of XML::Parser—the basis for many Perl XML modules—and is object-oriented. Actually, XML::RSS also supports the creation of the older versions of RSS, plus RSS 1.0, and it can parse existing feeds, but in this section we will deal only with its 2.0 creation capabilities.
Incidentally, XML::RSS is an open source project. You can lend a hand, and grab the latest version, at http://sourceforge.net/projects/perl-rss
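For readers who do not work in Perl, the general shape of the programmatic approach can be sketched in Python. This is only an illustration of building a feed from code instead of from a template: it uses the standard library's generic xml.etree.ElementTree as a stand-in and does not reproduce the XML::RSS interface. The function name build_feed and all of the sample values are assumptions.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three required channel elements.
    for name, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, name).text = value
    # One <item> per entry, each with a title, link, and description.
    for entry in items:
        item = ET.SubElement(channel, "item")
        for name in ("title", "link", "description"):
            ET.SubElement(item, name).text = entry[name]
    # ElementTree escapes characters such as & and < in text for us.
    return ET.tostring(rss, encoding="unicode")


# Hypothetical content, used only to exercise the function.
feed = build_feed(
    "Example Channel",
    "http://www.example.org/",
    "A feed built from code",
    [{
        "title": "Fish & Chips",
        "link": "http://www.example.org/001.html",
        "description": "This is the first item.",
    }],
)
print(feed)
```

The main advantage over hand-coding is visible in the item title: the ampersand is escaped automatically, so the output stays well-formed whatever the source text contains.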
Chapter 5: RSS 1.0
You see, I needed to go to hell. I was, you might say, homesick.
Nick Tosches, The Last Opium Den
Most of the feeds we've seen so far have been very simple. They provide little information beyond what is needed for the instant gratification of displaying the feed in a human-readable form. Of course, this isn't such a bad deal; many people only want to display the feeds as they come.
Others, however, require a far richer set of feeds. For this, many people are using the RSS 1.0 flavor of the Resource Description Framework (RDF). In this chapter, we'll look at the metadata options RSS 2.0 provides and why you might want (or need) more. Then I'll give a basic overview of RDF and a thorough rundown of RSS 1.0 itself.
Metadata in RSS 2.0
As all good tutorials on the subject will tell you, metadata is data about data. In the case of RSS 2.0, this includes the name of the author of the feed, the date the channel was last updated, and so on. In Example 5-1, the bold code is the metadata. You can remove this data, and the feed itself will still both parse and be useful when displayed as HTML. Like a Hitchcock cameo, the metadata is in the background, silent, but meaningful to those who can see it.
Example 5-1. The metadata within an RSS 2.0 feed
<rss version="2.0">
<channel>
  <title>RSS2.0 Example</title> 
  <link>http://www.oreilly.com/example/index.html</link> 
  <description>This is an example RSS2.0 feed</description> 
  <language>en-gb</language> 
               <copyright>Copyright 2004, Oreilly and Associates.</copyright>

               <managingEditor>editor@oreilly.com</managingEditor> 
               <webMaster>webmaster@oreilly.com</webMaster> 
               <pubDate>Sat, 03 Apr 2004 15:00:00 GMT</pubDate>
               <lastBuildDate>Sat, 03 Apr 2004 15:00:00 GMT</lastBuildDate>
               <docs>http://backend.userland.com/rss091</docs>
               <skipDays>
               <day>Monday</day>
               </skipDays>
               <skipHours>
               <hour>20</hour>
               </skipHours>
               <cloud domain="http://www.oreilly.com" port="80" path="/RPC2"
                      registerProcedure="pleaseNotify" protocol="XML-RPC" />

  <image>
    <title>RSS2.0 Example</title> 
    <url>http://www.oreilly.com/example/images/logo.gif</url> 
    <link>http://www.oreilly.com/example/index.html</link>
    <width>88</width> 
    <height>31</height> 
    <description>The World's Leading Technical Publisher</description>
  </image>
  <textInput>
    <title>Search</title>
    <description>Search the Archives</description>
    <name>query</name>
    <link>http://www.oreilly.com/example/search.cgi</link>
  </textInput>
   
  <item>
    <title>The First Item</title> 
    <link>http://www.oreilly.com/example/001.html</link> 
    <description>This is the first item.</description>
    
Resource Description Framework
This system of defining everything with URIs, and using this to describe the relationships between things, has been formalized in a system known as the Resource Description Framework (RDF). In this section, we'll look at enough RDF to give you a head start on the rest of the book. For a much deeper insight into RDF, take a look at Practical RDF (O'Reilly).
Because RDF is quite abstract—its ability to be written in different ways notwithstanding—in this chapter, we are going to look at what the RDF developers call the "data model," which we can call "the really simple version, in pictures."
As before, within the data model, anything (an object, a person, a document, a concept, a section of a document, etc.) can have a URI. In RDF anything addressable with a URI is called a resource.
Some resources can be used as properties of other resources. For example, the concept of "Author" has a URI of its own (all concepts can), and other resources can have a property of "author." Such resources are called PropertyTypes.
A property is the combination of a resource, a PropertyType, and a value. For example, "The Author of RSS and Atom is Ben Hammersley." The value can be a string ("Ben Hammersley" in the previous example), or it can be another resource—for example, "Ben Hammersley (resource) has a home page (PropertyType) at http://www.benhammersley.com (resource)."
RDF's data model is most easily understood with diagrams, called RDF graphs, that show the relationships between resources, PropertyTypes, and properties. In these diagrams, the RDF world is split into nodes and arcs.
The resources and the values are the nodes, identified by their URIs. The PropertyTypes are the arcs, representing connections between nodes. The arcs themselves are also described by a URI.
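The same nodes and arcs can be modeled in a few lines of code. The following Python sketch is one possible representation, not an RDF library: every URI under example.org is hypothetical, and the author is given a URI of his own (instead of a plain string) so that the second statement can hang off the first.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str       # node: the thing being described
    property_type: str  # arc: the URI naming the relationship
    value: str          # node: a literal string or another resource's URI


# Hypothetical URIs, invented for this illustration.
BOOK = "http://example.org/books/rss-and-atom"
BEN = "http://example.org/people/ben-hammersley"
AUTHOR = "http://example.org/terms/author"
HOMEPAGE = "http://example.org/terms/homepage"

graph = [
    Triple(BOOK, AUTHOR, BEN),
    Triple(BEN, HOMEPAGE, "http://www.benhammersley.com"),
]


def values_of(triples, resource, property_type):
    """Follow every arc of one PropertyType leading out of a resource."""
    return [
        t.value
        for t in triples
        if t.resource == resource and t.property_type == property_type
    ]


# Walk two arcs: from the book to its author, then to the author's home page.
authors = values_of(graph, BOOK, AUTHOR)
home_pages = [page for person in authors for page in values_of(graph, person, HOMEPAGE)]
print(home_pages)
```

Because the value of one property is itself a resource, statements chain together, which is what turns a list of facts into a graph.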
Figure 5-1 is an RDF graph that shows the previous
RDF in XML
In preparation for the rest of this chapter, we need to look at how RDF is written in XML.
In all the examples in this book, I have given the RDF attributes a prefix of rdf:. This isn't necessary in many RDF documents, but it is the way they appear in RSS 1.0. For the sake of clarity, I will leave them in here too. Therefore, for reasons we will discuss later, the root element of an RDF document is:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
...
</rdf:RDF>
As you will see further on, the root element can also contain the URIs of additional RDF vocabularies. The following examples use elements from the RSS 1.0 vocabulary.
The rdf:about attribute defines the URI for the element that contains it. Remember, it is like the subject in a sentence: everything else refers to it. For example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
>
<channel rdf:about="http://www.example.org/">
...
</channel>
</rdf:RDF>
means the channel resource is identified by the URI http://www.example.org/. Or, more to the point, everything within the channel element describes the resource at http://www.example.org/.
The contents of the element then describe the object referred to by the URI:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/" >
<channel rdf:about="http://www.example.org">
<title>Sausages are tasty for breakfast</title>
</channel>
</rdf:RDF>
In this example, the resource channel identified by the URI http://www.example.org has a PropertyType title whose value is Sausages are tasty for breakfast. Nothing to object to there, then.
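To see how a program reads this statement, here is a minimal Python sketch that pulls the subject URI and the title out of a copy of the example document. It uses the standard library's general-purpose XML parser, not an RDF toolkit, and the function name describe_channel is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

DOCUMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://www.example.org">
<title>Sausages are tasty for breakfast</title>
</channel>
</rdf:RDF>"""


def describe_channel(xml_text):
    """Return (subject URI, title) for the first channel in the document."""
    root = ET.fromstring(xml_text)
    # Element and attribute names are matched by namespace URI, written
    # in ElementTree's "{uri}localname" form.
    channel = root.find(f"{{{RSS_NS}}}channel")
    subject = channel.get(f"{{{RDF_NS}}}about")
    title = channel.findtext(f"{{{RSS_NS}}}title")
    return subject, title


subject, title = describe_channel(DOCUMENT)
print(f"{subject} has the title: {title}")
```

The rdf:about attribute supplies the subject, the title element supplies the PropertyType, and the element's text supplies the value.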
Remember, RDF describes the relationship between resources, their attributes, and other resources. You have to define all the resources, and the relationship PropertyTypes, before the RDF is valid and meaningful. The different objects are distinguished by unique URIs. So, every resource must have an
Introducing RSS 1.0