RSS is an XML language for making lists of things, where each thing has a title, description, date and URL (or some combination of these), and where the list itself can also have a title, URL, etc. The point is that an information source can provide an RSS document (a 'feed') describing itself, so people (or things) interested in it can quickly determine what its contents are. This is particularly useful when the source changes over time and people are interested in lots of sources, because they can use a tool called an 'aggregator' to monitor many RSS feeds at once. The classic use of RSS feeds is to describe news websites, but they are now de rigueur with Web Loggers, and are also exported by some wikis (ours is at RSS Recent Changes); they can even be exported from newsgroups, mailing lists and individual email accounts. Indeed, the diversity of the sources which can export RSS is one of its great strengths; you can have a single window on your computer which tells you what's happening on ten websites, three wikis, five newsgroups, a dozen blogs and your inbox.

There are actually two very similar formats going by the acronym of RSS: Really Simple Syndication and RDF Site Summary. RSS was originally invented by Netscape (at version 0.90, soon revised to 0.91), but some developers felt it didn't make proper use of the power of XML, and so developed the much more complex and sophisticated version 1.0; in response, Dave Winer, who preferred the simpler approach, developed 0.92 and 0.94 (but not 0.93) as revisions of the 0.9 branch (basically by removing various bits of cruft from 0.91), and has subsequently developed version 2.0 as a further revision. This is important, because it means that whilst 1.0 is a successor to 0.91, it is not really a successor to 0.92, and that 2.0 is a successor to 0.92, not 1.0. Got that?

(a general tutorial) (a brief history)
The 0.9x branch: (RSS 0.91) (RSS 0.92) (RSS 2.0)
The 1.0 branch: (the specification) (modules) (Dublin Core module) (syndication module) (content module)

Version 2.0 is the most enjoyable to work with; the 1.0 branch is really irritatingly over the top with all its XML gubbins, and the 0.9x versions lack some critical features (like dates on items).

There are various useful tools for working with RSS: (a viewer) (a rather good and versatile validator) (automagically finds feeds related to a given page)

An 0.9x-style RSS feed is an XML document, having a root rss element with a version attribute set to 0.92 or whatever. Inside that is a single channel element, which has metadata and items. The metadata is some combination of: title (channel/website title), link (URL to the website), description (Does Exactly What It Says On The Tin), language (using standard codes, like en-uk; actually, the standard says en-gb, which is Bad And Right, but I suggest en-ox), image (an icon for the channel; too complicated to go into here), managingEditor and webMaster (in the Bad And Wrong form " (Ferndando Oo)"), pubDate and lastBuildDate (in RFC 822 format), docs (pointing to <>), textInput (specifying a simple query form for the feed) and some others. The content is a sequence of item elements, each of which can contain title, link and description elements, all of which are self-explanatory.
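The structure described above (an rss root, one channel, metadata, then items) can be sketched in Python with the standard library's ElementTree. This is only an illustration: the function name build_feed, the channel values and the example.org URLs are all made up for the example, and only the three commonest metadata elements are handled.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items, version="0.92"):
    """Return an 0.9x-style RSS document as a string.

    `items` is a list of dicts with any of 'title', 'link', 'description'.
    (Hypothetical helper, not part of any RSS library.)
    """
    rss = ET.Element("rss", version=version)
    channel = ET.SubElement(rss, "channel")

    # Channel metadata comes first.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text

    # Then one <item> per entry, with whichever sub-elements are present.
    for entry in items:
        item = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            if tag in entry:
                ET.SubElement(item, tag).text = entry[tag]

    return ET.tostring(rss, encoding="unicode")


# Placeholder values, purely for illustration.
feed_xml = build_feed(
    "Example Wiki",
    "http://example.org/",
    "A made-up channel used only for illustration.",
    [{"title": "Front Page", "link": "http://example.org/FrontPage"}],
)
print(feed_xml)
```

Building the tree and serialising it, rather than gluing strings together, means that awkward characters such as & and < in titles get escaped for you.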

Here's a simple-ish RSS file:

<?xml version="1.0"?>
<rss version="0.92">
	<channel>
		<!-- the title and link values here are placeholders -->
		<title>OUSFG Wiki</title>
		<link>http://example.org/</link>
		<description>A wiki for friends and members of OUSFG, mainly intended for discussion of SF and associated things.</description>
		<managingEditor> (Tom Anderson)</managingEditor>
		<webMaster> (Tom Anderson)</webMaster>
		<pubDate>Mon, 11 Nov 2002 16:42:00 GMT</pubDate>
		<lastBuildDate>Mon, 11 Nov 2002 16:42:00 GMT</lastBuildDate>
		<item>
			<title>Front Page</title>
			<description>We can't really describe wiki pages. We could pull the first paragraph out, but that would be expensive, and not terribly functional.</description>
			<!-- hey, there's no date field for items! that sucks! -->
		</item>
	</channel>
</rss>
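Going the other way, here is a rough sketch of how an aggregator might read an 0.9x-style feed, again using Python's standard ElementTree. The SAMPLE document, its example.org URLs and the read_feed function are invented for the illustration; real feeds are often messier than this and a proper aggregator would need to be more forgiving.

```python
import xml.etree.ElementTree as ET

# A made-up feed, used only to exercise the function below.
SAMPLE = """<?xml version="1.0"?>
<rss version="0.92">
  <channel>
    <title>Example Wiki</title>
    <link>http://example.org/</link>
    <description>A made-up channel.</description>
    <item>
      <title>Front Page</title>
      <link>http://example.org/FrontPage</link>
    </item>
    <item>
      <description>An item with only a description.</description>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return (version, channel_metadata, items) for an 0.9x-style feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        raise ValueError("not an 0.9x-style RSS document")

    metadata = {}
    items = []
    for child in channel:
        if child.tag == "item":
            # Every sub-element of an item is optional, so collect what's there.
            items.append({field.tag: (field.text or "").strip() for field in child})
        elif len(child) == 0:
            # Simple text metadata; compound elements such as <image> are skipped.
            metadata[child.tag] = (child.text or "").strip()
    return root.get("version"), metadata, items


version, metadata, items = read_feed(SAMPLE)
print(version, metadata["title"], len(items))
```

Since title, link and description are all optional on an item, each item comes back as a dict holding only the fields that were actually present.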

It is not entirely clear what the relative semantics of <pubDate> and <lastBuildDate> are, especially in the context of dynamic things like wikis and Web Logs; the spec says:

pubDate: The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.

lastBuildDate: The last time the content of the channel changed.

So, the lastBuildDate actually has nothing to do with the last build date - it's the date the last change was made to the content - whilst the pubDate seems to have no meaning at all for entities that don't have all-or-nothing update schedules. Incidentally, Twic I uses the date of the last change to the content for the pubDate and the date at which the RSS feed was generated (and it's generated on the fly) for the lastBuildDate.
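As a sketch of that last scheme (pubDate from the last content change, lastBuildDate from the moment the feed is generated), the two strings can be produced with Python's email.utils, which writes dates in the RFC 822 family of formats. The function names rss_date and channel_dates are hypothetical, and this is not the wiki's actual code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rss_date(moment):
    """Format a timezone-aware UTC datetime as an RFC 822-style date string."""
    # usegmt=True writes 'GMT' rather than '+0000'; it requires tzinfo=timezone.utc.
    return format_datetime(moment, usegmt=True)


def channel_dates(last_content_change, generated_at=None):
    """Return (pubDate, lastBuildDate) strings under the scheme described above."""
    if generated_at is None:
        # The feed is generated on the fly, so 'now' is the build time.
        generated_at = datetime.now(timezone.utc)
    return rss_date(last_content_change), rss_date(generated_at)


last_edit = datetime(2002, 11, 11, 16, 42, tzinfo=timezone.utc)
pub_date, last_build_date = channel_dates(last_edit)
print(pub_date)  # Mon, 11 Nov 2002 16:42:00 GMT

# The same module reads such dates back in.
assert parsedate_to_datetime(pub_date) == last_edit
```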

Generating RSS from unusual sources, and vice versa:

RSS view of an IMAP mailbox (written in Python, homepage is on a wiki - you know it makes sense)
RSS back to an mbox mail file
IMAP view of an RSS file
a whole blog about IMAP blogging
export RSS through NNTP
export a Google search as RSS

Now all we need is a way to export NNTP through gopher, then we can use <> to browse weblogs on the web!

Category Geekery

Sun, 13 Feb 2005 17:21:47 GMT