Freestyle Hellraising

XML sucks. Two other things that suck are duplication of content (OnceAndOnlyOnce!) and putting presentation information into what should be pure data. These two things suck a lot more than XML. What i am about to outline is a way of using XML to avoid duplication and contamination, and thus of minimising the total suckage.

Firstly, the good news: write everything in pure, presentation-free XML. Use XHTML if you must, for a homepage or a navigation page or something, but for other things, there may be a better format. RSS for a blog, DocBook for a book, etc. Don't include any nonstandard formatting elements; don't even include any class or id attributes. Nice, clean XML. It's still XML, of course, rather than something nice like S-expressions, but it's better than presentation-contaminated XML.

Now, at this point, it would be nice to just tack a CSS stylesheet on it to make it look good. For some kinds of document, this may even be possible, and if it is, it's the best way to go. However, it's often the case that CSS doesn't have the power to do the styling you want. For example, look at RSS (2.0, for the sake of argument); each item element has sub-elements for title, link, description and publication date. With CSS, we can format these however we like, but we can't (as far as i know) make the title a link whose href is the content of the link element, which is what we really want to do.
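To make the problem concrete, a single RSS 2.0 item looks roughly like this. The element names are the standard RSS 2.0 ones; the titles, text and example.org URLs are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <description>A made-up feed for illustration.</description>
    <item>
      <!-- The text we want the reader to click on -->
      <title>First post</title>
      <!-- The URL lives in element content, not in an attribute -->
      <link>http://example.org/posts/1</link>
      <description>A short plain-text summary of the post.</description>
      <pubDate>Mon, 06 Sep 2004 16:20:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

CSS can set the title in bold, hide the link, or put a border round the whole item, but it only decorates the elements that are already there. Turning the text of one element into the href of a hyperlink wrapped round another means changing the structure of the document.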

No, for this, we need to bring out the heavy artillery: XSLT. XSLT is rather more serious than CSS, and when i say 'rather', i mean it's Turing complete. XSLT takes a different approach to CSS. A CSS stylesheet is basically a set of instructions about the graphical properties of particular kinds of elements in the target file, things like "make all 'title' elements eighteen-point Times, centred". An XSLT stylesheet is a specification (spelled 'program') for turning one kind of XML document into another. For example, you can write an XSLT stylesheet that turns RSS into DocBook, or DocBook into RSS, or (since XSLT is itself an XML language) XSLT into XHTML, or even RSS into different RSS. The point is, you can do things like rewriting RSS into a format where the titles are links, as mentioned above.
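As a rough sketch of what such a stylesheet might look like, here is a minimal XSLT 1.0 transformation from RSS 2.0 to XHTML that makes each item's title a link. It is not the stylesheet of any particular project; the choice of h1, h2, div and p for the output is an assumption, and it treats each description as plain text, which real feeds often don't honour.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- Start at the document root and build the XHTML page skeleton -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="rss/channel/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="rss/channel/title"/></h1>
        <!-- Hand each item to the template below, in document order -->
        <xsl:apply-templates select="rss/channel/item"/>
      </body>
    </html>
  </xsl:template>

  <!-- One block per RSS item -->
  <xsl:template match="item">
    <div>
      <!-- {link} is an attribute value template: the text of the
           item's link element becomes the href of the anchor -->
      <h2><a href="{link}"><xsl:value-of select="title"/></a></h2>
      <p><xsl:value-of select="description"/></p>
      <p><xsl:value-of select="pubDate"/></p>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The interesting line is the one with `href="{link}"`: that single attribute value template is the thing CSS could not do.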

The funny thing about XSLT, though, is that it doesn't really have much to say about the visual things that CSS deals with; transforming RSS into what-have-you doesn't give you any control over fonts. There are two ways out of this. One is to transform the input into something called XSL-FO, which is an ultra-hairy XML language that is somewhat analogous to TeX. This is very powerful, but a bit experimental, and rather hard. The easy way out is to transform into XHTML (see - there had to be a use for it!), and then use CSS to style that. (XML + XSLT = XHTML) + CSS = looking good!
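To illustrate the easy way out, this is the kind of XHTML a transformation might emit for a one-item feed, with the CSS doing all the visual work. Everything here is an assumption made for the example: the feed contents, the example.org URL, and the particular fonts and sizes. The CSS is embedded in a style element only to keep the sketch in one file; a separate stylesheet linked from the head would do just as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example Feed</title>
    <!-- All presentation lives here; the body below is pure structure -->
    <style type="text/css">
      body { font-family: Georgia, serif; max-width: 40em; margin: 2em auto; }
      h2   { font-size: 18pt; text-align: center; }
      h2 a { color: inherit; text-decoration: none; }
    </style>
  </head>
  <body>
    <h1>Example Feed</h1>
    <div>
      <h2><a href="http://example.org/posts/1">First post</a></h2>
      <p>A short plain-text summary of the post.</p>
    </div>
  </body>
</html>
```

The source feed stays free of fonts and colours; the XSLT decides structure, and the CSS decides looks.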

If this all sounds a bit far-fetched, Kevin Davis has prepared a small demonstration.

Now, it should be obvious how this avoids contamination, but i also referred to duplication at the beginning. This approach avoids duplication because you no longer need to have source and presentation versions of the same thing; the source version can be presented by transformation on the fly.

The only little niggle is that you still need to put an xml-stylesheet processing instruction at the top of the underlying XML document; it would be nice if webservers could do this automatically.
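For reference, the niggle in question looks like this. The stylesheet filename rss-to-xhtml.xsl is hypothetical, as is the feed itself; type="text/xsl" is the value browsers have traditionally accepted, even though it is not a registered media type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Tells a browser which XSLT stylesheet to apply before rendering.
     It must appear in the prolog, before the root element. -->
<?xml-stylesheet type="text/xsl" href="rss-to-xhtml.xsl"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <description>A made-up feed for illustration.</description>
  </channel>
</rss>
```

Programs that read the feed as RSS should simply ignore the processing instruction, so the document stays usable as pure data.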

Anyway, the upshot of all this is that at some point i shall develop an XSLT/CSS stylesheet of doom with which i can make ordinary web browsers (for some quite extraordinary meaning of 'ordinary') able to read RSS files as smoothly as they read HTML. And the reasons? There are no reasons. Who needs reasons when you've got XML?

Postscript: XSLTXT ought to make life easier.