Ben's web programming pages: XML & XSLT

Page index

XML
XSLT
Conclusion

XML

In order to make it easy for people to incorporate the M'Cheyne daily Bible readings into their own web pages, I wrote a script which serves them up as an XML document. This can be interrogated from a remote site, and the results are easily interpreted by Perl or PHP so that the readings can be extracted. Information on using it is on the M'Cheyne server page.

Any simple format would have done for the server output, but XML gives flexibility and future-proofing whilst being relatively easily interpreted by scripts.

The raw output of the server is a well-formed XML document. It contains a single root element <mcheyne> which itself contains strictly nested elements with balanced start and end tags. Subject to these constraints you are free in a well-formed XML document to choose pretty much any structure and element names you like.

<mcheyne>
<copyright>Copyright (c) 2003, Ben Edgington</copyright>
<version>2.0</version>
<day>8</day>
<date>
<mday>9</mday>
<month>January</month>
<year>2003</year>
</date>
...
</mcheyne>

Making the document well-formed would have been sufficient for currently envisioned uses of the server, but I was also keen to make it valid. This means that it should have a corresponding document type definition (DTD) which defines all the elements and constrains the relationships between them.

I chose to use an external DTD so that automatic accesses of the M'Cheyne server by scripts do not need to download it. To point to the DTD, the XML must be declared with standalone="no", and the DOCTYPE declaration points to a file. It is a "SYSTEM" DTD as it is private to the server rather than publicly available.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE mcheyne SYSTEM "server.dtd">

The document structure is then defined in server.dtd using the somewhat arcane DTD syntax. In this case the structure is fairly simple and regular, so the DTD I created is quite inflexible. Here's an extract, or see the whole DTD file.

server.dtd

<!ELEMENT mcheyne (notice,copyright,version,day,date,timezone,calendar,bible,quote,family,(secret|private))>
<!ELEMENT notice (#PCDATA|website)*>
<!ELEMENT website (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT version (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT date (mday,month,year)>
<!ELEMENT mday (#PCDATA)>
<!ELEMENT month (#PCDATA)>
...

Each element contains either only text (i.e. #PCDATA) or other elements as listed, except for <notice>, which may mix text with <website> elements. For the XML document to be valid it must conform to this structure.

With a DTD it is possible to validate the XML produced by the server using an XML 1.0 validator.

A good introduction to XML is on the XMLwriter website.

XSLT

Just for fun I wanted to try to format the output of the server nicely, so that it can not only be used by scripts to extract the data they want, but can also be viewed directly as a simple web page. There seem to be two main options if you want to format XML nicely for display in browsers: CSS and XSLT.

CSS limitations

The first option is to use a standard CSS style sheet. The presentation of all the elements can be specified in the style-sheet in the normal way. Here's part of a rudimentary style-sheet for the M'Cheyne server data.

server.css

mcheyne, notice, copyright, quote, target, date { display: block; }
mday, month, year, reference, url { display: inline; }
day, timezone, calendar, bible, version  { display: none; }

mcheyne { background: #eee; }

notice {
    color: #888;
    text-align: center;
    margin: 1ex 0;
}

date {
    text-align: center;
    margin: 2ex 0;
}

quote {
    text-align: center;
    font-style: italic;
    font-size: 120%;
    margin:  1ex 0;
}
...

To tell the interpreter of the XML document to use this style-sheet insert a line like this before your DOCTYPE declaration,

<?xml-stylesheet type="text/css" href="server.css"?>

A drawback of this approach is that you cannot easily modify the data, for example to change its order of presentation or to make hyperlinks out of URLs. I wanted to do both of these with the M'Cheyne server XML output.

XSLT kicks butt

The solution is to use XSL Transformations (XSLT). This is a framework for taking an input XML code tree and transforming it in an arbitrary way to form an output tree. In this case, the output is an HTML document which the web browser will then display in the normal way, although it could be any arbitrary format.

An XSLT stylesheet contains templates which match specific elements in the source and describe how they are to be transformed to create the output. For example, in the M'Cheyne server output a <target> element contains a <url> element and a <reference> element. The following stylesheet fragment transforms this <target> element into a standard <a href=""> hypertext anchor with the url as the target and the reference as the text.

server.xsl

...
<xsl:template match="target">
  <a href="{url}" style="padding-left: 0.5em">
    <xsl:value-of select="reference"/></a>
</xsl:template>
...

See the full stylesheet for all the gory details.
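To make the transformation concrete, here is a minimal, self-contained sketch of a stylesheet built around the same template. The input fragment shown in the comment is hypothetical (the URL and reference text are invented for illustration), and the wrapping root template is an assumption rather than a copy of the real server.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input (URL and reference text are made up):
       <target>
         <url>http://www.example.org/genesis1</url>
         <reference>Genesis 1</reference>
       </target>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Root template: wrap the transformed targets in a bare HTML page -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body><xsl:apply-templates select="//target"/></body>
    </html>
  </xsl:template>

  <!-- {url} is an attribute value template: it is replaced by the text
       of the <url> child; xsl:value-of supplies the link text -->
  <xsl:template match="target">
    <a href="{url}"><xsl:value-of select="reference"/></a>
  </xsl:template>
</xsl:stylesheet>
```

For the fragment in the comment, the anchor written into the page body is <a href="http://www.example.org/genesis1">Genesis 1</a>.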

To tell the interpreter of the XML document to use this XSL style-sheet insert a line like this before your DOCTYPE declaration,

<?xml-stylesheet type="text/xsl" href="server.xsl"?>
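Putting the pieces together, the top of the served document would look something like the following sketch. The file names are the ones used above; the content of the root element is elided. The XML declaration has to come first, and the stylesheet processing instruction sits between it and the DOCTYPE.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="server.xsl"?>
<!DOCTYPE mcheyne SYSTEM "server.dtd">
<mcheyne>
  <!-- notice, copyright, version and the other elements go here -->
</mcheyne>
```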

Note: Netscape and Mozilla need the .xsl file to be served with MIME type text/xml or they won't use it (this is actually as per RFC 2376). You can check whether this is the case using my HTTP header viewer. If your server is not configured to do this you have a number of options. For one, you can simply give your stylesheet a .xml extension instead of .xsl. For another, with the Apache server, you can add a line like "AddType text/xml xsl" to your directory's .htaccess file. A third option is to make the stylesheet a PHP or Perl CGI script that sets the correct "Content-Type: text/xml" HTTP header.

Now, if your browser supports it, when the XML document is downloaded the browser will transform the raw XML according to the XSL stylesheet to produce nicely formatted HTML. Mozilla 1.0, Netscape 7 and Internet Explorer 6 seem to handle this reliably, but Opera doesn't support it. If you point one of these browsers at the server you should see a simple webpage with the data nicely formatted rather than the raw output or just unformatted text.

Of course, you can still use CSS for formatting if you want to: just include your style sheet into the XSLT file so that it is output with the HTML and the browser will interpret it as normal.
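As a rough sketch of how that might look, the stylesheet below writes a style element into the head of the HTML it generates. The class name, the page title and the choice of the <quote> element are illustrative assumptions, not taken from the real server.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="/mcheyne">
    <html>
      <head>
        <title>Daily readings</title>
        <!-- The CSS travels with the generated HTML. Curly braces are
             only special inside attribute values, so they are safe here. -->
        <style type="text/css">
          body    { background: #eee; }
          p.quote { text-align: center; font-style: italic; font-size: 120%; }
        </style>
      </head>
      <body>
        <!-- Hypothetical markup: show the text of the quote element -->
        <p class="quote"><xsl:value-of select="quote"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Note that the CSS selectors now refer to the HTML elements produced by the transformation rather than to the original XML element names.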

A good introduction to XSLT (albeit heavily Windows biased) is on the W3schools website.

Server-side XSLT

Sometimes it is more appropriate for the XSL transformation to be done server-side rather than client-side. For example, the body of this page is an XML document which has been processed by XSLT before being inserted into the page template. In this case the work is done on the web-server—only the XHTML output is sent to your browser—which avoids the problem of having to deal with browsers that do not support XSLT.

An easy way to do server-side XSLT is to use PHP's built-in functions such as xslt_process().

from index.php

$x = xslt_create();
$result = xslt_process($x,
                       'file://'.getcwd()."/$file.xml",
                       'file://'.getcwd().'/website.xsl');
if ($result) {
    print $result;
} else {
    print "XSLT error: ".xslt_error($x).", error code: ".xslt_errno($x);
}
xslt_free($x);

Note that the behaviour of Sablotron (which is used by PHP's xslt_process) has changed: you should now use an absolute URI for the XSL and XML files, so we use PHP's getcwd() function to locate them. The URI is in the form "file://absolute-pathname/filename".

Your webserver may or may not be enabled for XSLT. If you run the phpinfo.php script and see a section headed "xslt" which says XSLT is enabled then you're in luck.

Example 1

On this page the main work done by the XSLT is to generate the "Page Index" at the top. This is done on-the-fly to ease maintenance: it is now very easy to add new pages and sections without the tedious and error-prone task of creating the index by hand.
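The idea can be sketched as follows. This is not the stylesheet used for this page: it assumes a hypothetical source whose root element is <page> and whose <section> elements carry a title attribute, and it uses generate-id() to make the anchor names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Emit the index first, then the sections themselves -->
  <xsl:template match="/page">
    <div>
      <h2>Page index</h2>
      <ul>
        <xsl:for-each select="section">
          <li>
            <!-- generate-id() returns the same identifier for the same
                 node throughout one transformation -->
            <a href="#{generate-id()}"><xsl:value-of select="@title"/></a>
          </li>
        </xsl:for-each>
      </ul>
      <xsl:apply-templates select="section"/>
    </div>
  </xsl:template>

  <!-- Each section gets a heading carrying the matching anchor id -->
  <xsl:template match="section">
    <h2 id="{generate-id()}"><xsl:value-of select="@title"/></h2>
    <!-- Pass the XHTML content of the section through unchanged -->
    <xsl:copy-of select="node()"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the index is derived from the sections on every run, adding a section to the source is enough to add it to the index.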

Have a look at the XML source for this page and the corresponding XSL stylesheet for an example of how it can be used. The XML source is basically normal XHTML with the addition of some <section> and <subsect> elements. The document type declaration at the top is used to define any HTML entities that are not recognised by XSL, in this case an "emdash", which has Unicode number 8212.
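Such a declaration might look like the sketch below. The root element name <page>, the title attribute and the sample text are hypothetical; the sketch also assumes the entity is given HTML's name for it, mdash. Only the character number 8212 comes from the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page [
  <!-- Define the HTML entity name in terms of its Unicode number,
       so that the XML parser can resolve it -->
  <!ENTITY mdash "&#8212;">
]>
<page>
  <section title="XML">
    <p>The work is done on the server &mdash; not in the browser.</p>
  </section>
</page>
```

The parser expands the entity before the XSLT processor ever sees the document, so the stylesheet itself needs no special handling for it.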

Example 2

Another place where I use XSLT is in the picture gallery on my daughter's website. This is a fairly substantial example, illustrating lots of loops and conditionals, named templates, variables and parameters, and the use of the count() function among other things.

In this case the metadata for the pictures are stored in an XML file, and transformed according to an XSL Transformation file. The end product is a normal, validating XHTML webpage.

Example 3

My sermons pages use XSLT both to produce the main index page and the individual pages for each of the sermons among other things. Details of the sermons available are held in an XML index file which is itself automatically generated and cached from the sermon texts with another XSLT script.

The XSLT for the index reformats the XML file as a HTML table: it demonstrates sorting with XSL. The XSLT for the individual sermon pages transforms the XML for the selected sermon into a final XHTML page. The XSL document() function is used to include the body of the sermon which is stored in a separate file.
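The two techniques can be sketched in a single stylesheet. Everything about the index format here is an assumption made for illustration (the element names <sermons>, <sermon>, <date>, <title> and <file>, and the parameter name show); the real sermon files will differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical index format:
       <sermons>
         <sermon>
           <date>2003-01-05</date>
           <title>A title</title>
           <file>sermon1.xml</file>
         </sermon>
       </sermons>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Name of the sermon file to display; empty means "show the index" -->
  <xsl:param name="show" select="''"/>

  <xsl:template match="/sermons">
    <xsl:choose>
      <xsl:when test="$show != ''">
        <xsl:apply-templates select="sermon[file = $show]" mode="page"/>
      </xsl:when>
      <xsl:otherwise>
        <table>
          <xsl:for-each select="sermon">
            <!-- Sorting: newest first; ISO dates sort correctly as text -->
            <xsl:sort select="date" order="descending"/>
            <tr>
              <td><xsl:value-of select="date"/></td>
              <td><xsl:value-of select="title"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Individual page: document() loads the separate file named in
       <file>, resolved relative to the index document -->
  <xsl:template match="sermon" mode="page">
    <h1><xsl:value-of select="title"/></h1>
    <xsl:copy-of select="document(file)/*"/>
  </xsl:template>
</xsl:stylesheet>
```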

Incidentally, these examples demonstrate a crude way of using arrays in XSLT. It's neither fast nor elegant, and it violates the standard (see below), but it is pretty useful.

The array is created as a node-set in the variable $months which contains the twelve months in order. Then in the template below an array-lookup is effectively done with select="$months/month[position()=$m]"

<xsl:variable name="months">
  <month>January</month>
  <month>February</month>
...
  <month>November</month>
  <month>December</month>
</xsl:variable>

<xsl:template match="date">
  <xsl:number value="d"/>
  <xsl:text> </xsl:text>
  <xsl:variable name="m" select="m"/>
  <xsl:value-of select="$months/month[position()=$m]"/>
  <xsl:text> </xsl:text>
  <xsl:number value="y"/>
</xsl:template>

The XML fragment this matches looks like

<date><y>YYYY</y><m>MM</m><d>DD</d></date>

It is important to note that all this, strictly speaking, defies the XSLT 1.0 standard, since we are trying to treat a result tree fragment (the $months variable) as a node-set. In the XSLT 2.0 standard this will be allowed. Meanwhile, my XSLT processor (Sablotron) automatically converts result tree fragments to node-sets. Other processors may have an extension function (called something like exsl:node-set()) that will do the conversion explicitly. Portable code to do this safely looks like this mess,

<xsl:choose>
  <xsl:when test="function-available('exsl:node-set')">
    <xsl:value-of select="exsl:node-set($months)/month[position()=current()/m]" />
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="$months/month[position()=current()/m]" />
  </xsl:otherwise>
</xsl:choose>
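For the exsl prefix in that fragment to resolve, the stylesheet has to declare the EXSLT common namespace. A complete sketch, assuming the same <date> input as above, might look like this; the exclude-result-prefixes attribute is an optional addition that keeps the namespace declaration out of the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <!-- The "array": a result tree fragment holding the month names -->
  <xsl:variable name="months">
    <month>January</month>
    <month>February</month>
    <month>March</month>
    <month>April</month>
    <month>May</month>
    <month>June</month>
    <month>July</month>
    <month>August</month>
    <month>September</month>
    <month>October</month>
    <month>November</month>
    <month>December</month>
  </xsl:variable>

  <xsl:template match="date">
    <xsl:choose>
      <!-- Use the explicit conversion where the processor offers it -->
      <xsl:when test="function-available('exsl:node-set')">
        <xsl:value-of select="exsl:node-set($months)/month[position()=current()/m]"/>
      </xsl:when>
      <!-- Otherwise rely on the processor converting automatically -->
      <xsl:otherwise>
        <xsl:value-of select="$months/month[position()=current()/m]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The comparison position()=current()/m converts the text of <m> to a number, so a zero-padded month such as 01 still selects January.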

Example 4

My XSLT for the ESV Bible text demonstrates some nifty code for handling footnotes, including the use of the "mode" attribute of <xsl:apply-templates/>. See the ESV Web Service pages for more on this.
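The general idea behind modes is that the same source element can be processed twice with different templates. The sketch below is not the ESV stylesheet: the element names <passage> and <footnote> are hypothetical. It renders each footnote once as a numbered marker in the running text and once as an entry in a list at the end.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="passage">
    <!-- First pass (default mode): text with inline markers -->
    <div><xsl:apply-templates/></div>
    <!-- Second pass (mode "list"): the footnote bodies, in order -->
    <ol>
      <xsl:apply-templates select=".//footnote" mode="list"/>
    </ol>
  </xsl:template>

  <!-- Default mode: replace the footnote with a numbered link -->
  <xsl:template match="footnote">
    <xsl:variable name="n">
      <!-- Count footnotes so far within the enclosing passage -->
      <xsl:number level="any" from="passage"/>
    </xsl:variable>
    <sup><a href="#fn{$n}"><xsl:value-of select="$n"/></a></sup>
  </xsl:template>

  <!-- Mode "list": emit the footnote text itself; position() here is
       the footnote's place in the selected list, so it matches the
       number produced above -->
  <xsl:template match="footnote" mode="list">
    <li id="fn{position()}"><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>
```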

Conclusion

It looks like the combination of XML and XSLT is tremendously powerful, and it is definitely the future of the Web. The majority of Web users will find that their browsers already support client-side XSL transformation, or the transformations can be applied server-side, as described above.

But it's not just about web browsing: it marks a genuine division between the content of an XML document and the presentation of that document as defined in the XSL file. This means that the same data may be delivered in a multitude of ways by a multitude of media without the artificial constraints imposed by HTML and the visual formatting model. This will facilitate a host of web services that can be embedded into external internet-enabled devices, of which the M'Cheyne daily Bible reading server is, in its humble way, an example.

For a thorough overview of XML and related issues, have a look at the XML FAQ.
