An Introduction to AxKit
Matt Sergeant
matt@sergeant.org
Copyright © 2000 AxKit.com Ltd

An introduction to AxKit, the XML Application Server for Apache
Introduction

XML has, in theory, solved one of the problems facing web site developers: how to develop a consistent look across your site using a template/stylesheet system. Unfortunately, much of that solution still remains out of reach of the majority of web sites.

One major piece of the puzzle that still has not been perfected is the authoring stage. Authoring XML is still problematic for designers who are not XML-savvy. Some tools exist to solve this problem, such as XMetaL, but as first-generation tools these still have rather large weak spots. I expect to see this side of the puzzle solved with this season's software releases; I hope I'm not wrong.

The other side of the puzzle for web developers is delivery. It's going to be an extremely long time until all clients support some sort of client-side transformation, and I remain unconvinced that this is the right way to do it. Take XSLT as an example: the Apache group are right now trying to develop a way to do XSLT without building an in-memory tree structure of the XML document. However, as an implementor of XPath myself (Perl's XML::XPath module), I think they are going to find this a really tough nut to crack. Not only that, but all this parsing is extremely resource intensive, and I think that's the wrong model to be looking at, especially when we want to deliver to handheld devices.

That leaves server-side transformation, for which there are several options. The most immediately obvious is static transformation, but that can start to become a maintenance nightmare. Then there are application servers that operate within their own world, such as Enhydra and Zope. These are excellent choices for shops that already use them. Finally there is Cocoon, a truly awesome technology, now part of the Apache project. Cocoon is a full-blown application server built around XML technology. Part of Cocoon is a system which associates stylesheets with XML files.
See http://www.w3.org/TR/xml-stylesheet for details on this method.

I don't personally have any gripes with Cocoon (I do have gripes with Java, but that's another issue). To an extent AxKit emulates Cocoon (although in reality, Cocoon and AxKit are both just implementing suggested standards).

AxKit fits into this picture by providing simple, intuitive ways for web developers to deliver XML to clients in different media formats and stylesheets. Very much like Cocoon, AxKit also has caching built into its core, so it only re-creates the document being delivered when something that document depends on has changed (unless, of course, the developer explicitly decides not to cache it). Unlike Cocoon, though, AxKit is built in Perl and integrates extremely tightly with Apache. AxKit already provides natively some of the technology that Cocoon 2 is going to deliver (equally, Cocoon delivers some technology that AxKit does not!).

Why Choose AxKit?

AxKit is based on a plugin architecture. This allows developers to build modules very quickly on top of currently available technology, providing new stylesheet languages, new methods for delivering alternate stylesheets and new methods for determining media types. Because AxKit is built in Perl, these sorts of plugins are incredibly simple to develop. Not long after AxKit was released, a developer wrote a file-suffix stylesheet chooser module, which returns different stylesheets when the user requests file.xml.html or file.xml.text, in just 15 lines of code.

This plugin architecture also makes developing new stylesheet modules very simple, since they can reuse some of the readily available code in Perl's excellent CPAN (the Comprehensive Perl Archive Network). A stylesheet module to deliver XML-News files as HTML would take only a few lines of code based on David Megginson's XMLNews::HTMLTemplate module, and AxKit works out all the nuances of caching for you.
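The file-suffix chooser mentioned above shows how little logic such a plugin needs. The sketch below is Python rather than Perl and only illustrates the idea; `choose_style_by_suffix` and the `known_styles` set are hypothetical names, not part of AxKit. It treats the last suffix as a stylesheet preference only when that suffix follows `.xml` and names a known style.

```python
from typing import Optional, Set, Tuple


def choose_style_by_suffix(request_path: str,
                           known_styles: Set[str]) -> Tuple[str, Optional[str]]:
    """Split 'file.xml.html' into ('file.xml', 'html').

    If the last suffix follows '.xml' and names a known style, strip it and
    return it as the stylesheet preference; otherwise leave the path alone.
    """
    base, dot, suffix = request_path.rpartition(".")
    if dot and base.endswith(".xml") and suffix in known_styles:
        return base, suffix
    return request_path, None


styles = {"html", "text"}
print(choose_style_by_suffix("/docs/file.xml.html", styles))  # ('/docs/file.xml', 'html')
print(choose_style_by_suffix("/docs/file.xml.text", styles))  # ('/docs/file.xml', 'text')
print(choose_style_by_suffix("/docs/file.xml", styles))       # ('/docs/file.xml', None)
```

A real plugin would also need to confirm that the stripped path exists before handing it on; that check is left out to keep the sketch short.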
Another important point is that AxKit is pragmatic about what it delivers to clients. The output doesn't have to be HTML, XHTML or strict HTML 4, or indeed compliant with any particular standard. This decision was made because, no matter what, clients are not going to upgrade their browsers just because you want them to. So AxKit says that you can deliver XML or XHTML if you want to (and the tools are there for you to do so), but it's just as easy to deliver any other format.

AxKit comes with a number of pre-built stylesheet modules, including two XSLT modules. One is built around Perl's XML::XSLT module, a DOM-based XSLT implementation that is still in its early stages; the other is built around Ginger Alliance Ltd's Sablotron XSLT library, a much more complete XSLT implementation written in C++ that is extremely fast. For the closet XSLT haters out there (come on, I know there are quite a few of you!) there's XPathScript, a language that takes some of the good features of XSLT, such as matching and finding nodes with XPath, and combines them with the power of ASP-like code integration and built-in Perl scripting. XPathScript also compiles your stylesheet into native Perl code whenever the stylesheet changes, so execution times are very good for XML stylesheet processing. As an example of XPathScript's power, I've created a DocBook stylesheet that can dynamically show separate sections of a DocBook/XML file.

The core of AxKit is also very quick. When delivering cached results, it runs at about 80% of the speed of Apache itself. It achieves this primarily because it's built on mod_perl: the tight coupling with Apache that mod_perl provides means that an awful lot of the code is running in compiled C. To deliver a cached result, AxKit just tells Apache where to find the cached file, and that it doesn't want to handle the request itself. Apache then comes up with the goods at its usual lightning speed.

Finally, AxKit works hand-in-hand with Apache, so none of your webmaster skills will go to waste.
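The cached-delivery shortcut can be modelled in a few lines. This is a hypothetical Python sketch of the decision, not AxKit's Perl code: `Request`, `DECLINED`, `cache_is_fresh` and `serve_xml` are invented names, loosely mirroring Apache's convention that a handler returning DECLINED leaves the request to Apache's default handler. Freshness is assumed to be a simple modification-time comparison against every file the cached result was built from (the XML source and its stylesheets, for example).

```python
import os
from dataclasses import dataclass
from typing import Callable, Iterable

# Apache's "not handled here" status; the value -1 matches httpd's DECLINED.
DECLINED = -1


@dataclass
class Request:
    """Minimal hypothetical stand-in for an Apache request record."""
    filename: str


def cache_is_fresh(cache_path: str, dependencies: Iterable[str]) -> bool:
    """True if the cache exists and is no older than any dependency."""
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.path.getmtime(cache_path)
    return all(os.path.getmtime(dep) <= cache_mtime for dep in dependencies)


def serve_xml(req: Request, cache_path: str, dependencies: Iterable[str],
              rebuild: Callable[[], bytes]) -> int:
    """Rebuild the cache if needed, then hand the cached file back to Apache."""
    if not cache_is_fresh(cache_path, dependencies):
        # Stale or missing: run the stylesheets and store the result.
        with open(cache_path, "wb") as fh:
            fh.write(rebuild())
    # Point the request at the cached file and decline, so Apache sends it.
    req.filename = cache_path
    return DECLINED
```

Declining even after a rebuild keeps the sketch short; a real handler could just as well send the freshly built output itself.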
Cocoon 2 is about to deliver a sitemap feature, whereby you don't have to use <?xml-stylesheet?> processing instructions everywhere to build up your site. AxKit already provides this, and integrates directly with Apache's <Files>, <Location> and <Directory> directives. All of AxKit's configuration takes this approach, so you never have to teach a webmaster any new tricks to build up your XML site.

Putting it all together

In simple terms, how does AxKit work? AxKit registers a "handler" with Apache. In Apache terms, a handler is a module that works in a particular phase of the request (the phases cover things like authentication, type checking, the response and logging).

When a request for a file comes in, AxKit does some very quick checks to see whether the file is XML. The main checks are whether the file extension is .xml, and/or whether the first few characters of the file are the <?xml?> declaration. If the file is not XML, AxKit lets Apache deliver it as it normally would. Note that, using the Apache configuration methods described above, it's quite possible to apply AxKit only to certain parts of your web site.

When an XML file is detected, the next step is to call any plugin modules that determine the media type and/or stylesheet preference. Media type chooser plugins normally look at the User-Agent header or the Accept header; however, it's possible to use any method at all to determine the media type. Stylesheet choosers currently exist based on path info (a path following the filename, so you could request myfile.xml/mystyle), the query string (for example myfile.xml?style=mystyle) and the file suffix (myfile.xml.mystyle).

The final, and most significant, part is plumbing together all the stylesheets with the XML file in the right order, implementing cascading where appropriate, and "doing the right thing" with regard to the cache.
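The detection step and the path-info and query-string choosers can be sketched as follows. As before, this is illustrative Python, not AxKit's implementation; `looks_like_xml`, `style_from_path_info` and `style_from_query` are hypothetical names, and testing only the opening bytes for `<?xml` is a simplification (it ignores, for instance, a leading byte-order mark).

```python
from typing import Optional
from urllib.parse import parse_qs


def looks_like_xml(filename: str, first_bytes: bytes) -> bool:
    """Cheap test: a .xml extension, or content that opens with '<?xml'."""
    return filename.endswith(".xml") or first_bytes.startswith(b"<?xml")


def style_from_path_info(path_info: str) -> Optional[str]:
    """A request for myfile.xml/mystyle arrives with path info '/mystyle'."""
    name = path_info.strip("/")
    return name or None


def style_from_query(query_string: str, key: str = "style") -> Optional[str]:
    """myfile.xml?style=mystyle -> 'mystyle' (the first value wins)."""
    values = parse_qs(query_string).get(key)
    return values[0] if values else None


print(looks_like_xml("notes.dat", b"<?xml version='1.0'?><notes/>"))  # True
print(style_from_path_info("/mystyle"))                              # mystyle
print(style_from_query("style=mystyle&lang=en"))                     # mystyle
```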
One advantage we have over Cocoon here is that AxKit also invalidates the cache when external entities (parsed or unparsed) change. This means a modular stylesheet can change just one of its component files, and that change will still cause the cache to be rebuilt.

Mapping XML Files to Stylesheets

AxKit uses two separate methods for mapping XML files to stylesheets. The primary method follows the W3C recommendation at http://www.w3.org/TR/xml-stylesheet. This specifies that a <?xml-stylesheet?> processing instruction at the beginning of the XML file (after the <?xml?> declaration, and before the first element) defines the location and type of the stylesheet. The details of how these stylesheet declarations behave are defined in TR/REC-html40 (which has just recently been superseded by HTML 4.01).

The second method is used when no usable <?xml-stylesheet?> processing instructions are found in the XML file. It relies on a directive in your Apache configuration files. These directives can be used anywhere within Apache's <Files>, <Location>, <Directory> and .htaccess configuration system, so it's possible to define complex mapping rules for different file types and locations in whichever manner pleases you.

AxKit then uses the type of the stylesheet (in the type="..." attribute of the <?xml-stylesheet?> instruction, or the first parameter of the configuration directive) to decide which module should process it. Again, this is slightly different from Cocoon 1.x, which requires special <?cocoon?> directives to be added to your XML files to determine the processor module to use. The type is then mapped to a module using another Apache configuration directive, which again can appear anywhere within Apache's configuration structure. This allows you to try different modules for processing the same file (for example, you might like to try both XSLT processors to see which suits your needs best).
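The mapping and selection steps can be tied together in a short sketch. It is Python for illustration, not AxKit's Perl code: it reads `<?xml-stylesheet?>` instructions from the prolog with the standard `xml.dom.minidom` module, applies the media and persistent/preferred/alternate rules described under "Choosing a Stylesheet" below, and looks each surviving `type` up in a table standing in for the type-to-module directive. The names `STYLE_MAP`, `stylesheet_pis`, `select_sheets` and `plan`, the simplified pseudo-attribute regex, and treating a missing `media` as "all" are all assumptions.

```python
import re
from typing import Dict, List, Optional, Tuple
from xml.dom import minidom

Sheet = Dict[str, str]

# Hypothetical type -> processor table standing in for the Apache directive.
STYLE_MAP = {"text/xsl": "Apache::AxKit::Language::Sablot"}

# Simplified pseudo-attribute syntax: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attrs(data: str) -> Sheet:
    attrs = {}
    for m in PSEUDO_ATTR.finditer(data):
        name, double_quoted, single_quoted = m.groups()
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    return attrs


def stylesheet_pis(xml_text: str) -> List[Sheet]:
    """Pseudo-attributes of each <?xml-stylesheet?> before the root element."""
    doc = minidom.parseString(xml_text)
    sheets = []
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # only the prolog counts
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            sheets.append(parse_pseudo_attrs(node.data))
    return sheets


def is_alternate(sheet: Sheet) -> bool:
    return "title" in sheet and sheet.get("alternate") == "yes"


def select_sheets(sheets: List[Sheet], media: str,
                  chosen_title: Optional[str] = None) -> List[Sheet]:
    """Persistent sheets always apply; a chosen alternate replaces the preferred ones."""
    usable = [s for s in sheets if s.get("media", "all") in (media, "all")]
    use_alternate = chosen_title is not None and any(
        is_alternate(s) and s["title"] == chosen_title for s in usable)
    selected = []
    for s in usable:
        if "title" not in s:            # persistent
            selected.append(s)
        elif use_alternate:             # a plugin asked for this alternate
            if is_alternate(s) and s["title"] == chosen_title:
                selected.append(s)
        elif not is_alternate(s):       # preferred, the default
            selected.append(s)
    return selected


def plan(xml_text: str, media: str,
         chosen_title: Optional[str] = None) -> List[Tuple[str, str]]:
    """(href, processor module) pairs in document order."""
    chosen = select_sheets(stylesheet_pis(xml_text), media, chosen_title)
    return [(s["href"], STYLE_MAP[s["type"]])
            for s in chosen if s.get("type") in STYLE_MAP]


page = """<?xml version="1.0"?>
<?xml-stylesheet href="base.xsl" type="text/xsl"?>
<?xml-stylesheet href="fancy.xsl" type="text/xsl" title="Fancy"?>
<?xml-stylesheet href="plain.xsl" type="text/xsl" title="Plain" alternate="yes"?>
<page/>"""
print(plan(page, "screen"))           # pairs for base.xsl and fancy.xsl
print(plan(page, "screen", "Plain"))  # pairs for base.xsl and plain.xsl
```

Keeping the three steps as separate functions mirrors the plugin split described in the article: a different chooser only changes `chosen_title`, and trying another processor only changes the table.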
Choosing a Stylesheet

When AxKit examines which stylesheets to apply, a single XML file (or a configuration-file mapping, see above) often provides more than one option. There are two important factors to consider: media type and stylesheet preference.

The media type of a stylesheet must always match the requested media type, or be "all". It's worth noting here that Cocoon provides many media types beyond the W3C's specified list, such as "wap", "lynx", "explorer" and "netscape". The merits of this are debatable.

Stylesheet preference is based on three kinds of stylesheet: persistent, preferred and alternate. Persistent stylesheet declarations contain no title="..." attribute; preferred stylesheet declarations contain a title="..." attribute but have alternate="no" (or no alternate="..." attribute at all); and alternate stylesheet declarations contain a title="..." attribute and explicitly set alternate="yes". AxKit always applies persistent stylesheets, and applies an alternate stylesheet only if a plugin has determined that it should be displayed; otherwise the preferred stylesheets are used.

This all seems rather confusing and long-winded, but it allows a very modular system, and also allows wonderful flexibility in choosing stylesheets for users. For example, a plugin could connect to a database and retrieve the correct alternate stylesheet for a particular user based on an authentication token. This would allow users to change the whole look of their favourite web site, with AxKit doing all the hard work for you.

Cascading Stylesheets

It's easy to get confused by the term "stylesheet" here. A quick read of the above might suggest that all stylesheets are good for is transforming static XML files into further static XML files. This is especially the case if all you can picture is XSL(T) (or even CSS).
However, in AxKit's terms a stylesheet can do anything, provided you can build a Language module to parse it. The concept of stylesheets in AxKit replaces all of Cocoon's stages: Producer, Processor and Formatter. So it becomes possible, just as in Cocoon, to return database results, add tags to them, and format the result as WAP, HTML or any other format. The term "cascading" therefore refers to one stylesheet's results "cascading" into the next. With AxKit there are a number of ways to achieve that.

The first and simplest method is to have all your stylesheets work on DOM, producing DOM trees. When all the stylesheets have finished processing, AxKit takes care of outputting the results to the user agent and disposing of your DOM tree.

The second method of cascading is to simply cascade the textual results of your output. This is necessary with modules like Sablotron, where no DOM tree is available. Modules further down the processing stream can parse this string directly as XML (provided they are designed to work this way) and continue processing.

The final, and possibly most interesting, method is to use "end-to-end SAX". Here AxKit sets up a chain of SAX handlers to process the document. AxKit stylesheet languages based on SAX are simply responsible for passing SAX events on to the next SAX handler in the chain (each is given that handler when it is constructed). The final SAX handler in the chain outputs its results to the browser. This doesn't sound particularly interesting until you consider that this end-to-end system starts sending data to the browser as soon as parsing begins. It also means database modules don't have to build DOM trees in memory, which can be resource-hungry; they can simply fire SAX events, and the output from the database appears as results become available. Cocoon 2 will have a similar, if not identical, system.
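Python's standard library happens to contain the pieces needed to demonstrate the end-to-end SAX idea: `xml.sax.saxutils.XMLFilterBase` receives events from a parent reader and forwards them to its own content handler, and `XMLGenerator` serialises events as they arrive. The `RenameFilter` stage below, which renames one element type, is a hypothetical stand-in for a real stylesheet and is unrelated to AxKit's own SAX interfaces. No tree is built at any point; each event passes straight through the chain to the output.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameFilter(XMLFilterBase):
    """One stage in the chain: renames one element type, forwards everything else."""

    def __init__(self, parent, old: str, new: str):
        super().__init__(parent)
        self.old, self.new = old, new

    def _rename(self, name: str) -> str:
        return self.new if name == self.old else name

    def startElement(self, name, attrs):
        super().startElement(self._rename(name), attrs)

    def endElement(self, name):
        super().endElement(self._rename(name))


def run_chain(xml_bytes: bytes) -> str:
    out = io.StringIO()  # in a server this would be the response stream
    parser = xml.sax.make_parser()
    # Events flow parser -> stage1 -> stage2 -> XMLGenerator.
    stage1 = RenameFilter(parser, "para", "p")
    stage2 = RenameFilter(stage1, "title", "h1")
    stage2.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    stage2.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


print(run_chain(b"<doc><title>Hi</title><para>One</para></doc>"))
```

Each stage only knows about the handler it forwards to, which is what lets stages be added or reordered without the others changing.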
A Simple Setup Example

Setting up AxKit is simple. I don't believe tools like this should be hard to use, or even hard to set up. Provided you can use an editor and modify a few Apache configuration files, setup should be a breeze.

Unfortunately AxKit requires mod_perl, so there is an extra component to install first. Installing mod_perl can be complex, depending on your setup, so I will just provide a link: The mod_perl Guide - Installation.

Now on to AxKit itself. Installing the required Perl modules is very simple. Download AxKit (see link below), extract the archive and change to the directory it creates. Then simply type:

    perl Makefile.PL
    make
    make test
    make install

If you don't have apxs in your path, mod_perl versions below 1.24 will produce a warning at the first step. This warning can be ignored.

Next up, editing Apache's configuration files. First you need to enable AxKit so that Apache understands AxKit's configuration directives, so add the following line to your httpd.conf file:

    PerlModule AxKit

Finally, you can add the core of AxKit, the handler itself. This can be added to any .htaccess file, or to your httpd.conf file:

    SetHandler perl-script
    PerlHandler AxKit
    AxAddStyleMap text/xsl Apache::AxKit::Language::Sablot

The last line there associates the type "text/xsl" with the stylesheet module specified. Now you're ready to start serving up XML files. Check out the example files in the AxKit distribution; these should get you started.

Conclusions

AxKit provides web developers with the tools they need to deliver complex systems quickly, and eases them into the development process. It gives them the power to develop their own system for stylesheet decision making, and also the flexibility to design completely new stylesheet languages. All of this while integrating tightly with Apache, providing a fast, scalable and well-architected system. But then, I'm biased.
AxKit is not finished yet; however, most of the features described above are built and working very reliably. The most significant things missing from AxKit are SAX-based stylesheet languages (which just need to be designed and built, and I have a number of ideas for them), and alternate ways to generate the initial XML file (which Cocoon calls "Producers"). These will be coming in a future release.

AxKit is free software, and I hope people will jump in and help. We have the beginnings of an active mailing list, where you can vote on features, help develop them, or simply lurk. We're moving extremely quickly with new features. Developing in Perl allows us to do this while still maintaining readable code (something I deem very important, so don't assume that because it's written in Perl it's going to be a ball of spaghetti!). If there's something you'd like to see in AxKit, please join the mailing list and participate with us.

Links

The following are links relevant to this article:

AxKit: the main homepage for AxKit.