AxKit
Matt Sergeant, matt@axkit.com
AxKit.com Ltd, http://axkit.com/

Introduction
  - Perl's XML Capabilities
  - AxKit intro
  - AxKit static sites
  - AxKit dynamic sites (XSP)
  - Advanced AxKit

XML with Perl

Introduction
  - Basis: XML::Parser
  - XS wrapper around expat
      - expat: James Clark
      - XML::Parser: originally by Larry Wall
      - Now maintained by Clark Cooper
  - Allows different "styles" of parsing
  - Default style is callback/event based stream parsing
  - Also implements a "push" parser

XML::Parser usage
  - SAX-like API
  - Register callback handler methods
      - start tag
      - end tag
      - characters
      - comments
      - processing instructions
      - ... and more
  - Non-validating XML parser
  - Dies (throws an exception) on bad XML

XML::Parser code

    use XML::Parser;

    my $p = XML::Parser->new(
        Handlers => {
            Start => \&start_tag,
            End   => \&end_tag,
            # add more handlers here
        });

    $p->parsefile("foo.xml");

    exit(0);

    sub start_tag {
        my ($expat, $tag, %attribs) = @_;
        print "Start tag: $tag\n";
    }

    sub end_tag {
        my ($expat, $tag) = @_;
        print "End tag: $tag\n";
    }

XML::Parser - maintaining state

    use XML::Parser;

    # The handlers are closures over $ctxt, so state lives in one object.
    my $ctxt = MyParser->new();
    my $p = XML::Parser->new(
        Handlers => {
            Start => sub { $ctxt->parse_start(@_) },
            End   => sub { $ctxt->parse_end(@_) },
        });

    # Any well-formed XML string will do here.
    $p->parse("<doc>here</doc>");

    package MyParser;

    sub new {
        return bless {}, shift;
    }

    sub parse_start {
        my ($self, $expat, $tag, %attribs) = @_;
        $self->{some_state} = "Open $tag";
    }

    sub parse_end {
        my ($self, $expat, $tag) = @_;
        print "Last state: $self->{some_state}\n";
    }

PYX
  - Turns XML into a stream of text events
  - Based on SGML ESIS streams
      - (start
      - )end
      - Aattribute=value
      - -text
      - ?processing instruction
  - Comments are dropped

PYX example

    $ pyx presentation.xml | more
    (slideshow
    -\n
    -
    (title
    -Developing XML Applications with Perl and AxKit
    )title
    -\n
    -
    (metadata
    -\n
    -
    (speaker
    -Matt Sergeant
    )speaker
    -\n
    -
    (email
    -matt@axkit.com
    )email
    ...
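Because each PYX line is a single event, a stream like the one above can be processed with very little code. The following is a minimal sketch, not the pyxw tool itself: the function name pyx_to_xml is hypothetical, and it handles only start, end and text events (attributes and processing instructions are ignored).

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild XML text from a list of PYX event lines.
# Only "(" (start), ")" (end) and "-" (text) events are handled.
sub pyx_to_xml {
    my @events = @_;
    my $xml = '';
    for my $event (@events) {
        next unless length $event;
        my $type = substr($event, 0, 1);    # first character names the event
        my $data = substr($event, 1);       # the rest is the payload
        if    ($type eq '(') { $xml .= "<$data>" }
        elsif ($type eq ')') { $xml .= "</$data>" }
        elsif ($type eq '-') {
            $data =~ s/\\n/\n/g;            # PYX writes a newline as a literal \n
            $data =~ s/&/&amp;/g;           # minimal escaping for text content
            $data =~ s/</&lt;/g;
            $xml .= $data;
        }
    }
    return $xml;
}

print pyx_to_xml('(speaker', '-Matt Sergeant', ')speaker'), "\n";
```

A line-per-event format like this is what makes PYX convenient to combine with tools such as grep or sed between pyx and pyxw.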

PYX usage
  - pyx xmlfile | something | pyxw
  - pyx generates PYX streams
  - pyxw generates XML from PYX
  - pyxhtml parses HTML to PYX

HTML Tidy, or html2xml
  - Java utility by Dave Raggett of the W3C
  - Tidy converts HTML to XHTML
  - PYX can do the same:

    pyxhtml file.html | pyxw > file.xhtml

  - pyxhtml uses HTML::TreeBuilder

XPath intro
  - W3C Standard for locating nodes within an XML document
  - A subset of XPath is used for "matching" nodes in XSLT
  - Looks like directory paths: /path/to/node
  - But that's an abbreviated syntax: /child::path/child::to/child::node
  - Full grammar containing expressions, calculations, functions, etc.

XPath Examples
  - Find title of XHTML document: /html/head/title
  - Find all hrefs in <a> tags: /html/body/descendant::a[@href]/@href

XML::XPath
  - Full implementation of W3C XPath on a DOM-like API
  - Easy to use:

    use XML::XPath;

    my $xp = XML::XPath->new(filename => "foo.xml");
    print $xp->findvalue("/html/head/title");

  - Can also process HTML:

    my $xp = XML::XPath->new(
        filename => "pyxhtml foo.html | pyxw |");

XML::XPath continued
  - Every node has methods findvalue(), findnodes() and find()

    my $xp = XML::XPath->new(filename => "foo.xml");
    foreach my $link ($xp->findnodes('/html/body/a[@href]')) {
        print "Link: ", $link->findvalue('@href'), "\n";
        # or :
        # print "Link: ", $link->getAttribute('href'), "\n";
    }

  - Not 100% DOM compatible, but close

XML::XPath Implementation
  - XML::Parser and SAX parsers build an in-memory tree
  - Hand-built parser for XPath syntax (rather than YACC based parser)
  - Garbage collected even though the tree has circular references (and works on Perl 5.005)

XML to XML Conversions
  - Use XSLT (or XPathScript, see later)
  - Either use the command line (e.g. xsltproc from libxslt)
  - ... or call from Perl:

    use XML::LibXML;
    use XML::LibXSLT;

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    my $source_dom = $parser->parse_file('foo.xml');
    my $style_dom  = $parser->parse_file('bar.xsl');

    my $stylesheet = $xslt->parse_stylesheet($style_dom);
    my $results    = $stylesheet->transform($source_dom);

    print $stylesheet->output_string($results);

SAX Filters
  - SAX events passed between SAX handlers
  - e.g. tags to lower case:

    use XML::Handler::YAWriter;
    use XML::Parser::PerlSAX;

    # Chain: parser -> lower-casing filter -> writer (to STDOUT)
    my $ya = XML::Handler::YAWriter->new( AsFile => '-' );
    my $lc = XML::Filter::ToLower->new( Handler => $ya );
    my $parser = XML::Parser::PerlSAX->new( Handler => $lc );

    package XML::Filter::ToLower;
    @ISA = ('XML::Filter::Base');

    sub start_element {
        my ($self, $element) = @_;
        $element->{Name} = lc($element->{Name});
        $self->{Handler}->start_element($element);
    }

    sub end_element {
        my ($self, $element) = @_;
        $element->{Name} = lc($element->{Name});
        $self->{Handler}->end_element($element);
    }

SAX Filters (cont)
  - Almost any level of complexity possible
  - Chain as many filters together as you like
  - Fast way of processing XML
  - We don't have to build a tree in memory
  - Possible to output to HTML using XML::Handler::HTMLWriter

AxKit Introduction

AxKit Introduction
  - XML Publishing Framework
  - XML Application Server
  - Designed for Content Diversification
      - Same content, delivered differently
      - HTML, WAP, PDF, SOAP, etc.
  - Built in C, Perl, and mod_perl/Apache
  - Plugin architecture allows operation to be changed easily

How does it work?
  - Apache handler
  - Embedded perl interpreter via mod_perl
  - Apache Config directives
  - Transformation Pipeline
  - Language Modules implement stylesheets and other engines
  - XSP for Dynamic Content
  - Caching
  - Apache::Filter Support

mod_perl
  - mod_perl used to implement most of AxKit
  - ... though more code being ported to C all the time
  - Gives us bytecode compiled Perl
  - Makes development faster

Configuration Directives
  - Config directives written in the C code
  - Implemented on top of Apache
  - Fits in with httpd.conf and .htaccess files
  - 28 Config directives in total
  - Set up the pipeline, debugging options, and various other flags

Config Directives examples

    AxAddProcessor application/x-xsp .
    AxAddProcessor text/xsl /styles/cal2wml.xsl
    AxAddProcessor text/xsl /styles/cal2html.xsl

Transformation Pipeline
  - AxKit is a pipeline engine
  - Output of one stage goes to input of the next
  - Allows us to build up our application in stages
  - Pipeline passed as one of:
      - String
      - DOM Tree
      - SAX events

Language Modules
  - Every stage in the pipeline implemented by a language module
  - Language modules can either use a stylesheet for transformation or not
  - Examples of modules not using a stylesheet are "XSL:FO to PDF", XSP, and the module which produces this slideshow
  - Different implementations of same language possible, e.g. XSLT
      - XML::LibXSLT
      - XML::Sablotron
      - XML::XSLT
      - XML::Xalan
      - XML::Transformiix

Language modules available
  - XSLT
  - XPathScript
  - XSP
  - AxPoint (this slideshow)
  - XMLNews::HTMLTemplate
  - Template Toolkit w/XPath plugin

Caching
  - XML Transformation can be slow, so caching is vital to making a fast XML Publishing Engine
  - Cache uses the filesystem, not memory or DBM files
  - Cache results of transformation pipeline, where appropriate
  - Cache results of dynamic processing by implementing has_changed() method
  - Using filesystem allows Apache to deliver the cache directly:

    my $r = Apache->request;
    $r->filename($cache_file);    # path of the cached result
    return DECLINED;              # let Apache serve it as a static file

Cache (cont.)
  - Fastest possible implementation of a cache
  - Default storage place is .xmlstyle_cache directory in same directory as XML file
  - Overridable using AxCacheDir directive
  - Files named as MD5 hash of XML filename, media type, style name and optionally run-time parameters that may affect the cache
  - Does use large amounts of space, so can be turned off

Non Cache-able content
  - POST requests usually not cacheable
  - XSP pages usually not cacheable
  - Can turn off the cache in Perl code:

    $r->no_cache(1);

  - Cache can be modified using plugins, so cache can be based on phase of the moon

Apache::Filter Support
  - Alternate plugin "Provider" module
  - AxKit can get XML from previous mod_perl handler
  - Works with Apache::ASP, HTML::Mason, etc.

Apache::Filter example

ASP Script

    Hello World
    Test Page
    Hello World
    The time is now: <%= localtime %>

Apache::Filter example (stylesheet)

    <xsl:value-of select="/page/head/title"/>

Apache::Filter example (results)

    Hello World
    Test Page
    Hello World
    The time is now: Tue May 8 07:41:33 2001
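
Returning to the cache slides: cache files are described there as being named by an MD5 hash of the XML filename, media type, style name and any run-time parameters. A rough sketch of such a key function follows. The function name cache_key, the argument order and the separator are assumptions made for illustration, not AxKit's actual code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical key builder: hashes the parts that, per the slides,
# identify one cached result.
sub cache_key {
    my ($xml_file, $media, $style, @extra) = @_;
    # Join with a NUL byte so that ("ab", "c") and ("a", "bc")
    # cannot collapse into the same input string.
    return md5_hex(join "\0", $xml_file, $media, $style, @extra);
}

my $key = cache_key('/docs/index.xml', 'screen', '#default');
print "$key\n";    # 32 hexadecimal characters
```

Including the optional parameters in the hash means a request whose output depends on, say, a query argument gets its own cache file rather than overwriting the default one.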

Gzipped Output
  - Many HTTP/1.1 browsers will accept gzip encoded results (modulo bugs)
  - AxKit can check Accept-Encoding header for this capability
  - Will then gzip content before delivery
  - Cache stores regular and gzipped copy
  - Reduces used bandwidth by up to 85%

Developer's Perspective
  - Two ways to tell AxKit how to process XML
      - Processing Instructions in the XML
      - Configuration Directives in httpd.conf
  - PIs act as override mechanism
  - Also PIs are sometimes appropriate, e.g. for XSP files

Processing Instructions
  - Follows the W3C xml-stylesheet REC
  - Works almost identically to HTML's LINK element
  - Processing instruction is just <?target data?>, so attributes are called "pseudo attributes"
  - Allowed attributes:
      - href - the location of the stylesheet
      - type - the mime type of the stylesheet
      - alternate - yes or no, specifies alternate styling
      - media - media type this stylesheet applies to
      - title - group this stylesheet belongs to
  - Only href and type are required

Config Directives
  - More complex and more powerful than PIs
  - Fully documented in perldoc AxKit

AxKit vs Cocoon
  - Cocoon is a Servlet implementing some of the same functionality
  - AxKit is tied to Apache
  - Cocoon 2 [alpha] adds features like AxKit config directives
  - Both implement XSP, and collaborate on the spec
  - Both open source
  - Perl and C vs Java
  - Use what fits your environment

AxKit for Static Content

XML Publishing
  - Using XML for the source of all our web site's content
  - Semantically rich original data
  - Delivery to different media types
  - Example - slides
      - Deliver to HTML
      - Or PDF
      - Or plain text
  - AxKit can detect via plugins which version the user wants

XML Based Content
  - DocBook
  - XHTML
  - TEI XML
  - RSS
  - MyML - invent your own!
  - Or use Provider interface...
      - OpenOffice
      - MS Word (via libwv)
      - Content Management System
  - Or use other mod_perl modules (Mason, ASP, etc.) to generate XML

The Value of Semantics
  - HTML has low semantic value
  - Example point - farming the web
  - XML provides the ability to add value to our content
  - But we still need to deliver it as HTML
  - AxKit can give you the best of both worlds

Alternative Media
  - The web is no longer just web browsers on PCs
  - TV is becoming huge, Phone/Handheld is already huge
  - But the cost of entry into different media types is still high
  - Using XML for our content reduces that cost of entry
  - And it will reduce your long term costs

Media vs Styles
  - Same concept, different reasons
  - Use different styles for the same media device
  - e.g. no-graphics HTML version, printable version, PDF version, all delivered to browsers on PCs
  - Some applications in personalisation

The Key - Stylesheets
  - Stylesheets translate XML on the fly to different formats
  - Different stylesheets used for different media types
  - ... and for different styles of the same media type
  - Generally, content is cached for increased performance

What is the benefit to you?
  - Web Developers benefit because of separation of concerns
      - Content
      - Presentation
      - Logic
      - Site Management
  - Development Scalability
  - Long term site management is a mess without separation of concerns

Templates vs print print print
  - But we can do better
  - Different devices throw a spanner in the works
  - Ideal is for content creators to focus on content, developers to focus on logic, designers to focus on presentation, and webmasters to focus on site management
  - Unfortunately this is still idealistic, but much better than print print print

Content Authoring
  - XML Editors
  - WYSIWYM Editors (What You See Is What You Mean)
  - foo2xml can also be used (e.g. pod2xml)
  - Mostly commercial offerings: Adept, XMetaL, XMLSpy
  - Open source options: OpenOffice, Emacs+psgml, jEdit
  - Some re-training may be required

Presentation Layer
  - Currently the hardest part of converting from static HTML sites
  - Some commercial tools available for writing XSLT stylesheets
  - Generating templates with HTML tools (e.g. FrontPage) is sometimes possible
  - Often the best way is to get a design, then have a stylesheet guru convert it

Business Logic
  - AxKit uses taglib concept
  - Logic wrapped in XML tags

    <order_fluglebar quantity="1"/>

  - Designers can drop these tags into their XML
  - Logic tags output XML, not HTML
  - Thus logic can be used for any output media/style
  - See next section (Dynamic Content)

Site Management
  - AxKit uses Apache config directives in httpd.conf/.htaccess
  - Allows you to use your web master skills
  - Also allows flexible configuration based on <Location>, <Files>, and <Directory> directives

XSLT Overview
  - Reformulation of the transformation part of DSSSL in XML (from Scheme)
  - Functional language (no side effects)
  - Rules based ("apply this template when we see this element")
  - Implemented in AxKit via:
      - XML::LibXSLT (very fast compliant processor from Red Hat Labs)
      - XML::Sablotron (lightweight processor from Ginger Alliance)
      - Others: XML::Xalan, XML::XSLT, XML::Transformiix

XPathScript Overview
  - Combination of:
      - Perl
      - ASP <% %> delimiters
      - XML::XPath for locating nodes in the source XML
      - Declarative (Rules based) processing
  - Not a functional language
      - Side effects allowed
  - Full access to Apache API

XPathScript example (xml)

    <employees>
      <employee>
        <name>
          <firstname>Roger</firstname>
          <lastname>Rabbit</lastname>
        </name>
        <department>Humour</department>
      </employee>
      <employee>
        <name>
          <firstname>Jessica</firstname>
          <lastname>Rabbit</lastname>
        </name>
        <department>Modelling</department>
      </employee>
    </employees>

XPathScript example (stylesheet)

    Employee List

    Employees at Acme Corp.

    <% foreach my $employee (findnodes('/employees/employee')) { %>
    <%= $employee->findvalue("name/lastname") %>, <%= $employee->findvalue("name/firstname") %>
    works in the <%= $employee->findvalue('department') %> department.
    <% } %>

XPathScript example (results)

    Employee List

    Employees at Acme Corp.

    Rabbit, Roger works in the Humour department.
    Rabbit, Jessica works in the Modelling department.

XPathScript - Processing Documents
  - Transforming Data is very different to transforming Documents
  - Documents have mixed content
  - Needs rules based processing
  - XPathScript implements a feature rich declarative processing system

XPathScript - Declarative Templates
  - The $t hash reference

    $t->{'para'}{pre}  = '<p>';
    $t->{'para'}{post} = '</p>';

  - Matches on element names only (unlike XSLT, which matches on XPath patterns)
  - XPathScript applies the rules as it traverses the document tree
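
The effect of pre and post rules can be illustrated without AxKit. The sketch below is not XPathScript itself: the nested-array tree format and the render() function are hypothetical, and elements without a rule simply output their children, which is a simplification. It only shows how a $t-style hash keyed by element name wraps each element as the tree is walked.

```perl
use strict;
use warnings;

# Rules keyed by element name, in the style of XPathScript's $t.
my $t = {
    para => { pre => '<p>', post => '</p>' },
    em   => { pre => '<i>', post => '</i>' },
};

# Hypothetical tree format: [ name, child, child, ... ], where each
# child is either another array ref or a plain text string.
sub render {
    my ($node, $rules) = @_;
    return $node unless ref $node;    # text node: output as-is

    my ($name, @kids) = @$node;
    my $rule = $rules->{$name} || {};
    my $pre  = defined $rule->{pre}  ? $rule->{pre}  : '';
    my $post = defined $rule->{post} ? $rule->{post} : '';

    # pre, then the rendered children in document order, then post
    return $pre . join('', map { render($_, $rules) } @kids) . $post;
}

my $doc = [ 'para', 'Hello ', [ 'em', 'world' ], '!' ];
print render($doc, $t), "\n";
```

Because the rules are data rather than code, mixed content (text interleaved with child elements) needs no special handling: the walk visits children in order and the rule for each element name is looked up as it is reached.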

XPathScript - $t subkeys
  - Sub-keys of $t specify what to do
  - Keys whose value is something to add to the output:
      - pre
      - post
      - prechildren
      - postchildren
      - prechild
      - postchild
  - Other keys:
      - showtag
      - testcode

XPathScript - testcode
  - testcode value is a code ref
  - Executed every time an element with that name is visited
  - Gives access to XML::XPath API on that node, and a local copy of $t

    $t->{'a'}{testcode} = sub {
        my ($node, $t2) = @_;
        if ($node->findvalue('@name')) {
            # process as anchor
        }
        else {
            # process as link
        }
        return DO_SELF_AND_KIDS;
    };

XPathScript tech details
  - Script is compiled into Perl code
  - Compiled once on first hit. Only re-compiled if changed
  - Imports some functions: apply_templates(), findnodes(), findvalue()
  - Fast enough for most real-time transformation needs (e.g. for dynamic content)

Complete Example
  - Take23.org - mod_perl news and resources
  - Uses RSS 1.0 for all headlines and syndication
  - Static Content, but looks dynamic
  - Accepts content for articles in:
      - XHTML
      - DocBook
      - POD (via pod2xml)
      - OpenOffice 614 XML

Take23 - how it works
  - httpd.conf contains AxAddRootProcessor directives
  - Stylesheets written in XPathScript
  - Stylesheets componentized to maximize reuse
  - Uses OpenOffice provider module from AxKit.com to read OpenOffice files as XML
  - News items created using AxKit::NewsMaker, a lightweight RSS/News content management system
  - Cached content delivered at static HTML speeds

AxKit for Dynamic Content

XSP is the Key
  - XSP is an XML language invented by the Cocoon Project at Apache
  - XSP is implementation language agnostic
      - Cocoon implements using Java
      - AxKit implements using Perl
  - XSP allows you to embed code in XML
  - XSP is a page-based language like PHP and ASP
  - ... but XSP is better because you generate XML, maintaining separation of concerns

XSP Example
  - Which generates:

    Tue May 8 10:20:03 2001

  - Note: we then transform to HTML in the next part of the pipeline

XSP Lowdown
  - XSP implements 11 core tags
      - page
      - structure
      - import
      - content
      - logic
      - element
      - attribute
      - pi
      - comment
      - text
      - expr
  - (plus a couple of other minor/supporting tags)

XSP and Namespaces
  - XML Namespaces are key to how XSP works

    <prefix:tag xmlns:prefix="URI">

  - Each namespace can either implement custom functionality
  - ... or just be part of the output
  - We call the namespaces "Tag Libraries", or taglibs for short

More XSP Examples

    Time::Piece
    sub afternoon { return localtime->hour > 12; }
    if (afternoon()) { Good Afternoon!!! } else { Good Morning!!! }

XSP Implementation
  - SAX Parser, callbacks generate Perl code
  - Perl compiled into a unique class
  - handler() method wraps non-XSP top level tags (<page> tag in previous example)
  - xsp:logic outside of top level tags can generate local functions
      - mod_perl developers will know how handy this is to avoid closure problems
  - Parser code implements "namespace dispatch"
  - taglib implementations register with the core XSP parser engine's namespace dispatch system

XSP Taglibs
  - Taglibs implement custom functionality
  - Usually this is business logic
  - Some low level taglibs exist on CPAN:
      - Param
      - Utils
      - Cookie
      - ParamAttrib
      - SendMail
      - Exceptions
      - ESQL

Implementing Taglibs
  - 3 ways to implement taglibs (in decreasing levels of difficulty)
      1. Write a Stylesheet: this transforms the taglib tags into core XSP tags
      2. Write a SAX parser: here the parser has to write perl code
      3. Use the AxKit TaglibHelper module: here you just write a Perl module, naming the functions after the tags you want to implement
  - Most of the taglibs on CPAN use method 2 at the moment, simply because of when TaglibHelper appeared

Param Taglib

    if () {
        Your name is:
    } else {
        Enter your name:
    }
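
The Param example works because tags in a taglib's namespace are turned into Perl code, while other tags pass through as output. The following is a toy sketch of that kind of namespace dispatch, not AxKit's implementation: register_taglib, dispatch_start_tag, the greeting taglib and its namespace URI are all hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical registry: namespace URI => code ref that turns a tag
# into a fragment of Perl source.
my %taglib_for;

sub register_taglib {
    my ($ns_uri, $generator) = @_;
    $taglib_for{$ns_uri} = $generator;
}

# Called for each start tag. Tags in a registered namespace become
# Perl code; anything else is treated as literal output.
sub dispatch_start_tag {
    my ($ns_uri, $tag, %attribs) = @_;
    if (my $generator = $taglib_for{$ns_uri}) {
        return $generator->($tag, %attribs);
    }
    return qq{print "<$tag>";};
}

# A made-up taglib registered under a made-up namespace URI.
register_taglib('http://example.org/ns/greeting', sub {
    my ($tag, %attribs) = @_;
    die "Unknown tag <$tag> in greeting taglib\n" unless $tag eq 'hello';
    return qq{print "Hello, $attribs{name}";};
});

print dispatch_start_tag('http://example.org/ns/greeting', 'hello', name => 'World'), "\n";
print dispatch_start_tag('', 'page'), "\n";
```

Keying the registry on the namespace URI rather than the prefix means a page author can pick any prefix, which matches how XML Namespaces are meant to be used.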

Things to note...
  - Freely mixed Perl and XML
  - No <% ... %> to introduce one or the other
  - An XSP page is pure XML
  - That has consequences:

    my $page =
    if ($page < 3) { # INVALID CODE. < not allowed in XML
        ...
    }

  - Could irritate some people
  - Easy to fix this case. Use 3 > $page

Cookie Taglib

    my $value;
    if ($value = ) {
        $value++;
    } else {
        $value = 1;
    }
    $value

    Cookie value: $value

More things to note
  - XSP extends Perl's notion of DWIM (Do What I Mean)
  - Note how the cookie:value is taken from a Perl expression
  - It could equally have been hard coded. Witness these three versions:

    3
    2 + 1

  - All produce valid code

SendMail Taglib

    if (!) {
        You forgot to supply an email address!
    } else {
        if ( eq "sub") {
            $to = "axkit-users-subscribe@axkit.org";
        } elsif ( eq "unsub") {
            $to = "axkit-users-unsubscribe@axkit.org";
        }
        $to
        Subscribe or Unsubscribe
        (un)subscription request sent
    }

Exceptions Taglib
  - Previous slides lack error handling
  - All AxKit taglibs throw exceptions on error
  - Need Exception taglib to catch them

    ... # code that throws exceptions
    An Error Occurred:

ESQL Taglib
  - The mother of all taglibs
  - Executes SQL against DBI connections
  - Add Apache::DBI in the mix for cached connections
  - Options for SQL with or without results
  - Columns can be retrieved all at once, or one at a time
  - Allows emulation of nested queries (e.g. for MySQL)

ESQL Example

    Pg
    dbname=phonebook
    postgres

ESQL Example (cont)

    if () {
        SELECT * FROM address WHERE id =
    } else {
        SELECT * FROM address
    }

ESQL Example (cont)

    Error Occurred:

ESQL Example (results)

    2 Sergeant Matt Mr AxKit.com Ltd matt@axkit.com 1

PerForm
  - Easy Form handling
  - Should allow for same form to be presented to WML or other formats. But HTML is the priority
  - Callback based
  - Supports validation, loading, start/end form
  - Generates XML abstract form, not HTML or XHTML form
  - Example XSLT supplied for transforming to HTML

PerForm (example)

    ...
    Index

    View Items

  - Callbacks implemented in top-level <xsp:logic> section

PerForm (example cont)
  - Callbacks:

    ...->instance();
        my @types = $db->get_asset_types();
        return $selected, "All Types" => "0", map { $_->[1] => $_->[1] } @types;
    }

    use Apache::Util ();
    sub submit_go {
        my ($ctxt) = @_;
        return "view_asset.xsp?type=" .
            Apache::Util::escape_uri($ctxt->{Form}{types});
    }

Building your XSP Based App
  - Hide logic in taglibs
  - Or hide logic in Perl modules called from <xsp:logic> and <xsp:expr> sections
  - Generate XML, not HTML
  - Process to HTML in a further pipeline transformation

Advanced AxKit

Alternate Providers
  - Builtins: File, Filter, Scalar
  - Others: e.g. OpenOffice
  - API:
      - init( %params )
      - key() = string
      - exists() = ?
      - process() = ?
      - decline(%args)
      - mtime() = int
      - get_fh() = FH
      - get_strref() = strref
      - has_changed($mtime) = ?
      - get_ext_ent_handler() = subref
      - get_styles($media, $style)

Example: Ever expiring file
  - httpd.conf

Alternate ConfigReaders
  - new($r)
  - StyleMap() = hashref
  - CacheDir() = string
  - ProviderClass() = string
  - PreferredStyle() = string
  - PreferredMedia() = string
  - CacheModule() = string
  - DebugLevel() = int
  - StackTrace() = ?
  - LogDeclines() = ?
  - OutputCharset() = string
  - ErrorStyles() = complex
  - GzipOutput() = ?
  - DoGzip() = ?
  - GetMatchingProcessors() = complex
  - XSPTaglibs() = list

Alternate Cache Module
  - Possibly store cache in a database
  - API:
      - new($r, $xmlfile, @parts)
      - write($string)
      - read() = string
      - get_fh() = FH
      - deliver()
      - reset()
      - has_changed($mtime) = ?
      - mtime()
      - exists() = ?
      - key() = string
      - no_cache($on)

AxKit Future Plans
  - Content Management System, for easier site building and maintenance
  - Apache 2.0 Port
  - More SAX work, more C work
  - Allow JSP taglibs to work

Conclusions
  - Perl and XML are a powerful combination
  - XPath and XSLT add to the mix...
  - AxKit can reduce your long term costs
      - In site re-design and in content re-purposing
  - Open Source equal to commercial alternatives

Resources and contact
  - AxKit: http://axkit.org/
  - CPAN: http://search.cpan.org
  - libxml and libxslt: http://www.xmlsoft.org
  - Sablotron: http://www.gingerall.com
  - XPath and XSLT Tutorials: http://zvon.org