AxKit
Matt Sergeant
matt@axkit.com
AxKit.com Ltd
http://axkit.com/

Introduction
    Perl's XML Capabilities
    AxKit intro
    AxKit static sites
    AxKit dynamic sites (XSP)
    Advanced AxKit

XML with Perl Introduction
    Basis: XML::Parser
    XS wrapper around expat
        expat - James Clark
        XML::Parser - originally by Larry Wall
        Now maintained by Clark Cooper
    Allows different "styles" of parsing
    Default style is callback/event based stream parsing
    Also implements a "push" parser

XML::Parser usage
    SAX-like API
    register callback handler methods
        start tag
        end tag
        characters
        comments
        processing instructions
        ... and more
    Non validating XML parser
    dies (throws an exception) on bad XML

XML::Parser code

use XML::Parser;

my $p = XML::Parser->new(
Handlers => {
Start => \&start_tag,
End => \&end_tag,
# add more handlers here
});
$p->parsefile("foo.xml");
exit(0);
sub start_tag {
my ($expat, $tag, %attribs) = @_;
print "Start tag: $tag\n";
}
sub end_tag {
my ($expat, $tag) = @_;
print "End tag: $tag\n";
}

XML::Parser - maintaining state

use XML::Parser;

my $ctxt = MyParser->new();
my $p = XML::Parser->new(
Handlers => {
Start => sub { $ctxt->parse_start(@_) },
End => sub { $ctxt->parse_end(@_) }
});
$p->parse("<foo>here</foo>");  # parse() needs well-formed XML; the element name is illustrative
package MyParser;
sub new {
return bless {}, shift;
}
sub parse_start {
my ($self, $expat, $tag, %attribs) = @_;
$self->{some_state} = "Open $tag";
}
sub parse_end {
my ($self, $expat, $tag) = @_;
print "Last state: $self->{some_state}\n";
}

PYX
    Turns XML into a stream of text events
    Based on SGML ESIS streams
        (   start tag
        )   end tag
        A   attribute (written as Aname value)
        -   text
        ?   processing instruction
    Comments are dropped

PYX example

$ pyx presentation.xml | more
(slideshow
-\n
-
(title
-Developing XML Applications with Perl and AxKit
)title
-\n
-
(metadata
-\n
-
(speaker
-Matt Sergeant
)speaker
-\n
-
(email
-matt@axkit.com
)email
...
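
The line-oriented format shown above is easy to consume without an XML parser. The following is a minimal sketch in plain Perl, assuming only the five line types listed on the PYX slide; the function names `pyx_to_events` and `titles` are made up for this illustration, and only the `\n` escape is decoded, which is a simplification.

```perl
use strict;
use warnings;

# Convert PYX lines into [type, data] pairs.
# Only the five line types named on the PYX slide are handled.
sub pyx_to_events {
    my @lines = @_;
    my %type_for = (
        '(' => 'start',
        ')' => 'end',
        'A' => 'attribute',
        '-' => 'text',
        '?' => 'pi',
    );
    my @events;
    for my $line (@lines) {
        chomp(my $copy = $line);
        next if $copy eq '';
        my $flag = substr($copy, 0, 1);
        my $data = substr($copy, 1);
        my $type = $type_for{$flag} or die "Unknown PYX line: $copy\n";
        # PYX writes a newline inside text as the two characters \n
        # (simplification: other escapes are left untouched)
        $data =~ s/\\n/\n/g if $type eq 'text';
        push @events, [ $type, $data ];
    }
    return @events;
}

# Collect the text content of every title element in the event list.
sub titles {
    my @events = @_;
    my @found;
    my $buffer;    # defined only while inside a title element
    for my $event (@events) {
        my ($type, $data) = @$event;
        if    ($type eq 'start' && $data eq 'title') { $buffer = '' }
        elsif ($type eq 'text'  && defined $buffer)  { $buffer .= $data }
        elsif ($type eq 'end'   && $data eq 'title' && defined $buffer) {
            push @found, $buffer;
            undef $buffer;
        }
    }
    return @found;
}

# A few lines taken from the example stream above
my @pyx = (
    '(slideshow',
    '-\n',
    '(title',
    '-Developing XML Applications with Perl and AxKit',
    ')title',
    '-\n',
    ')slideshow',
);
print "$_\n" for titles(pyx_to_events(@pyx));
```

Reading the lines from standard input instead of the hard-coded list would let the same functions sit at the end of a `pyx presentation.xml |` pipe.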

PYX usage
    pyx xmlfile | something | pyxw
    pyx generates PYX streams
    pyxw generates XML from PYX
    pyxhtml parses HTML to PYX
    HTML Tidy, or html2xml
        Java utility by Dave Raggett of the W3C
        Tidy converts HTML to XHTML
    PYX can do the same:
        pyxhtml file.html | pyxw > file.xhtml
    pyxhtml uses HTML::TreeBuilder

XPath intro
    W3C Standard for locating nodes within an XML document
    A subset of XPath is used for "matching" nodes in XSLT
    Looks like directory paths: /path/to/node
    But that's an abbreviated syntax...
        /child::path/child::to/child::node
    Full grammar containing expressions, calculations, functions, etc

XPath Examples
    Find title of XHTML document
        /html/head/title
    Find all hrefs in <a> tags
        /html/body/descendant::a[@href]/@href

XML::XPath
    Full implementation of W3C XPath on a DOM-like API
    Easy to use:

use XML::XPath;

my $xp = XML::XPath->new(filename => "foo.xml");
print $xp->findvalue("/html/head/title");

    Can also process HTML:

my $xp = XML::XPath->new(
filename => "pyxhtml foo.html | pyxw |");

XML::XPath continued
    Every node has methods findvalue(), findnodes() and find()

use XML::XPath;

my $xp = XML::XPath->new(filename => "foo.xml");
foreach my $link ($xp->findnodes('/html/body/a[@href]')) {
print "Link: ", $link->findvalue('@href'), "\n";
# or :
# print "Link: ", $link->getAttribute('href'), "\n";
}

    Not 100% DOM compatible, but close

XML::XPath Implementation
    XML::Parser and SAX parsers build an in-memory tree
    Hand-built parser for XPath syntax (rather than YACC based parser)
    Garbage Collection yet still has circular references (and works on Perl 5.005)
    [Image: pointers.png]

XML to XML Conversions
    Use XSLT (or XPathScript, see later)
    Either use the command line (e.g. xsltproc from libxslt)
    ... or call from Perl:

use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();
my $source_dom = $parser->parse_file('foo.xml');
my $style_dom = $parser->parse_file('bar.xsl');
my $stylesheet = $xslt->parse_stylesheet($style_dom);
my $results = $stylesheet->transform($source_dom);
print $stylesheet->output_string($results);

SAX Filters
    SAX events passed between SAX handlers
    e.g. tags to lower case:

use XML::Handler::YAWriter;
use XML::Parser::PerlSAX;

my $ya = XML::Handler::YAWriter->new( AsFile => '-' );
my $lc = XML::Filter::ToLower->new( Handler => $ya );
my $parser = XML::Parser::PerlSAX->new( Handler => $lc );
package XML::Filter::ToLower;
@ISA = ('XML::Filter::Base');
sub start_element {
my ($self, $element) = @_;
$element->{Name} = lc($element->{Name});
$self->{Handler}->start_element($element);
}
sub end_element {
my ($self, $element) = @_;
$element->{Name} = lc($element->{Name});
$self->{Handler}->end_element($element);
}
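
To show only the chaining idea without the PerlSAX modules, here is a self-contained sketch. The classes `Sketch::Collector`, `Sketch::ToLower` and `Sketch::Rename` are hypothetical stand-ins written for this illustration; they only mimic the calling convention used above (each event is a hash reference with a `Name` key, and each filter forwards to `$self->{Handler}`).

```perl
use strict;
use warnings;

# Final handler: records every event it receives as a string.
package Sketch::Collector;
sub new { return bless { events => [] }, shift }
sub start_element { my ($self, $el) = @_; push @{ $self->{events} }, "<$el->{Name}>" }
sub end_element   { my ($self, $el) = @_; push @{ $self->{events} }, "</$el->{Name}>" }
sub events { return @{ $_[0]{events} } }

# Filter 1: lower-cases element names, then forwards the event.
package Sketch::ToLower;
sub new {
    my ($class, %args) = @_;
    return bless { Handler => $args{Handler} }, $class;
}
sub start_element {
    my ($self, $el) = @_;
    $self->{Handler}->start_element({ %$el, Name => lc $el->{Name} });
}
sub end_element {
    my ($self, $el) = @_;
    $self->{Handler}->end_element({ %$el, Name => lc $el->{Name} });
}

# Filter 2: renames elements according to a lookup table, then forwards.
package Sketch::Rename;
sub new {
    my ($class, %args) = @_;
    return bless { Handler => $args{Handler}, Map => $args{Map} || {} }, $class;
}
sub _mapped {
    my ($self, $el) = @_;
    my $name = $el->{Name};
    return { %$el, Name => exists $self->{Map}{$name} ? $self->{Map}{$name} : $name };
}
sub start_element { my ($self, $el) = @_; $self->{Handler}->start_element($self->_mapped($el)) }
sub end_element   { my ($self, $el) = @_; $self->{Handler}->end_element($self->_mapped($el)) }

package main;

# Send nested elements through ToLower -> Rename -> Collector.
sub filter_names {
    my @names = @_;
    my $out    = Sketch::Collector->new;
    my $rename = Sketch::Rename->new( Handler => $out, Map => { b => 'strong' } );
    my $lc     = Sketch::ToLower->new( Handler => $rename );
    $lc->start_element({ Name => $_ }) for @names;
    $lc->end_element({ Name => $_ })   for reverse @names;
    return $out->events;
}

print join(' ', filter_names('P', 'B')), "\n";
# prints: <p> <strong> </strong> </p>
```

Each filter sees one event at a time and passes it on, so adding a stage means wrapping one more object around the existing handler; no tree is held in memory at any point.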

SAX Filters (cont)
    Almost any level of complexity possible
    Chain as many filters together as you like
    Fast way of processing XML
    We don't have to build a tree in memory
    Possible to output to HTML using XML::Handler::HTMLWriter

AxKit Introduction
    XML Publishing Framework
    XML Application Server
    Designed for Content Diversification
        Same content, delivered differently
        HTML, WAP, PDF, SOAP, etc
    Built in C, Perl, and mod_perl/Apache
    Plugin Architecture, allows operation to be changed easily

How does it work?
    Apache handler
    Embedded perl interpreter via mod_perl
    Apache Config directives
    Transformation Pipeline
    Language Modules implement stylesheets and other engines
    XSP for Dynamic Content
    Caching
    Apache::Filter Support

mod_perl
    mod_perl used to implement most of AxKit
    ... though more code being ported to C all the time
    Gives us bytecode compiled Perl
    Makes development faster

Configuration Directives
    Config directives written in the C code
    Implemented on top of Apache
    Fits in with httpd.conf and .htaccess files
    28 Config directives in total
    Set up the pipeline, debugging options, and various other flags

Config Directives examples

AxAddProcessor application/x-xsp .
AxAddProcessor text/xsl /styles/cal2wml.xsl
AxAddProcessor text/xsl /styles/cal2html.xsl

Transformation Pipeline
    AxKit is a pipeline engine
    Output of one stage goes to input of the next
    Allows us to build up our application in stages
    Pipeline passed as one of:
        String
        DOM Tree
        SAX events

Language Modules
    Every stage in the pipeline implemented by a language module
    Language modules can either use a stylesheet for transformation or not
    Examples of modules not using a stylesheet are "XSL:FO to PDF", XSP, and the module which produces this slideshow
    Different implementations of same language possible, e.g. XSLT
        XML::LibXSLT
        XML::Sablotron
        XML::XSLT
        XML::Xalan
        XML::Transformiix

Language modules available
    XSLT
    XPathScript
    XSP
    AxPoint (this slideshow)
    XMLNews::HTMLTemplate
    Template Toolkit w/XPath plugin

Caching
    XML Transformation can be slow, caching vital to making a fast XML Publishing Engine
    Cache uses the filesystem, not memory or DBM files
    Cache results of transformation pipeline, where appropriate
    Cache results of dynamic processing by implementing has_changed() method
    Using filesystem allows Apache to deliver the cache directly:

Apache->request->filename($cache_file);  # $cache_file: path of the cached result
return DECLINED;
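
For Apache to serve a cached result this way, every transformation result needs its own predictable file name. The following is a minimal sketch, assuming the name is an MD5 hash over the XML filename, media type, style name and any run-time parameters; the function name `cache_file_for`, the argument names, the separator character and the sample values are assumptions of this illustration, not AxKit's actual code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Build a cache file path from the inputs that determine the output.
# Argument order and the "\0" separator are choices made for this sketch.
sub cache_file_for {
    my (%args) = @_;
    my @parts = ( $args{xml_file}, $args{media}, $args{style} );

    # Optional run-time parameters, sorted so the key is stable
    my $params = $args{params} || {};
    push @parts, map { "$_=$params->{$_}" } sort keys %$params;

    my $key = md5_hex( join "\0", @parts );
    return File::Spec->catfile( $args{cache_dir}, $key );
}

# Hypothetical values, for illustration only
my $path = cache_file_for(
    cache_dir => '.xmlstyle_cache',
    xml_file  => '/docs/calendar.xml',
    media     => 'screen',
    style     => 'default',
);
print "$path\n";
```

Because the media type and style name are part of the key, the same XML file transformed for two different devices ends up in two separate cache files.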

Cache (cont.)
    Fastest possible implementation of a cache
    Default storage place is .xmlstyle_cache directory in same directory as XML file
    Overridable using AxCacheDir directive
    Files named as MD5 hash of XML filename, media type, style name and optionally run-time parameters that may affect the cache
    Does use large amounts of space, so can be turned off

Non Cache-able content
    POST requests usually not cacheable
    XSP pages usually not cacheable
    Can turn off the cache in Perl code:
        $r->no_cache(1);
    Cache can be modified using plugins, so cache can be based on phase of the moon

Apache::Filter Support
    Alternate plugin "Provider" module
    AxKit can get XML from previous mod_perl handler
    Works with Apache::ASP, HTML::Mason, etc

Apache::Filter example
    ASP Script

Hello World
Test Page
Hello World
The time is now: <%= localtime %>

Apache::Filter example (stylesheet)