$Id: $

Commons Digester Package
Version 2.0 alpha
Release Notes


INTRODUCTION:
============

Release 2.0 of the Apache Jakarta Commons Digester package is a significant
rewrite of the original package. All the fundamental concepts remain the
same, but the APIs have been redesigned based on the lessons learnt from the
1.x series of releases.


IMPORTANT NOTES
===============

Dependencies
------------

The 2.0 Digester release requires:
  Logging 1.0.x + BeanUtils 1.7


MAJOR CHANGES SINCE 1.x
=======================

This section is intended for those familiar with the 1.x releases of this
product. There are many changes, but those listed below are the most
significant. Mostly, this information is restricted to listing changes in
*functionality*; only a few implementation-level changes are listed here.

Versioning
----------

At the current time, the new code uses the package name
  org.apache.commons.digester2.*

There will no doubt be debate over whether this is a good idea, or whether
the original org.apache.commons.digester.* package names should be used.

General principles
------------------

* Protected members are not used for classes in the o.a.c.digester2 package.
  Instead, members are private, and protected setter/getter methods are
  provided where needed. This makes it easier in future to change classes
  without breaking existing subclasses that have been defined by users of
  the Digester classes.

* It is still undecided whether concrete Action classes should follow the
  above approach or use protected members.

Renamed/repackaged classes
----------------

* Rule --> Action
  The term "rule" has confused a number of people over the years. The new
  and hopefully clearer term "action" is used instead. The word "rule" is
  now used only to refer to a (pattern, action) pair, which is more
  intuitive.

* Rules --> RuleManager
  Using the word "Rules" to mean *not* a collection of Rule objects, but
  instead the pattern-matching engine that happens to *contain* a collection
  of Rule objects, was always confusing.

* RulesBase --> DefaultRuleManager
  This should speak for itself.

* All the basic action classes (formerly Rule classes) now reside in the
  o.a.c.digester2.actions package.

* Renamed actions:
    NodeCreateRule                --> CreateNodeAction
    ObjectCreateRule              --> CreateObjectAction
    FactoryCreateRule             --> CreateObjectWithFactoryAction
    ObjectCreationFactory         --> ObjectFactory
    AbstractObjectCreationFactory --> AbstractObjectFactory

Digester class
------------------

* Digester has been split into:
  * Digester
  * SAXHandler
  * Context
  * ActionFactory

  The old Digester interface had a huge number of methods. Many of these
  existed only because Digester also had to:
   (a) handle the SAX parser callbacks,
   (b) let the Rule (now Action) classes store data on it during the
       parse (the object stack etc), and
   (c) conveniently create Rule (now Action) instances.

  These pieces of functionality have now been split out into separate
  classes, so:
  * Digester now contains only the basic methods that users of the library
    need to interact with.
  * SAXHandler handles the callbacks from the parser.
  * Context holds the object stack, current match path, and related data.
  * ActionFactory provides the factory methods to conveniently create,
    configure and add Action objects to a Digester or RuleManager. Moving
    this functionality out of the Digester object also allows the Digester
    class to be distributed with a subset (including none) of the default
    Action classes if desired.

  Note that because parsing state is now stored on the Context object, it
  is easier to implement the often-requested feature of being able to parse
  multiple xml documents with the same Digester instance.

Namespace-aware parsing
-----------------------

The Digester now *always* uses a namespace-aware xml parser.
The DefaultRuleManager patterns properly support namespaces, eg
  /ns1:foo/ns2:bar/baz
where the URIs that ns1 and ns2 correspond to have been defined via earlier
calls to the method DefaultRuleManager.addNamespace(prefix, uri).

Entity Resolution
-----------------

The basic functionality previously provided for entity resolution has been
improved.

* By default, any attempt to access an external entity which has not been
  explicitly mapped to some (presumably local) resource is regarded as a
  fatal error. See setAllowUnknownExternalEntities.

* External DTDs can be ignored. Yes, this has dangers, but sometimes it is
  necessary. See setIgnoreExternalDTD.

DefaultRuleManager
------------------

The DefaultRuleManager (formerly RulesBase) now uses a more xpath-like
syntax for its patterns. It still isn't full xpath support, just a little
closer for general consistency. In particular, a leading slash is required
on absolute paths. A pattern with no leading slash is a relative path, and
is equivalent to the old "*/" prefix.

Action (formerly Rule) API changes
-----------------------------------

* Action is an interface. The AbstractAction class has been defined and is
  the recommended base for all custom actions.

* Action classes no longer have a "digester" member pointing to their
  "owner". Instead, the begin/body/end methods are always passed a Context
  object that allows them to access the object stack etc.

* Action classes are required to avoid modification of any member variable
  during parsing (ie from their begin/body/end methods). All data must
  instead be stored on the provided Context object. This effectively makes
  an Action instance both re-entrant and thread-safe.

* The two requirements above mean that an Action instance can now be used
  concurrently by multiple Digester instances (eg in a pool).

* Deprecated methods have been removed.

* Actions get "bodySegment" callbacks when their content is mixed text and
  child elements.
  This allows Actions to process XHTML-style markup input more easily.

* Actions get a new "beginParse" callback when startDocument occurs.

* The method finish has been renamed to finishParse.

SetPropertiesAction
-------------------

* The option now exists to specify the custom attr->property mapping via a
  Map parameter, not just a pair of String arrays. This is much nicer.

* Hyphenated xml attribute names are now automatically mapped to camelCase,
  eg some-attr="1" causes a call to setSomeAttr("1").

CreateNodeAction
----------------

* It is now possible to create DOM1 (ie non-namespaced) nodes and
  attributes even when the parser being used is namespace-aware.

* Namespace-aware elements and attributes are created by default.

* The implementation has changed; rather than redirecting the xml parser to
  itself, the SAXHandler object is requested to forward ContentHandler
  calls to itself. This has no externally-visible effect, but makes the
  implementation much cleaner (esp. cleanup after a parse failure).

CreateObjectAction
------------------

* The ignoreCreateException functionality has been removed. I'm not sure
  what use-cases it supports, or whether anybody actually uses it. The code
  is rather complex and nasty, so if someone really needs this
  functionality they can complain, and we can add it back in later with
  sufficient comments to allow future maintainers to know when the feature
  is useful...

Exceptions
-----------

A lot more methods are declared to throw explicit Exceptions, which should
result in more reliable and explicit error-handling.

Terminology
------------

The word "pattern" is now used exclusively for a string that is interpreted
by a RuleManager instance.

The word "path" is now used for a string that describes an absolute path
from the root document node to the current xml element. When a pattern
matches the path, the associated Action is executed.

Xml-rules
------------

The xmlrules module has not yet been reimplemented.
However the following changes are planned:

* A RuleManager instance will be returned rather than a Digester. Because a
  RuleManager is thread-safe, this allows a pool of Digester instances to
  be configured with this object without having to reparse the xmlrules
  input file.

* The xmlrules file will be able to specify what RuleManager subclass is
  desired (with the default being the DefaultRuleManager class).

* The rule parser constructor will take a list of Action (formerly Rule)
  classes, and will auto-configure itself by using reflection against these
  classes rather than the current system where code is written for each
  Rule class.

* Because the list of Actions to support is passed in at runtime, the rule
  parser class will not have explicit dependencies upon the default
  actions. This allows the class to be distributed without the set of
  default actions if desired. The ActionFactory class will provide a
  factory method for creating a rule parser instance which knows about all
  the default actions.

* The input xmlrules file will be able to specify custom action classes.

Other notes
-----------

* The Digester class now only deals in XMLReader rather than SAXParser.
  This shouldn't remove any functionality, just simplify the code.

* The default errorHandler methods now throw an exception for errors and
  fatal-errors reported by the parser rather than the old behaviour of just
  logging the error then continuing.

* ParserFeatureSetterFactory and related classes have not been
  reimplemented, and will not be reimplemented by me. If they are wanted,
  someone else will have to do this.

* I haven't implemented RuleSets. Are they useful to anyone?

* The peek and pop methods on the digester, parameter and named stacks now
  throw an exception if misused rather than return null.

Still TO-DO
------------

* Think about alternative ways of performing logging.

* Think about how to support pattern syntax of "/foo[@attr=value]" style.
  This may require a quite different API for RuleManager, so that
  RuleManager is passed the actual Elements required, rather than a string
  representing just the current path.

* Break up CallParamAction into multiple simpler actions.

* Refactor CallMethodAction to clean up its constructor.

* Fix rules that store data on themselves.

* Think about resolving dependency issues on Beanutils by allowing digester
  to use beanutils via a local classloader. That means that it is ok to use
  digester even in a situation where another version of beanutils is the
  default.

* Sort out schemaLocation/schemaLanguage mess.

* Support rules to handle processing instructions.

* Look into moving from BeanUtils to Morph, as BeanUtils has a lot of
  functionality we don't use.
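
Example: hyphenated attribute mapping
-------------------------------------

The SetPropertiesAction notes above say that hyphenated xml attribute names
are mapped to camelCase, eg some-attr="1" causes a call to setSomeAttr("1").
The following is only a minimal sketch of that naming convention, not the
Digester implementation. The class AttrNameMapper and both of its methods
are hypothetical names invented for this illustration; the handling of
repeated or trailing hyphens is likewise an assumption.

```java
// Hypothetical helper, NOT part of the Digester API: illustrates the
// hyphenated-attribute to camelCase convention described in these notes.
final class AttrNameMapper {

    // Converts an xml attribute name such as "some-attr" to "someAttr".
    // Assumption: each hyphen is dropped and the next character upper-cased.
    static String toPropertyName(String attrName) {
        StringBuilder out = new StringBuilder(attrName.length());
        boolean upperNext = false;
        for (int i = 0; i < attrName.length(); i++) {
            char c = attrName.charAt(i);
            if (c == '-') {
                upperNext = true;          // drop the hyphen itself
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Derives the JavaBean setter name for an attribute,
    // eg "some-attr" gives "setSomeAttr".
    static String toSetterName(String attrName) {
        String prop = toPropertyName(attrName);
        if (prop.isEmpty()) {
            return "set";
        }
        return "set" + Character.toUpperCase(prop.charAt(0)) + prop.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toSetterName("some-attr")); // prints "setSomeAttr"
    }
}
```

Attribute names without hyphens pass through unchanged, so name="x" would
still map to setName("x") under this sketch.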