See docs/LICENSE for The Xindice License

Table of Contents
=================

   Introduction
   Target Platforms
   Release Notes
   Acknowledgments

Introduction
------------

Xindice is an open source Native XML Database. It stores and indexes
compressed XML documents in order to provide that data to a client
application with very little server-side processing overhead. It also
provides functionality that is unique to XML data, which can't easily
be reproduced by relational databases.

Target Platforms
----------------

The Xindice code base is written in Java. Because of this, Xindice will
work on any operating system that supports the Java Development Kit
(JDK). The following is a list of tested platforms and the JDK version
that was used:

Platform       Java VM Used
===========    ====================
Linux          Sun's Java 2 SDK 1.3 (ulimit -s 2048 for SDK 1.3.1)
Linux          IBM Java 2 SDK 1.3
Windows NT     Sun's Java 2 SDK 1.3
Solaris        Sun's Java 2 SDK 1.3
Mac OS X       Apple Java 2 SDK 1.3 (Included with Mac OS X)

Release Notes
-------------

Apache Xindice Version 1.0rc1
=============================

dbXML is now an Apache project, and has been renamed to Xindice
(Zeen-dee-chay). Parts of the dbXML 1.5 tree were merged into the
dbXML 1.0 tree in the process of this name change and migration, so we
thought it best to release at least one release candidate as an Apache
project. There are also many changes as a result of the branch merging.

   - Name changes. There have been a lot of changes in package, class,
     documentation, and identifier naming throughout the project as a
     result of the migration to the Apache project.

   - XML Namespace changes. XML namespaces that were defined by dbXML
     have been renamed. The "http://www.dbxml.org/" portion of those
     namespaces has been changed to "http://xml.apache.org/xindice/"

   - The Collection configuration system now uses the Database's system
     collection instead of the system.xml file. The system.xml file is
     now read-only, and is used for configuring the server framework.
     Collection management is read/write and uses the Xindice native
     file system to maintain configuration.

   - Complete JAXP bootstrapping. Xindice will bootstrap with whichever
     JAXP-capable XML parser the Java VM will resolve. You can override
     the JAXP SAXParserFactory using vm.cfg. It is not recommended that
     you override the JAXP DocumentBuilderFactory because Xindice
     implements an optimized DOM that utilizes the Xindice compression
     system.

   - Lazy writes have been added to the Paged system, which is the
     foundation for standard Filers and Indexers in Xindice. Long
     operations (like index creation) will now delay writes until the
     write buffer is filled or until the operation is completed. This
     can yield a 10% to 30% performance increase on index creation.

   - The --pagesize and --maxkeysize switches now work on Collection
     creation in addition to Index creation.

Version 1.0b4 (Final Beta... No Really)
=======================================

After releasing beta 3, we found out that there were some stability
issues with the latest developer releases of Xalan, whose XPath engine
we use for our query resolver. Some users were experiencing query
failures with certain data sets. Because of this, we've had to roll
back to a previous version of Xalan (2.0.1).

Version 1.0b3 (Final Beta)
==========================

Beta 3 is the final beta for dbXML before we release our 1.0 FCS
version. This version provides improved concurrency, as well as several
bug fixes.

Version 1.0b2 (Beta 2!)
=======================

Improved stability and scalability of the server.

   - ORB Change. In the past, JacORB was used as the dbXML CORBA ORB.
     With this release, JacORB has been replaced with OpenORB. JacORB
     was found to use too much memory while running as part of the
     server, which severely limited the capacity of the system.

   - The XML:DB API has once again been brought into conformance with
     the latest draft.

   - Several DOM Level 3 Core methods have been added, and the version
     of Xerces shipped with dbXML is now the most recent version in the
     Xerces 1 distribution.

   - Several bugs within the XUpdate system have been fixed.

Version 1.0b1 (Beta!)
=====================

We have reached Beta status. The server is fully functional, and the
number of bugs should be minimal at this point.

   - Namespace support. The query and indexing systems now properly
     support namespaced elements and attributes (regardless of prefix
     consistency).

   - The most recent draft of the XML:DB API is now supported. This
     includes namespace support for XPath queries, and a few minor
     changes to the API.

   - A Testing framework has been added under java/tests. It is based
     on JUnit and can be used to perform regression testing against the
     server.

   - GZip compression was removed from the filers. It was slow. Also,
     because it was both buggy and out of our control, we had to get
     rid of it.

   - Lots of little bugs fixed here and there.

Version 0.9.1 (The Broken ORB)
==============================

Some minor updates, nothing to be alarmed about. Move along.

   - The XUpdateQueryService is now available via the XML:DB Collection
     class.

   - A lot of the problems that were being reported regarding ORB
     versioning and VM configuration have been resolved.

   - Our DOM was broken with respect to DocumentFragments. Also, a bug
     in reporting node modification status up the tree has been fixed.
     This was causing XUpdate queries to break in some cases.

   - The Exception system has been further refined.

Version 0.9 (Feature Complete)
==============================

Several major changes have happened to the dbXML code base between
versions 0.6 and 0.9. The most important is that we are now feature
complete.

   - We are now feature complete. All of the features that will be in
     the 1.0 version of dbXML are now available. All we have to do now
     is continue to stabilize the server and fix bugs as they pop up.
     You can consider the status of the project to be Alpha quality
     now.

   - dbXML is now based on an Apache style license. We decided that the
     LGPL was too restrictive regarding what you could do with the
     source code. Beyond that, we're using several BSD and Apache
     licensed libraries, and it seemed unfair that we could build from
     their code, but they couldn't build from ours.

   - dbXML now includes support for the XML:DB XUpdate specification.
     We've integrated The Infozone Group's Lexus library into dbXML in
     order to provide support for XUpdate update logic.

   - Wire Compression is now supported by the CORBA APIs. The style of
     compression that is used by our DOM and SAX classes for Document
     storage is now being exposed via CORBA. This allows Documents and
     query results to be retrieved without requiring textual
     serialization on the server or parsing on the client. This
     capability is transparently supported by the XML:DB API.

   - NodeIndexer has been split into NameIndexer and ValueIndexer. The
     ValueIndexer is used as NodeIndexer was, to store values for
     predicate comparisons. NameIndexer is used to store element
     references for standalone name components in location paths. Use
     a type of 'name' when defining an Index to create a NameIndexer.

   - Better Exception categorization. Exception fault codes have been
     further defined, categorized, and broken out by severity. The
     FaultCodes class now includes several utility methods for
     generating APIException instances and examining the fault codes
     that are stored in various types of Exception classes.

   - Application has been renamed to Database. Also, references to
     Application in various methods need to be changed to Database.
     Ex: getApplication() is now getDatabase()

   - An Address Book example is included, built on Tomcat. You can find
     more information in java/examples/Addressbook/README

   - The CORBA ORB used by the server is now easily pluggable. So far,
     JacORB and OpenORB are known to work.

   - SAX support has been added to the XML:DB API implementation.

   - The HTTP server port has been changed to 4080 to avoid conflicts
     on the commonly used 8080 port. This is mainly because Tomcat uses
     that port. Also, the Gopher port has changed to 4070.

   - More and more documentation.

Version 0.6 (Much Closer)
=========================

In the past couple of weeks, we've made quite a bit of progress in
building out the server, and contributing to its overall stability.
There's a lot left to do, but it's getting very close to being usable.

   - Lots of bug fixes in this version, but many more to come.

   - The Developer's Guide has been fleshed out quite a bit, and the
     Command-line Tools reference has been updated and converted to
     DocBook format.

   - dbXML fully supports the XML:DB API as it is currently published
     by the XML:DB Initiative. XML:DB API documentation is now included
     in the distribution.

   - Types are now supported by the NodeIndexer to ensure proper
     sorting. The available types can roughly be mapped to the Java
     native types (string, short, int, etc...)

   - The XPathQueryResolver now supports partial evaluation of some
     functions and index-based evaluation of the starts-with function.

   - The XPathQueryResolver also supports the highly experimental, very
     cool, and potentially catastrophic autoindex feature. By default,
     it's turned off, so there's nothing to worry about.

   - IndexManager now performs background indexing instead of
     synchronous indexing. Issuing a create index command will now
     immediately return as successful even though the index itself
     hasn't yet been built.

   - Query results now include a set of namespaced attributes that
     identify the collection and document that a particular node was
     retrieved from.

   - The command line tools now require an instance name when
     referencing a collection. The default instance name in a dbXML
     server is 'db'. So, for example, you might refer to a collection
     as '/db/root/addressbook'.
     Also, the short forms of some of the action verbs have changed.
     See the Tools reference for more information.

Version 0.5 (Woah!)
===================

We've made some major changes to dbXML between version 0.4 and 0.5 that
will affect the type of applications that can be developed solely with
dbXML, so it's important to read this change log for more information.

   - dbXML has been broken into three separate projects, with the
     development focus remaining on the dbXML Core database server.
     Two other projects, the Juggernaut Server Framework and dbXML App
     Services, are available as separate CVS trees and are being
     developed in parallel. The Juggernaut class files are available
     in a Jar file as part of the distribution. The following is a list
     of the features that have been removed from the dbXML Core, and
     where they are now:

     - Juggernaut - cvs co Juggernaut
        - Service Framework
        - HTTP Server

     - App Services - cvs co dbXML-AppServices
        - GetObject (HTTP Retrieval)
        - SOAP Support
        - Cocoon Support
        - Scripting Support
        - Schema Compiler
        - XMLObject Compiler

   - We've renamed our packages from com.dbxml.* to org.dbxml.*

   - The ENTIRE Filing, Indexing, and Query systems have been
     completely rearchitected and rewritten pretty much from scratch.
     As a result:

     - QueryResolvers can be developed and plugged into the
       QueryEngine.

     - Full XPath syntax is now supported for Collection queries. This
       functionality is provided by the XPathQueryResolver.

     - The Indexing system participates in queries wherever possible.

     - You can safely add Indexes to, and remove them from, existing
       Collections.

     - A new Filer named BTreeFiler is available in addition to
       HashFiler. BTreeFiler is much more space conservative and
       doesn't suffer from collision and overflow issues as the
       Collection begins to grow past its original bounds. Both Filers
       are useful, but which you choose depends on your needs. By
       default, dbXML core will use BTreeFilers.

   - The Application class now extends Collection and can be thought of
     as a top-level root Collection. At some point in the future,
     Application will be renamed Database.

   - There have been a few changes to the Collection class. You can no
     longer store binary data in a Collection, only Documents. The
     getDocumentSet method allows you to enumerate through the
     Documents in a particular Collection. Collection has been broken
     into two classes. CollectionManager contains all management
     functionality for nested Collections (create, drop, list) while
     Collection contains functionality for the Collection instance
     (getDocument, insertDocument, etc...)

   - XMLObjects have been scaled back. There is now only one type of
     XMLObject. Application and Document XMLObjects have been removed.
     Because Application is now derived from Collection, a standard
     XMLObject can serve both roles. Document XMLObjects have been
     removed completely, requiring a developer to implement this
     functionality manually (it's about 1 line of code). The mapping
     looks like this:

        ApplicationContext    ->  XMLObject
        ApplicationXMLObject  ->  (gone)
        CollectionContext     ->  XMLObject
        CollectionXMLObject   ->  SimpleXMLObject
        DocumentContext       ->  (gone)
        DocumentXMLObject     ->  (gone)

   - The dbXML Client API has been replaced by an XML:DB Core Level 1
     implementation. The XML:DB API is still a work in progress, and is
     likely to change, but this opens the doors to interoperable XML
     Database applications. For more information on the XML:DB API,
     visit http://www.xmldb.org

   - The Command-Line Tools have been broken into two separate tools.
     dbxmladmin provides administrative commands, while dbxml provides
     user-level commands. The Command-Line Tools now utilize the XML:DB
     API instead of the Client API. Some new features in the
     Command-Line Tools include:

     - Server Shutdown - You can now safely shut down the server,
       instead of having to send it a KILL signal.

     - Import/Export - You can import/export multiple Documents and
       directory structures between Collections and the file system.

     - XMLObject invocation. You can execute XMLObject methods and
       retrieve their results.

   - We're now using JacORB for our CORBA services. The JDK's ORB was
     very much lacking in a lot of areas.

   - JAXP support for creating and parsing dbXML compressed DOM
     Documents is now available.

   - And a whole bunch of other stuff.

Version 0.4 (Progress)
======================

We've made quite a bit of progress between version 0.3 and 0.4 in
features and in general system stability and performance.

   - The Indexing System and XPath querying are working. The indexing
     system now allows you to specify an XPath for narrowing individual
     indexes.

   - The Compressed DOM is essentially complete.

   - We've integrated Cocoon into dbXML to maximize transformation
     performance.

   - XMLObjects can now be created at various contexts within the
     server. These are Application, Collection, and Document. The
     ability to associate business logic at various levels of the
     repository is a powerful application design/management capability.
     As part of this:

     - What used to be XMLObjects are now DocumentContext XMLObjects.

     - What used to be Procedures are now CollectionContext XMLObjects.

   - Nested Collections. You can now manage collections of documents in
     a nested fashion for logically laying out your data stores.
     Databases have been replaced by top-level Collections.

   - The SystemCollection class will automatically compile a Schema
     using the XMLSchemaCompiler upon calling the setSchema() method.

   - XMLSerializable objects are classes whose state can be serialized
     to and from XML documents. The serialization is not an automated
     process at the moment, but the ability to introspect an object
     graph and produce XML is planned for a future release.
     XMLSerializable objects can be stored/retrieved to/from the
     database with the Collection set/getObject methods.
     As part of this:

     - SymbolTables are now represented using XMLSerializable objects.

     - Schemas are now represented using XMLSerializable objects.

     - XMLSchemaCompiler now produces XMLSerializable objects.

   - A Compressed DOM Symbol Table can be defined in the system
     configuration for hard-coding or using standardized symbol tables.
     SystemCollection uses a hard-coded symbol table to store
     compressed symbol tables.

   - A Gopher Service is now available, allowing Gopher-based directory
     and document browsing and querying of a dbXML repository. Gopher
     is useful for quickly browsing to documents being stored in the
     repository.

Version 0.3 (Bye Bye C++)
=========================

The C++ code is gone. dbXML is now 100% Java code. There have also
been a few major additions to the system:

   - More Documentation. Yippee!

   - The Configuration framework is essentially fully functional.

   - The Compressed DOM is functional but still in an experimental
     state. A compressed Collection can be created by setting the
     compressed attribute to 'true' in the collection element. There
     are still some missing implementations, especially where DTD types
     are concerned, but most of the document core should work.

   - The foundation for dbXML autolinking is part of the dbXML
     Compressed DOM system. dbXML will automatically expand elements
     with links and respect document caching policies in expanding
     those links. See the User's Guide for more information.

   - The Indexing system is getting much closer to completion. Basic
     XPath querying is also in an experimental state.

   - A command line tool for managing the running server. This uses
     the CORBA APIs to manage the server.

   - XML Schema Compiler - The XML Schema compiler takes a W3C XML
     Schema (xsd) resource, and generates a set of Java classes based
     on the element, attribute, and element-relationship definitions in
     the Schema. The compiler still needs a lot of work in order to
     generate typed attributes (right now everything is a string), but
     it's a good start.
     In the future, this compiler will be an internal process,
     compiling all stored schemas for utilization by XMLObjects (so you
     don't have to use the DOM directly).

   - SOAP Support - All XMLObjects and Procedures are automatically
     exposed by the server as SOAP services (as well as their original
     native protocol). SOAP support is limited to the capabilities of
     Procedures and XMLObjects. Object structure serialization may be
     implemented in a future release.

Version 0.2 (Switch To Java)
============================

A major architectural shift occurred between 0.1 and 0.2. A design
that had once consisted of about a 90%/10% C++ to Java ratio has
flip-flopped to a 95%/5% Java to C++ ratio. There are several reasons
for this.

First, in order to provide better integration with existing open source
XML Server architectures, which are almost all Java-based, we decided
that it would be best to avoid mixing the Java and non-Java worlds
wherever possible.

Second, we would be able to afford ourselves a major kick-start by
utilizing some of the better parts of the Juggernaut architecture in
our design.

Third, doing XML in C++ is a headache. You spend more time worrying
about memory management than you do in actually writing functioning
code. In order to maintain a certain level of sanity for our staff,
and contributors to the dbXML source code, we decided that Java would
be the best choice for an implementation language.

Version 0.1 (In The Beginning)
==============================

The Three Filing Systems are sort of finished. There are likely a lot
of places to optimize them and there are absolutely some re-entrant
code issues, but these will be ironed out as I actually start using the
filers with the Parser and Query Engine.

HashFiler is a disk-based hashed bucket filing system. FSFiler is a
filer that loads data directly from the operating system's file system
based on file name.
MemFiler is a memory-based filing system, mainly for temporary
in-memory tables and query result sets.

A quick note about the HashFiler. dbXML's filing system was not
written to be disk-space conservative; it was written to be incredibly
efficient for handling large, variable-sized chunks of data. Where
systems like gdbm and dbm try to be everything to everyone, HashFiler
is really targeted for the dbXML project.

HashFiler provides a simple block read caching mechanism with a default
size of 50 blocks. All writes are performed immediately. Blocks should
generally be sized to a multiple of the operating system's block size,
the number of pages per block should be a power of 2, and the resulting
size of a page should be large enough to store the PageHeader (~64
bytes), key (up to you), and at least a fair amount of record data.

HashFiler supports record compression if the size of a record spans
past a single block and if compression will actually yield a smaller
value (meaning if the compression actually lengthens the record, the
compression is canceled). Compression can be toggled with the
setCompressed method and tested with the isCompressed method. If a
HashFiler has operated for a time with compression and compression is
then turned off, existing compressed records are not decompressed until
rewritten. Compression is performed using zlib with a compression
method set to Z_BEST_SPEED, which will perform well against variable
length textual data (such as XML documents) but not most binary data.

Acknowledgments
---------------

This product includes software developed by the Infozone Group
(http://www.infozone.org)

This product includes software developed by the XML:DB Initiative
(http://www.xmldb.org)

This product includes software developed by the Exolab Project
(http://www.exolab.org)