Apache Xindice Version 1.0
=============================
This is the first production release of Xindice. Changes from the
last release candidate are minimal.
- Fixed a path traversal security problem in the HTTP server.
- Fixed the Addressbook example to not send data to the client
after the connection had already been commited.
- SAXGenerator now properly generates prefixMapping events.
Known issues in version 1.0:
- UTF-8 Encoding is not entirely clean. Most latin derived
languages should be OK, but English is the most
robust. Xindice 1.1 will resolve any issues in this area.
- XPath queries that return a single atomic value (i.e. the value
of an attribute) rather then a node will return no result. You
must retrieve the containing element to retrieve the content
of an attribute.
- When using XUpdate with JDK 1.4 you must use the
standards override mechanism to replace the version of
Xalan included in the JDK with the version included in
Xindice.
See: http://java.sun.com/j2se/1.4/docs/guide/standards/index.html
for more information.
- On Windows, command line queries can have problems with the
quote handling of the windows shell. In general you should
put double quotes around the entire query string and use
single quotes in your XPath.
- This initial release of Xindice does not have any built in
security. If you run it on a public server you should insure
that remote access to port 4080 is restricted at the network
level. Security will be added in a future release.
Apache Xindice Version 1.0rc2
=============================
The focus of this release is on stabilization of the server.
- Fixed the Index corrupted error that some people were seeing
with 1.0rc1. If you saw this error it is recommended that
you rebuild your database files.
- Changed the way Xindice locates its files to make it easier
to embed the server into another process. Files are now
located relative to the xindice.home system property instead
of the working directory of the process.
- Changed the kernel to enable running it embedded without
exiting the VM on startup error and exit.
- Minor encoding fixes in the command line tools. More serious
attention will be payed to encoding issues in the 1.1 release
of Xindice. As it is some languages such as Russian and
Chinese can not be successfully stored in the server. This
will be fixed in a Xindice 1.1 release.
Apache Xindice Version 1.0rc1
=============================
dbXML is now an Apache project, and has been renamed to Xindice
(Zeen-dee-chay). Parts of the dbXML 1.5 tree were merged into
the dbXML 1.0 tree in the process of this name change and
migration, so we thought it best to release at least one release
candidate as an Apache project. There are also many changes as a
result of the branch merging.
- Name changes. There have been a lot of changes in package, class,
documentation, and identifier naming throughout the project as a
result of the migration to the Apache project. The most important
are summarized here.
- XML:DB URI changes. All XML:DB API uri should now be of the form
xmldb:xindice: instead of xmldb:dbxml:
- Source package changes. If you have any code that imported any
org.dbxml.* source code it will need to be changed to import the
proper packages from org.apache.xindice.*.
- XML Namespace changes. XML namespaces that were defined by
dbXML have been renamed. The "http://www.dbxml.org/" portion
of those namespaces has been changed to
"http://xml.apache.org/xindice/"
- The Collection configuration system now uses the Database's
system collection instead of the system.xml file. The
system.xml file is now read-only, and is used for configuring
the server framework. Collection management is read/write and
uses the Xindice native file system to maintain configuration.
- As installed the server no longer has any default collections
that can store documents. You must create a collection manually
before attempting to store any documents in the server.
- Complete JAXP bootstrapping. Xindice will bootstrap with
whichever JAXP-capable XML parser the Java VM will resolve.
You can override the JAXP SAXParserFactory using vm.cfg. It is
not recommended that you override the JAXP DocumentBuilderFactory
because Xindice implements an optimized DOM that utilizes the
Xindice compression system.
- Lazy writes have been added to the Paged system, which is the
foundation for standard Filers and Indexers in Xindice. Long
operations (like index creation) will now delay writes until
the write buffer is filled or until the operation is completed.
This can yield a 10% to 30% performance increase on index
creation.
- The --pagesize and --maxkeysize switches now work on Collection
creation in addition to Index creation.
Version 1.0b4 (Final Beta... No Really)
=======================================
After releasing beta 3, we found out that there were some stability
issues with the latest developer releases of Xalan, whose XPath
engine we use for our query resolver. Some users were experiencing
query failures with certain data sets. Because of this, we've had
to roll back to a previous version of Xalan (2.0.1).
Version 1.0b3 (Final Beta)
==========================
Beta 3 is the final beta for dbXML before we release our 1.0 FCS
version. This version provides improved concurrency, as well as
several bug fixes.
Version 1.0b2 (Beta 2!)
=======================
Improved stability and scalability of the server.
- ORB Change. In the past JacORB was used as the dbXML CORBA ORB, with
this release JacORB has been replaced with OpenORB. It was found
JacORB utilized too much memory while running as part of the server
which severly limited the capacity of the system.
- The XML:DB API has once again been brought in to conformance with
the latest draft.
- Several DOM Level 3 Core methods have been added, and the version
of Xerces shipped with dbXML is now the most recent version in the
Xerces 1 distribution.
- Several bugs within the XUpdate system have been fixed.
Version 1.0b1 (Beta!)
=====================
We have reached Beta status. The server is fully functional, and
the number of bugs should be minimal at this point.
- Namespace support. The query and indexing systems now properly
support namespaced elements and attributes (regardless of prefix
consistency).
- The most recent draft of the XML:DB API is now supported. This
includes namespace support for XPath queries, and a few minor
changes to the API.
- A Testing framework has been added under java/tests. It is
based on junit and can be used to perform regression testing
against the server.
- GZip compression was removed from the filers. It was slow.
Also, because it was both buggy and out of our control, we had
to get rid of it.
- Lots of little bugs fixed here and there.
Version 0.9.1 (The Broken ORB)
==============================
Some minor updates, nothing to be alarmed about. Move along.
- The XUpdateQueryService is now available via the XML:DB
Collection class.
- A lot of the problems that were being reporting regarding ORB
versioning and VM configuration have been resolved.
- Our DOM was broken in respect to DocumentFragments. Also, a bug
in reporting node modification status up the tree has been
fixed. This was causing XUpdate queries to break in some cases.
- The Exception system has been further refined.
Version 0.9 (Feature Complete)
==============================
Several major changes have happened to the dbXML code base between
versions 0.6 and 0.9. The most important of which is that we are
now feature complete.
- We are now feature complete. All of the features that will be
in the 1.0 version of dbXML are now available. All we have to
do now is continue to stabilize the server and fix bugs as they
pop up. You can consider the status of the project to be Alpha
quality now.
- dbXML is now based on an Apache style license. We decided that
the LGPL was too restrictive regarding what you could do with
the source code. Beyond that, we're using several BSD and
Apache licensed libraries, and it seemed unfair that we could
build from their code, but they couldn't build from our's.
- dbXML now includes support for the XML:DB XUpdate specification.
We've integrated The Infozone Group's Lexus library into dbXML
in order to provide support for XUpdate update logic.
- Wire Compression is now supported by the CORBA APIs. The style
of compression that is used by our DOM and SAX classes for
Document storage is now being exposed via CORBA. This allows
Documents and query results to be retrieved without requiring
textual serialization on the server or parsing on the client.
This capability is transparently supported by the XML:DB API.
- NodeIndexer has split into NameIndexer and ValueIndexer. The
ValueIndexer is used as NodeIndexer was, to store values for
predicate comparisons. NameIndexer is used to store element
references for standalone name components in location paths.
Use a type of 'name' when defining an Index to create a
NameIndexer.
- Better Exception categorization. Exception fault codes have
been further defined, categorized, and broken out by severity.
The FaultCodes class now includes several utility methods for
generating APIException instances and examining the fault
codes that are stored in various types of Exception classes.
- Application has been renamed to Database. Also, references to
Application in various methods need to be changed to Database.
Ex: getApplication() is now getDatabase()
- An Address Book example is included, built on Tomcat. You can
find more information in java/examples/Addressbook/README
- The CORBA ORB used by the server is now easily pluggable. So
far, JacORB and OpenORB are known to work.
- SAX support has been added to the XML:DB API implementation.
- The HTTP server port has been changed to 4080 to avoid conflicts
on the commonly used 8080 port. This is mainly because Tomcat
uses that port. Also, the Gopher port has changed to 4070.
- More and more documentation.
Version 0.6 (Much Closer)
=========================
In the past couple of weeks, we've made quite a bit of progress in
building out the server, and contributing to its overall stability.
There's a lot left to do, but it's getting very close to being
usable.
- Lots of bug fixes in this version, but many more to come.
- The Developer's Guide has been fleshed out quite a bit, and the
Command-line Tools reference has been updated and converted to
DocBook format.
- dbXML fully supports the XML:DB API as it is currently published
by the XML:DB Initiative. XML:DB API documentation is now
included in the distribution.
- Types are now supported by the NodeIndexer to ensure proper
sorting. The available types can roughly be mapped to the Java
native types (string, short, int, etc...)
- The XPathQueryResolver now supports partial evaluation of some
functions and index-based evalution of the starts-with function.
- The XPathQueryResolver also supports the highly experimental,
very cool, and potentially catastrophic autoindex feature. By
default, it's turned off, so there's nothing to worry about.
- IndexManager now performs background indexing instead of
synchronous. Issuing a create index command will now
immediately return as successful even though the index itself
hasn't yet been built.
- Query results now include a set of namespaced attributes that
identify the collection and document that a particular node
was retrieved from.
- The command line tools now require an instance name when
referencing a collection. The default instance name in a dbXML
server is 'db'. So, for example, you might refer to a collection
as '/db/root/addressbook'. Also, the short form of some of the
action verbs have changed. See the Tools reference for more
information.
Version 0.5 (Woah!)
===================
We've made some major changes to dbXML between version 0.4 and 0.5
that will affect the type of applications that can be developed
solely with dbXML, so it's important to read this change log for
more information.
- dbXML has been broken into three separate projects, with the
development focus remaining on the dbXML Core database server.
Two other projects: The Juggernaut Server Framework, and dbXML
App Services are available as separate CVS trees and are being
developed in parallel. The Juggernaut class files are available
in a Jar file as part of the distribution. The following is a
list of the features that have been removed from the dbXML Core,
and where they are now:
- Juggernaut - cvs co Juggernaut
- Service Framework
- HTTP Server
- App Services - cvs co dbXML-AppServices
- GetObject (HTTP Retrieval)
- SOAP Support
- Cocoon Support
- Scripting Support
- Schema Compiler
- XMLObject Compiler
- We've renamed our packages from com.dbxml.* to org.dbxml.*
- The ENTIRE Filing, Indexing, and Query systems have been
completely rearchitected and rewritten pretty much from scratch.
As a result:
- QueryResolvers can be developed and plugged into the QueryEngine.
- Full XPath syntax is now supported for Collection queries. This
functionality is provided by the XPathQueryResolver.
- The Indexing system participates in queries wherever possible.
- You can safely add and remove Indexes to existing Collections.
- A new Filer named BTreeFiler is available in addition to
HashFiler. BTreeFiler is much more space conservative and doesn't
suffer from collision and overflow issues as the Collection begins
to grow past its original bounds. Both Filers are useful, but
which you choose depends on your needs. By default, dbXML core
will use BTreeFilers.
- The Application class now extends Collection and can be thought of
as a top-level root Collection. At some point in the future,
Application will be renamed Database.
- There have been a few changes to the Collection class. You can no
longer store binary data in a Collection, only Documents. The
getDocumentSet method allows you to enumerate through the Documents
in a particular Collection. Collection has been broken into two
classes. CollectionManager contains all management functionality
for nested Collections (create, drop, list) while Collection
contains functionality for the Collection instance (getDocument,
insertDocument, etc...)
- XMLObjects have been scaled back. There is now only one type of
XMLObject. Application and Document XMLObjects have been removed.
Because Application is now derived from Collection, a standard
XMLObject can serve both roles. Document XMLObjects have been
removed completely, requiring a developer to implement this
functionality manually (it's about 1 line of code). The mapping
looks like this:
ApplicationContext -> XMLObject
ApplicationXMLObject -> (gone)
CollectionContext -> XMLObject
CollectionXMLObject -> SimpleXMLObject
DocumentContext -> (gone)
DocumentXMLObject -> (gone)
- The dbXML Client API has been replaced by an XML:DB Core Level 1
implementation. The XML:DB API is still a work in progress, and
is likely to change, but this opens the doors to interoperable
XML Database applications. For more information on the XML:DB
API, visit http://xmldb-org.sourceforge.net
- The Command-Line Tools have been broken into two separate tools.
dbxmladmin provides administrative commands, while dbxml provides
user-level commands. The Command-Line Tools now utilize the
XML:DB API instead of the Client API. Some new features in the
Command-Line Tools include:
- Server Shutdown - You can now safely shut down the server,
instead of having to send it a KILL signal.
- Import/Export - You can import/export multiple Documents and
directory structures between Collections and the file system.
- XMLObject invokation. You can execute XMLObject methods
and retrieve their results.
- We're now using JacORB for our CORBA services. The JDK's ORB was
very much lacking in a lot of areas.
- JAXP support for creating and parsing dbXML compressed DOM
Documents is now available.
- And a whole bunch of other stuff.
Version 0.4 (Progress)
======================
We've made quite a bit of progress between version 0.3 and 0.4 in
features and in general system stability and performance.
- The Indexing System and XPath querying are working. The indexing
system now allows you to specify a XPath for narrowing individual
indexes.
- The Compressed DOM is essentially complete.
- We've integrated Cocoon into dbXML to maximize transformation
performance.
- XMLObjects can now be created at various contexts within the
server. These are Application, Collection, and Document. The
ability to associate business logic at various levels of the
repository is a powerful application design/management capability.
As part of this:
- What used to be XMLObjects are now DocumentContext XMLObjects.
- What used to be Procedures are now CollectionContext XMLObjects.
- Nested Collections. You can now manage collections of documents in
a nested fashion for logically laying out your data stores.
Databases have been replaced by top-level Collections.
- The SystemCollection class will automatically compile a Schema
using the XMLSchemaCompiler upon calling the setSchema() method.
- XMLSerializable objects are classes whose state can be serialized
to and from XML documents. The serialization is not an automated
process at the moment, but the ability to introspect an object
graph and produce XML is planned for a future release.
XMLSerializable objects can be stored/retrieved to/from the
database with the Collection set/getObject methods.
As part of this:
- SymbolTables are now represented using XMLSerializable objects.
- Schemas are now represented using XMLSerializable objects.
- XMLSchemaCompiler now produces XMLSerializable objects.
- A Compressed DOM Symbol Table can be defined in the system
configuration for hard-coding or using standardized symbol tables.
SystemCollection uses a hard-coded symbol table to store
compressed symbol tables.
- A Gopher Service is now available, allowing Gopher-based directory
and document browsing and querying of a dbXML repository. Gopher
is useful for quickly browsing to documents being stored in the
repository.
Version 0.3 (Bye Bye C++)
=========================
The C++ code is gone. dbXML is now 100% Java code. There have also
been a few major additions to the system:
- More Documentation. Yippee!
- The Configuration framework is essentially fully functional.
- The Compressed DOM is functional but still in an experimental
state. A compressed Collection can be created by setting the
compressed attribute to 'true' in the collection element. There
are still some missing implementations, especially where DTD types
are concerned, but most of the document core should work.
- The foundation for dbXML autolinking is part of the dbXML
Compressed DOM system. dbXML will automatically expand elements
with links and respect document caching policies in expanding those
links. See the User's Guide for more information.
- The Indexing system is getting much closer to completion. Basic
XPath querying is also in an experimental state.
- A command line tool for managing the running server. This uses the
CORBA APIs to manage the server.
- XML Schema Compiler - The XML Schema compiler takes a W3C XML
Schema (xsd) resource, and generates a set of Java classes based on
the element, attribute, and element-relationship definitions in the
Schema. The compiler still needs a lot of work in order to
generate typed attributes (right now everything is a string), but
it's a good start. In the future, this compiler will be an
internal process, compiling all stored schemas for utilization by
XMLObjects (so you don't have to use the DOM directly).
- SOAP Support - All XMLObject and Procedures are automatically
exposed by the server as SOAP services (as well as their original
native protocol). SOAP support is limited to the capabilities of
Procedures and XMLObjects. Object structure serialization may be
implemented in a future release.
Version 0.2 (Switch To Java)
============================
A major architectural shift occurred between 0.1 and 0.2. A design
that had once consisted of about a 90%/10% C++ to Java ratio, has
flip-flopped to a 95%/5% Java to C++ ratio. There are several
reasons for this. First, in order to provide better integration with
existing open source XML Server architectures, which are almost all
Java-based, we decided that it would be best to avoid mixing the Java
and non-Java worlds wherever possible. Second, we would be able to
afford ourselves a major kick-start by utilizing some of the better
parts of the Juggernaut architecture in our design. Third, doing XML
in C++ is a headache. You spend more time worrying about memory
management than you do in actually writing functioning code. In
order to maintain a certain level of sanity for our staff, and
contributors to the dbXML source code, we decided that Java would be
the best choice for an implementation language.
Version 0.1 (In The Beginning)
==============================
The Three Filing Systems are sort of finished. There are likely a
lot of places to optimize them and there are absolutely some
re-entrant code issues, but these will be ironed out as I actually
start using the filers with the Parser and Query Engine. HashFiler
is a disk-based hashed bucket filing system. FSFiler is a filer that
loads data directly from the operating system's file system based on
their file name. MemFiler is a memory-based filing system, mainly
for temporary in-memory tables and query result sets.
A quick note about the HashFiler. dbXML's filing system was not
written to be disk-space conservative, it was written to be
incredibly efficient for handling large, variable-sized chunks of
data. Where systems like gdbm and dbm try to be everything to
everyone, HashFiler is really targeted for the dbXML project.
HashFiler provides a simple block read caching mechanism with a
default size of 50 blocks. All writes are performed immediately.
Blocks should generally be optimized to a multiple of the operating
system's block size and the number of pages per block should be a
power of 2 and the resulting size of a page should be large enough to
store the PageHeader (~64 bytes), key (up to you), and at least a
fair amount of record data.
HashFiler supports record compression if the size of a record spans
past a single block and if compression will actually yield a
compressed value(meaning if the compression actually lengthens the
record, the compression is canceled). Compression can be toggled
with the setCompressed method and tested with the isCompressed
method. If a HashFiler has operated for a time with compression and
compression is then turned off, existing compressed records are not
decompressed until rewritten. Compression is performed using zlib
with a compression method set to Z_BEST_SPEED which will perform well
against variable length textual data (such as XML documents) but not
most binary data.
Acknowledgments
---------------
This product includes software developed by the Infozone Group
(http://infozone-group.org)
This product includes software developed by the XML:DB Initiative
(http://xmldb-org.sourceforge.net)
This product includes software developed by the Exolab Project
(http://www.exolab.org)