Differences to Jackrabbit 2
Backward compatibility
Oak implements the JCR API and we expect most applications to work out of the box. However, the Oak code base is a rewrite from scratch and therefore differs from Jackrabbit 2 in some aspects. Some of the more obscure parts of JCR are not (yet) implemented. If you encounter a problem running your application on Oak, please cross check against Jackrabbit 2 before reporting an issue against Oak.
Reporting issues
If you encounter a problem where functionality is missing or Oak does not behave as expected, please check whether this is a known change in behaviour or a known issue. If in doubt, ask on the [Oak dev list](http://oak.markmail.org/). Otherwise create a new issue.
Notable changes
This section gives a brief overview of the most notable changes in Oak with respect to Jackrabbit 2. These changes are generally caused by overall design decisions carefully considering the benefits versus the potential backward compatibility issues.
Names and Unicode String values
The limitations are described in Constraints.
Node Name Length Limit
With the document storage backend there was a limit of 150 UTF-8 bytes on the length of the node names with MongoDB 4.0 and earlier. Starting with Oak 1.44 and MongoDB 4.2, this limitation has been removed. The limit still exists when running Oak on a document storage backend using RDBMS. See also OAK-2644 and OAK-9757.
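Because the limit is counted in UTF-8 bytes rather than characters, a name containing non-ASCII characters reaches it sooner than its character count suggests. The following is a minimal sketch of a pre-flight check. The class NodeNameLimit and its methods are hypothetical helpers, not part of Oak, and treating exactly 150 bytes as acceptable is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Oak: checks a node name against a byte limit. */
final class NodeNameLimit {
    /** The 150-byte figure quoted above; whether 150 itself is accepted is an assumption. */
    static final int MAX_NAME_BYTES = 150;

    /** Returns the length of the name once encoded as UTF-8. */
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the encoded name stays within the assumed limit. */
    static boolean fits(String name) {
        return utf8Length(name) <= MAX_NAME_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(150);        // 150 characters, 1 byte each
        String umlauts = "\u00e4".repeat(80);  // 80 characters, 2 bytes each
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes, fits: " + fits(ascii));
        System.out.println(umlauts.length() + " chars, " + utf8Length(umlauts) + " bytes, fits: " + fits(umlauts));
    }
}
```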
Session state and refresh behaviour
In Jackrabbit 2 a session always reflects the latest state of the repository. With Oak a session reflects a stable view of the repository from the time the session was acquired ([MVCC model](http://en.wikipedia.org/wiki/MVCC)). This is a fundamental design aspect for achieving the distributed nature of an Oak repository. A rarely encountered side effect of this is that sessions expose write skew: two sessions may each commit a change that is valid against their own snapshot, yet the combined result is inconsistent.
This change can cause subtle differences in behavior when two sessions perform modifications relying on one session seeing the other session's changes. Oak requires explicit calls to Session.refresh() in this case.
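The effect of a stable session view can be illustrated with a toy model that uses only the JDK. This is a sketch of snapshot semantics in general, not Oak's implementation: the names SnapshotDemo, Repo and ToySession are hypothetical, and the "repository" is just an immutable map that is replaced on every commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of snapshot-based sessions; not Oak's implementation. */
final class SnapshotDemo {

    /** Shared "repository": holds the latest committed, immutable state. */
    static final class Repo {
        private Map<String, String> head = Map.of();

        synchronized Map<String, String> head() {
            return head;
        }

        /** Applies one change on top of the latest state and publishes it. */
        synchronized void commit(String path, String value) {
            Map<String, String> next = new HashMap<>(head);
            next.put(path, value);
            head = Map.copyOf(next);
        }

        ToySession login() {
            return new ToySession(this);
        }
    }

    /** A session keeps reading the state it saw at login until refresh() is called. */
    static final class ToySession {
        private final Repo repo;
        private Map<String, String> snapshot;

        ToySession(Repo repo) {
            this.repo = repo;
            this.snapshot = repo.head();
        }

        String get(String path) {
            return snapshot.get(path);
        }

        /** Commits a change; the writing session moves to the new state afterwards. */
        void setAndSave(String path, String value) {
            repo.commit(path, value);
            snapshot = repo.head();
        }

        void refresh() {
            snapshot = repo.head();
        }
    }

    public static void main(String[] args) {
        Repo repo = new Repo();
        ToySession a = repo.login();
        ToySession b = repo.login();

        b.setAndSave("/greeting", "hello");
        System.out.println("before refresh: " + a.get("/greeting")); // a still reads its old snapshot
        a.refresh();
        System.out.println("after refresh:  " + a.get("/greeting")); // a now reads the committed value
    }
}
```

In the toy model, session a only observes the value written by session b after its own refresh() call, which mirrors the explicit Session.refresh() requirement described above.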
Note: To ease migration to Oak, sessions being idle for more than one minute will log a warning to the log file. Furthermore sessions are automatically synchronised to reflect the same state across accesses within a single thread. That is, an older session will see the changes done through a newer session given both sessions are accessed from within the same thread.
Automatic session synchronisation is a transient feature and will most probably be removed in future versions of Oak. See OAK-803 for further details regarding session backwards compatibility and OAK-960 regarding in thread session synchronisation.
The SessionMBean provides further information on when a session is refreshed and whether a refresh will happen on the next access.
On Oak Item.refresh() is deprecated and will always cause a Session.refresh(). The former call will result in a warning written to the log in order to facilitate locating trouble spots.
On Oak Item.save() is deprecated and will by default log a warning and fall back to Session.save(). This behaviour can be tweaked with -Ditem-save-does-session-save=false, in which case no fall back to Session.save() will happen but an UnsupportedRepositoryOperationException is thrown if the sub-tree rooted at the respective item does not contain all transient changes. See OAK-993 for details.
Query
Oak does not index as much content by default as Jackrabbit 2 does. You need to create custom indexes when necessary, much like in traditional RDBMSs. If there is no index for a specific query, the repository will be traversed. That is, the query will still work but will probably be very slow. See the query overview page for how to create a custom index.
There were some smaller bugfixes in the query parser which might lead to incompatibility. See the query overview page for details.
In Oak, the method QueryManager.createQuery does not return an object of type QueryObjectModel.
Observation
- Event.getInfo() contains the primary and mixin node types of the associated parent node of the event. The key jcr:primaryType maps to the primary type and the key jcr:mixinTypes maps to an array containing the mixin types.
- Event.getUserId(), Event.getUserData() and Event.getDate() will only be available for locally generated events (i.e. on the same cluster node). To help identify potential trouble spots, calling any of these methods without a previous call to JackrabbitEvent#isExternal() will write a warning to the log file.
- Push notification mechanisms like JCR observation weigh heavily on distributed systems. Therefore, if an application requirement is not actually an “eventing problem”, consider using different means like query and custom indexes. Apache Sling identified and classified common usage patterns of observation and gives recommendations on alternative solutions where applicable.
- Event generation is done by looking at the difference between two revisions of the persisted content trees. Items not present in a previous revision but present in the current revision are reported as Event.NODE_ADDED and Event.PROPERTY_ADDED, respectively. Items present in a previous revision but not present in the current revision are reported as Event.NODE_REMOVED and Event.PROPERTY_REMOVED, respectively. Properties that changed in between the previous revision and the current revision are reported as PROPERTY_CHANGED. As a consequence, operations that cancelled each other out in between the previous revision and the current revision are not reported. Furthermore the order of the events depends on the underlying implementation and is not specified. In particular there are some interesting consequences:
  - Touched properties: Jackrabbit 2 used to generate a PROPERTY_CHANGED event when touching a property (i.e. setting a property to its current value). Oak keeps closer to the specification and omits such events. More generally, removing a subtree and replacing it with the same subtree will not generate any event.
  - Removing a referenceable node and adding it again will result in a PROPERTY_CHANGED event for jcr:uuid; the same applies to other built-in protected and mandatory properties such as jcr:versionHistory if the corresponding versionable node was removed and a versionable node with the same name is being created.
  - Limited support for Event.NODE_MOVED:
    - A node that is added and subsequently moved will not generate an Event.NODE_MOVED but an Event.NODE_ADDED for its final location.
    - A node that is moved and subsequently removed will not generate an Event.NODE_MOVED but an Event.NODE_REMOVED for its initial location.
    - A node that is moved and subsequently moved again will only generate a single Event.NODE_MOVED reporting its initial location as srcAbsPath and its final location as destAbsPath.
    - A node whose parent was moved and that moved itself subsequently reports its initial location as srcAbsPath instead of the location it had under the moved parent.
    - A node that was moved and subsequently its parent is moved will report its final location as destAbsPath instead of the location it had before its parent moved.
    - Removing a node and adding a node with the same name at the same parent will be reported as a NODE_MOVED event, as if it were caused by Node.orderBefore(), if the parent node is orderable and the sequence of operations caused a change in the order of the child nodes.
    - The exact sequence of Node.orderBefore() calls will not be reflected through NODE_MOVED events: given two child nodes a and b, ordering a after b may be reported as ordering b before a.
- The sequence of differences Oak generates observation events from is guaranteed to contain the before and after states of all cluster local changes. This guarantee does not hold for cluster external changes. That is, cancelling operations from cluster external events might not be reported even though they stem from separate commits (Session.save()).
- Unregistering an observation listener blocks for no more than one second. If a pending onEvent() call does not complete by then, a warning is logged and the listener will be unregistered without further waiting for the pending onEvent() call to complete. See OAK-1290 and JSR_333-74 for further information.
- OAK-1459 introduced some differences in which events are dispatched for bulk operations (moving and deleting sub-trees):
Operation | Jackrabbit 2 | Oak |
---|---|---|
add sub-tree | NODE_ADDED event for every node in the sub-tree | NODE_ADDED event for every node in the sub-tree |
remove sub-tree | NODE_REMOVED event for every node in the sub-tree | NODE_REMOVED event for the root of the sub-tree only |
move sub-tree | NODE_MOVED event, NODE_ADDED event for the root of the sub-tree only, NODE_REMOVED event for every node in the sub-tree | NODE_MOVED event, NODE_ADDED event for the root of the sub-tree only, NODE_REMOVED event for the root of the sub-tree only |
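The way events are derived from the difference between two revisions, rather than from the individual operations performed, can be illustrated with a toy model. The sketch below is not Oak's implementation: the class PropertyDiff, the flat map from property path to value, and the string event format are all assumptions made for illustration, and the sorted iteration order only keeps the toy output deterministic (the real event order is unspecified).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Toy model of diff-based event generation; not Oak's implementation. */
final class PropertyDiff {

    /** Compares two revisions (property path to value) and lists the resulting events. */
    static List<String> events(Map<String, String> before, Map<String, String> after) {
        List<String> events = new ArrayList<>();
        // Visit every path that exists in either revision.
        TreeSet<String> paths = new TreeSet<>(before.keySet());
        paths.addAll(after.keySet());
        for (String path : paths) {
            if (!before.containsKey(path)) {
                events.add("PROPERTY_ADDED " + path);
            } else if (!after.containsKey(path)) {
                events.add("PROPERTY_REMOVED " + path);
            } else if (!Objects.equals(before.get(path), after.get(path))) {
                events.add("PROPERTY_CHANGED " + path);
            }
            // Same value in both revisions: no event, whatever happened in between.
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("/a/title", "x", "/a/old", "1");
        // "/a/title" was touched (set to its current value), "/a/old" removed, "/a/new" added.
        Map<String, String> after = Map.of("/a/title", "x", "/a/new", "2");
        events(before, after).forEach(System.out::println);
    }
}
```

In this model a touched property, or a removal followed by re-adding the same value, leaves both revisions identical at that path and therefore produces no event.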
Binary streams
In Jackrabbit 2 binary values were often (though not always) stored in or spooled into a file in the local file system, and methods like Value.getStream() would thus be backed by FileInputStream instances. As a result the available() method of the stream would typically return the full count of remaining bytes, regardless of whether the next read() call would block to wait for disk IO.
In Oak binaries are typically stored in an external database or (in case of the SegmentNodeStore) using a custom data structure in the local file system. The streams returned by Oak are therefore custom InputStream subclasses that implement the available() method based on whether the next read() call will return immediately or if it needs to block to wait for the underlying IO operations.
This difference may affect some clients that make the incorrect assumption that the available() method will always return the number of remaining bytes in the stream, or that the return value is zero only at the end of the stream. Neither assumption is supported by the InputStream API contract, so such client code needs to be fixed to avoid problems with Oak.
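A read loop that follows the InputStream contract treats only a return value of -1 from read() as the end of the stream and never consults available(). The following is a minimal sketch using only the JDK. The class StreamReading and the shyStream test double are hypothetical names; the test double merely imitates a stream that reports zero available bytes while still holding data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative helper: reads a stream without relying on available(). */
final class StreamReading {

    /** Reads until read() reports end of stream (-1); available() is never consulted. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Test double: holds data but always reports available() == 0. */
    static InputStream shyStream(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "binary content".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = shyStream(data)) {
            byte[] copy = readFully(in);
            System.out.println(copy.length + " bytes read although available() returned 0");
        }
    }
}
```

On Java 9 and later, InputStream.readAllBytes() offers the same behaviour without a hand-written loop.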
Locking
Oak does not support the strict locking semantics of Jackrabbit 2.x. Instead a “fuzzy locking” approach is used with lock information stored as normal content changes. If a mix:lockable node is marked as holding a lock, then the code treats it as locked, regardless of what other concurrent sessions, which might see different versions of the node, see or do. Similarly a lock token is simply the path of the locked node.
This fuzzy locking should not be used or relied upon as a tool for synchronizing the actions of two clients that are expected to access the repository within a few seconds of each other. Instead this feature is mostly useful as a higher-level tool. For example, a human author could use a lock to mark a document as locked for a few hours or days during which other users will not be able to modify the document.
Same name siblings
Same name siblings (SNS) are deprecated in Oak. We figured that the actual benefit of supporting same name siblings as mandated by JCR is dwarfed by the additional implementation complexity. Instead there are ideas to implement a feature for automatic disambiguation of node names.
In the meantime we have basic support for same name siblings but that might not cover all cases.
XML Import
The import behavior for IMPORT_UUID_CREATE_NEW in Oak is implemented slightly differently compared to Jackrabbit. Jackrabbit 2.x only creates a new UUID when it detects an existing conflicting node with the same UUID. Oak always creates a new UUID, even if there is no conflicting node. There are mainly two reasons why this is done in Oak:
- The implementation in Oak is closer to what the JCR specification says: Incoming nodes are assigned newly created identifiers upon addition to the workspace. As a result, identifier collisions never occur.
- Oak uses an MVCC model where a session operates on a snapshot of the repository. It is therefore very difficult to ensure that new UUIDs are created only in case of a conflict. Based on the snapshot view of a session, an existing node with a conflicting UUID may not be visible until commit.
In contrast to Jackrabbit 2, expanded names are not supported in System View documents for either nodes or properties (OAK-9586).
Identifiers
In contrast to Jackrabbit 2.x, only referenceable nodes in Oak have a UUID assigned. With Jackrabbit 2.x the UUID is only visible in content when the node is referenceable and exposes the UUID as a jcr:uuid property. But using Node.getIdentifier(), it is possible to get the UUID of any node. With Oak this method will only return a UUID when the node is referenceable, otherwise the identifier is the UUID of the nearest referenceable ancestor with the relative path to the node.
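The identifier scheme described above can be sketched with a toy model. This is not Oak's implementation: the class ToyIdentifiers is hypothetical, joining the UUID and the relative path with a slash is an assumption about the format, and falling back to the absolute path when no ancestor is referenceable is likewise an assumption made for illustration.

```java
import java.util.Map;

/** Toy model of path-based identifiers; not Oak's implementation. */
final class ToyIdentifiers {

    /**
     * @param path  absolute path of the node, for example "/content/page/title"
     * @param uuids maps the path of each referenceable node to its jcr:uuid value
     */
    static String identifier(String path, Map<String, String> uuids) {
        String current = path;
        while (!current.equals("/")) {
            String uuid = uuids.get(current);
            if (uuid != null) {
                // Referenceable node or ancestor found: UUID plus the remaining relative path.
                return uuid + path.substring(current.length());
            }
            // Step up to the parent path.
            int slash = current.lastIndexOf('/');
            current = slash == 0 ? "/" : current.substring(0, slash);
        }
        // Assumption: without a referenceable ancestor, use the absolute path.
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> uuids = Map.of("/content/page", "3f2c-example-uuid");
        System.out.println(identifier("/content/page", uuids));        // the node itself is referenceable
        System.out.println(identifier("/content/page/title", uuids));  // nearest referenceable ancestor plus path
        System.out.println(identifier("/other/node", uuids));          // no referenceable ancestor
    }
}
```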
Manually adding a property with the name jcr:uuid to a non-referenceable node might have unexpected effects as Oak maintains a unique index on jcr:uuid properties. As the namespace jcr is reserved, doing so is strongly discouraged.
Namespaces
JCR namespace support is mostly compatible with Jackrabbit 2.x. However, Oak does not support remapping an existing namespace URI to a different prefix in the namespace registry. Once registered, such a repository wide namespace prefix to namespace URI mapping cannot be changed through the namespace registry anymore. The mapping can be changed on a per session level, but this remapping is only visible to the current session and bound to the session lifetime. See Session.setNamespacePrefix(String, String).
Versioning
- Because of the different identifier implementation in Oak, the value of a jcr:frozenUuid property on a frozen node will not always be a UUID (see also the section about Identifiers). The property reflects the value returned by Node.getIdentifier() when a node is copied into the version storage as a frozen node. This also means a node restored from a frozen node will only have a jcr:uuid when it is actually referenceable.
- Oak does not currently implement activities (OPTION_ACTIVITIES_SUPPORTED), configurations and baselines (OPTION_BASELINES_SUPPORTED).
- Oak does not currently implement the various variants of VersionManager.merge but throws an UnsupportedRepositoryOperationException if such a method is called.
Security
Workspaces
An Oak repository only has one default workspace.
MongoDB Document Limit
MongoDB has a document size limit of 16 MB. When using the document storage backend on MongoDB, adding a node with large String properties may fail because their combined size hits this limit. Consider storing large String values as Binary instead. Oak will put those values in the BlobStore, and the document only contains a much smaller reference to the Binary value.
This limitation can also be hit when a node has many orderable child nodes because Oak internally stores the sequence of child node names in a hidden property. See also do's and don'ts.
Session Attributes
Oak exposes the following attributes via Session.getAttribute(...) and Session.getAttributeNames() in addition to the ones set through Credentials' attributes passed to Repository.login(…).
Attribute Name | Attribute Value Type | Description |
---|---|---|
oak.refresh-interval | Long | The session refresh interval in seconds. |
oak.relaxed-locking | Boolean | Whether relaxed locking behaviour is enabled for the session. See OAK-1329. |
oak.bound-principals | Set<Principal> | The principals associated with the JCR session. See OAK-9415. |