I like to move it
This page describes how move operations are implemented in Oak and highlights some of the interesting and potentially surprising side effects the implementation has on memory usage.
JCR Node and Oak Tree instances are the basic entities for accessing content in a repository. In Jackrabbit Oak, the implementation of a JCR Node is backed by a MutableTree that implements Tree. As the name indicates, this Tree is mutable and its state can change over time. The following example illustrates this.
ContentSession s = ...
Root r = s.getLatestRoot();
Tree t = r.getTree("/zoo/marty");
// The move is performed on the Root, not on the Tree t.
r.move("/zoo/marty", "/madagascar/marty");
String p = t.getPath(); // "/madagascar/marty"
Even though the move operation was not invoked on the Tree t directly, the change is reflected in the value returned by t.getPath(). The returned path is /madagascar/marty.
The impact of a move operation on a Tree instance is evaluated lazily. The main benefit of this approach is that no time is spent updating Tree instances that are still referenced on the heap but never used again by the application and later garbage collected.
So, how are move operations applied lazily to Tree instances? Each MutableTree object has a pendingMoves reference to the next Move it needs to apply before any read or write operation. Reading a child Tree passes the current pendingMoves reference on to the child. That is, while reading an entire subtree, all MutableTree instances will have a pendingMoves reference pointing to the same Move. Initially, the referenced Move is empty, which indicates that no move operation has happened.
Going back to the example. A slightly simplified state before the move operation looks like the following.
Right after the move operation, the state of the MutableTree instances is still the same. Only the Move object referenced by them was modified: it now holds the information about the move operation, and a new empty Move object was appended via the next reference.
The effect of a move operation on Tree objects is applied when a Tree is accessed, e.g. by calling getPath(). Whenever a read or write happens, a MutableTree checks via pendingMoves whether there is anything to do and applies the operation when the source path of the move matches its own path. If the source path does not match, the MutableTree simply moves on to the next Move, until an empty Move is reached. This entire process is done recursively, first applying any potential move operations on the parent. This ensures hierarchical integrity when a subtree is moved.
After this read operation, the Move object with the source and destination information is no longer referenced by any MutableTree and is eligible for garbage collection.
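The mechanism described above can be sketched as a small toy model. This is only an illustration under simplifying assumptions, not Oak's actual code: the names LazyMoveSketch, SketchTree, SketchRoot and recordedPath are hypothetical, the model tracks only parents, names and pending moves, and it covers only simple scenarios like the one in the example.

```java
/** Toy model of lazily applied moves. A simplified sketch, not Oak's actual code. */
final class LazyMoveSketch {
    /** One link in the chain of moves; the empty tail has no next link. */
    static final class Move {
        String source;         // path the tree was moved from, null while empty
        SketchTree destParent; // new parent of the moved tree
        String destName;       // name of the moved tree under the new parent
        Move next;             // following link, null while this link is empty

        /** Fills this empty link and appends a new empty link, which is returned. */
        Move setMove(String source, SketchTree destParent, String destName) {
            this.source = source;
            this.destParent = destParent;
            this.destName = destName;
            this.next = new Move();
            return next;
        }
    }

    /** Stand-in for MutableTree that tracks only parent, name and pending moves. */
    static final class SketchTree {
        SketchTree parent; // null for the root
        String name;
        Move pendingMoves;

        SketchTree(SketchTree parent, String name, Move pendingMoves) {
            this.parent = parent;
            this.name = name;
            this.pendingMoves = pendingMoves;
        }

        /** Reading a child passes on the current pendingMoves reference. */
        SketchTree getChild(String childName) {
            applyPendingMoves();
            return new SketchTree(this, childName, pendingMoves);
        }

        String getPath() {
            applyPendingMoves();
            return parent == null ? "/" : join(parent.getPath(), name);
        }

        /** Path as currently recorded, without applying pending moves. */
        String recordedPath() {
            return parent == null ? "/" : join(parent.recordedPath(), name);
        }

        /** Applies moves on the parent first, then walks the own chain to the empty tail. */
        void applyPendingMoves() {
            if (parent != null) {
                parent.applyPendingMoves();
            }
            while (pendingMoves.next != null) {
                if (pendingMoves.source.equals(recordedPath())) {
                    parent = pendingMoves.destParent;
                    name = pendingMoves.destName;
                }
                pendingMoves = pendingMoves.next;
            }
        }

        private static String join(String parentPath, String name) {
            return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
        }
    }

    /** Stand-in for Root: hands out trees and records moves at the tail of the chain. */
    static final class SketchRoot {
        Move lastMove = new Move();
        final SketchTree rootTree = new SketchTree(null, "", lastMove);

        SketchTree getTree(String path) {
            SketchTree tree = rootTree;
            for (String name : path.split("/")) {
                if (!name.isEmpty()) {
                    tree = tree.getChild(name);
                }
            }
            return tree;
        }

        void move(String source, String dest) {
            int slash = dest.lastIndexOf('/');
            SketchTree destParent = getTree(dest.substring(0, slash));
            lastMove = lastMove.setMove(source, destParent, dest.substring(slash + 1));
        }
    }

    public static void main(String[] args) {
        SketchRoot r = new SketchRoot();
        SketchTree t = r.getTree("/zoo/marty");
        r.move("/zoo/marty", "/madagascar/marty");
        System.out.println(t.getPath()); // prints /madagascar/marty
    }
}
```

In this sketch, t keeps pointing at the filled Move until getPath() is called. Only then does it advance to the empty tail, which mirrors why an untouched tree keeps earlier Move objects reachable.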
This implementation has drawbacks for some usage patterns. A MutableTree obtained from a ContentSession and referenced by the application can retain significant memory when the tree is not accessed while move operations are performed with the same ContentSession.
Let's consider an example where Julien stays at the zoo, while the others move to Madagascar.
// The children of /zoo, except julien, are collected before any move happens.
Tree zoo = r.getTree("/zoo");
Tree madagascar = r.getTree("/madagascar");
Tree julien = zoo.getChild("julien");
List<Tree> move = StreamSupport.stream(zoo.getChildren().spliterator(), false)
        .filter(t -> !t.getName().equals(julien.getName()))
        .collect(Collectors.toList());
// concat is assumed to be a static import of PathUtils.concat.
move.forEach(t -> r.move(t.getPath(), concat(madagascar.getPath(), t.getName())));
// Reading the name applies the pending moves of each moved Tree.
move.forEach(t -> System.out.println(t.getName() + " made it to Madagascar"));
At the end, the Tree julien will have accumulated move information, while all Tree instances in the List have pendingMoves pointing to the empty Move.
Please note that the memory retained by julien is independent of whether changes are committed through Root.commit() or not. The linked Move objects referenced by pendingMoves are the same. Intuition might suggest that committing changes frees memory, but that is not the case in this scenario.
Memory usage in this situation is significant because a Move object does not just remember path information but actually references a MutableTree object as the destination parent and a String for the name of the moved tree under the new parent. Behind the scenes, a MutableTree also references its parent and a NodeBuilder to access properties. This also means that moving nodes deep down in the hierarchy uses more memory than moving nodes closer to the root.
It is therefore advisable to release references to Tree instances as soon as possible in code that performs many move operations. Alternatively, code should occasionally call a method on any Tree it references for a longer period of time, so that pending moves are applied and the Move objects can be released.
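To get a feeling for the difference between the two situations, the following sketch counts how many Move links a long-lived handle keeps reachable, with and without occasional access. It is a deliberately reduced toy model, not Oak code: Session, Handle, access() and maxRetainedMoves() are hypothetical names, and a Move here carries no payload, whereas a real Move also references the destination parent and name.

```java
/** Counts the Move links a long-lived handle keeps reachable. Toy model only. */
final class PendingMoveRetentionSketch {

    /** Chain link; the empty tail is the only link without a next reference. */
    static final class Move {
        Move next;
    }

    /** Stand-in for a Tree held by the application, e.g. julien. */
    static final class Handle {
        Move pendingMoves;

        Handle(Move pendingMoves) {
            this.pendingMoves = pendingMoves;
        }

        /** Any read or write walks the chain to the empty tail. */
        void access() {
            while (pendingMoves.next != null) {
                pendingMoves = pendingMoves.next;
            }
        }

        /** Number of filled Move links still reachable from this handle. */
        int retainedMoves() {
            int count = 0;
            for (Move m = pendingMoves; m.next != null; m = m.next) {
                count++;
            }
            return count;
        }
    }

    /** Stand-in for the session and root that perform the moves. */
    static final class Session {
        Move lastMove = new Move();

        Handle open() {
            return new Handle(lastMove);
        }

        /** Fills the current tail by appending a new empty tail. */
        void move() {
            lastMove.next = new Move();
            lastMove = lastMove.next;
        }
    }

    /** Performs the moves and accesses the handle every accessEvery moves (0 = never). */
    static int maxRetainedMoves(int moves, int accessEvery) {
        Session session = new Session();
        Handle julien = session.open();
        int max = 0;
        for (int i = 1; i <= moves; i++) {
            session.move();
            max = Math.max(max, julien.retainedMoves());
            if (accessEvery > 0 && i % accessEvery == 0) {
                julien.access();
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxRetainedMoves(1_000, 0));   // prints 1000
        System.out.println(maxRetainedMoves(1_000, 100)); // prints 100
    }
}
```

In this model, the untouched handle retains one link per move, while accessing it every 100 moves caps the retained chain at 100 links.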