public interface AnnotationIndex<T extends AnnotationFS> extends FSIndex<T>
T - The top most Java cover class (usually a JCas class) specified for the underlying index.
An annotation index provides additional iterator functionality that applies only to instances of uima.tcas.Annotation (or its subtypes). You can obtain an AnnotationIndex by calling:
AnnotationIndex<AnnotationFS> idx = cas.getAnnotationIndex();
or
AnnotationIndex<SomeJCasType> idx = jcas.getAnnotationIndex(SomeJCasType.class);
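Note that the AnnotationIndex defines the following sort order between two annotations:
- Annotations are sorted in increasing order of their start offset. That is, for any annotations a and b, if a.start < b.start, then a < b.
- Annotations whose start offsets are equal are next sorted in decreasing order of their end offsets. That is, if a.start = b.start and a.end > b.end, then a < b. This causes annotations with larger spans to be sorted before annotations with smaller spans, which produces an iteration order similar to a preorder tree traversal.
- Annotations whose start offsets are equal and whose end offsets are equal are sorted based on TypePriorities if type priorities are specified (the type priorities specification is an optional element of the component descriptor). When type priorities are in use, if a.start = b.start, a.end = b.end, and the type of a is defined before the type of b in the type priorities, then a < b.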
In the method descriptions below, the notation a < b, where a and b are annotations, should be taken to mean a comes before b in the index, according to the above rules.
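The three ordering rules can be modeled with a plain Java Comparator. This is a minimal sketch, not UIMA's implementation: Span is a hypothetical stand-in for an annotation, type priorities are assumed to be a simple list of type names (earlier means higher priority), and types missing from that list are simply sorted first, which is an arbitrary choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AnnotationOrderSketch {
    // Hypothetical stand-in for an annotation: a type name and a [begin, end) span.
    record Span(String type, int begin, int end) { }

    // Documented index order: begin ascending, then end descending (longer spans first),
    // then position in the type-priority list (earlier type sorts first).
    // Types absent from the list get index -1 and sort first: an arbitrary sketch choice.
    static Comparator<Span> indexOrder(List<String> typePriorities) {
        return Comparator.comparingInt(Span::begin)
                .thenComparing(Comparator.comparingInt(Span::end).reversed())
                .thenComparingInt(s -> typePriorities.indexOf(s.type()));
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>(List.of(
                new Span("Token", 4, 9),
                new Span("Sentence", 0, 20),
                new Span("Token", 0, 3),
                new Span("Paragraph", 0, 20)));
        spans.sort(indexOrder(List.of("Paragraph", "Sentence", "Token")));
        // Resulting order: Paragraph(0,20), Sentence(0,20), Token(0,3), Token(4,9)
        spans.forEach(System.out::println);
    }
}
```

The Paragraph and Sentence annotations share a span, so only the type-priority rule separates them; the Token at 0..3 sorts after both because its end offset is smaller.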
Fields inherited from interface FSIndex: BAG_INDEX, DEFAULT_BAG_INDEX, SET_INDEX, SORTED_INDEX
Modifier and Type | Method and Description |
---|---|
FSIterator<T> | iterator(boolean ambiguous): Return an iterator over annotations that can be constrained to be unambiguous. |
FSIterator<T> | subiterator(AnnotationFS annot): Return a subiterator whose bounds are defined by the input annotation. |
FSIterator<T> | subiterator(AnnotationFS annot, boolean ambiguous, boolean strict): Return a subiterator whose bounds are defined by the annot. |
AnnotationTree<T> | tree(T annot): Create an annotation tree with annot as root node. |
Methods inherited from interface FSIndex: compare, contains, find, getIndexingStrategy, getType, iterator, iterator, select, select, select, select, select, size, stream, subType, subType, withSnapshotIterators
Methods inherited from interface java.util.Collection: add, addAll, clear, contains, containsAll, equals, hashCode, isEmpty, parallelStream, remove, removeAll, removeIf, retainAll, spliterator, toArray, toArray
FSIterator<T> iterator(boolean ambiguous)
A disambiguated iterator is defined as follows. The first annotation returned is the same as would be returned by the corresponding ambiguous iterator. If the unambiguous iterator has previously returned a, it will next return the smallest b such that a < b and a.getEnd() <= b.getBegin(). In other words, the start of b will be large enough not to overlap the span of a.
An unambiguous iterator makes a snapshot copy of the index containing just the disambiguated items, and iterates over that. It doesn't check for concurrent index modifications (the ambiguous iterator does check for this).
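The disambiguation rule can be sketched as a single forward scan over annotations that are already in index order. The Span record and the example type names are hypothetical; the method only models the documented selection rule and does not reproduce the snapshot or concurrent-modification behavior of a real UIMA iterator.

```java
import java.util.ArrayList;
import java.util.List;

final class UnambiguousSketch {
    // Hypothetical stand-in for an annotation: a type name and a [begin, end) span.
    record Span(String type, int begin, int end) { }

    // Models iterator(false). The input must already be in index order.
    // After returning a, the next result is the first later b with a.end() <= b.begin().
    static List<Span> unambiguous(List<Span> indexOrder) {
        List<Span> result = new ArrayList<>();
        Span last = null;                      // previously returned annotation
        for (Span b : indexOrder) {
            if (last == null || last.end() <= b.begin()) {
                result.add(b);
                last = b;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> index = List.of(
                new Span("Sentence", 0, 10),
                new Span("Token", 0, 3),
                new Span("Token", 4, 10),
                new Span("Sentence", 11, 20),
                new Span("Token", 11, 15));
        // Only the two sentences survive: every token starts inside a returned sentence.
        unambiguous(index).forEach(System.out::println);
    }
}
```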
ambiguous - If set to false, the iterator will be unambiguous.
FSIterator<T> subiterator(AnnotationFS annot)
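The annot is used for 3 purposes:
- It is used to compute the position in the index where the iteration starts.
- It is used to compute the end point where the iterator stops when moving forward.
- It is used to specify which annotations will be skipped while iterating.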
The starting position is computed by first finding a position whose annotation compares equal with the annot (this might be one of several), and then advancing until reaching a position where the annotation there is not equal to the annot. If no item in the index is equal (meaning it has the same begin, the same end, and is the same type as the annot), then the iterator is positioned to the first annotation which is greater than the annot, or, if there are no annotations greater than the annot, the iterator is marked invalid.
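The iterator will stop (become invalid) when:
- it runs out of items in the index going forward or backwards, or
- while moving forward, it reaches a point where the annotation at that position has a start beyond the annot's end position, or
- while moving backwards, it reaches a position that goes past the starting position of the iterator.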
While iterating, it operates like a strict iterator; annotations whose end positions are > the end position of annot are skipped.
This is equivalent to returning annotations b such that:
- annot < b, and
- annot.getEnd() >= b.getBegin(), skipping b's whose end position is > annot.getEnd().
For annotations x, y, x < y here is to be interpreted as "x comes before y in the index", according to the rules defined in the description of this class.
This definition implies that annotations b that have the same span as annot may or may not be returned by the subiterator. This is determined by the type priorities; the subiterator will only return such an annotation b if the type of annot precedes the type of b in the type priorities definition. If you have not specified the priority, or if annot and b are of the same type, then the behavior is undefined.
For example, if you have an annotation S of type Sentence and an annotation P of type Paragraph that have the same span, and you have defined Paragraph before Sentence in your type priorities, then subiterator(P) will give you an iterator that will return S, but subiterator(S) will give you an iterator that will NOT return P. The intuition is that a Paragraph is conceptually larger than a Sentence, as defined by the type priorities.
Calling subiterator(a) is equivalent to calling subiterator(a, true, true). See subiterator(AnnotationFS, boolean, boolean).
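A small model of subiterator(annot), which behaves as ambiguous and strict, makes the start, stop, and skip rules and the Paragraph/Sentence example concrete. It is a sketch under stated assumptions: Span is a hypothetical annotation stand-in whose record equality (same type, begin, and end) matches the notion of "equal" used above, the list is assumed to be already in index order with Paragraph before Sentence in the type priorities, and annot must be an element of that list (positioning on the first greater annotation is not modeled).

```java
import java.util.ArrayList;
import java.util.List;

final class SubiteratorSketch {
    // Hypothetical stand-in for an annotation: a type name and a [begin, end) span.
    record Span(String type, int begin, int end) { }

    // Models subiterator(annot), i.e. ambiguous and strict, over a list in index order.
    // Simplification: annot must itself be one of the list's elements.
    static List<Span> subiterator(List<Span> indexOrder, Span annot) {
        int pos = indexOrder.lastIndexOf(annot);    // last element equal to annot
        if (pos < 0) {
            throw new IllegalArgumentException("sketch assumes annot is in the index");
        }
        List<Span> result = new ArrayList<>();
        for (int i = pos + 1; i < indexOrder.size(); i++) {
            Span b = indexOrder.get(i);
            if (b.begin() > annot.end()) {
                break;                              // starts beyond annot's end: stop
            }
            if (b.end() <= annot.end()) {
                result.add(b);                      // strict: right-overlapping b is skipped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span p = new Span("Paragraph", 0, 20);
        Span s = new Span("Sentence", 0, 20);
        // Index order, assuming Paragraph precedes Sentence in the type priorities.
        List<Span> index = List.of(p, s,
                new Span("Token", 0, 3), new Span("Token", 4, 9),
                new Span("Token", 18, 25), new Span("Sentence", 21, 30));
        System.out.println(subiterator(index, p)); // the Sentence and the first two Tokens
        System.out.println(subiterator(index, s)); // the first two Tokens; P is not returned
    }
}
```

The Token at 18..25 starts inside the bounds but ends past them, so the strict rule drops it, and the Sentence at 21..30 ends the iteration.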
annot - Defines the boundaries of the subiterator.
FSIterator<T> subiterator(AnnotationFS annot, boolean ambiguous, boolean strict)
Return a subiterator whose bounds are defined by the annot.
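The annot is used in 2 or 3 ways:
- It specifies the left-most position in the index where the iteration starts.
- It specifies an end point where iteration stops.
- If strict is specified, the end point also specifies which annotations will be skipped while iterating.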
The starting position is computed by first finding the position whose annotation compares equal with the annot, and then advancing until reaching a position where the annotation there is not equal to the annot. If no item in the index is equal (meaning it has the same begin, the same end, and is the same type as the annot), then the iterator is positioned to the first annotation which is greater than the annot, or, if there are no annotations greater than the annot, the iterator is marked invalid.
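The iterator will stop (become invalid) when:
- it runs out of items in the index going forward or backwards, or
- while moving forward, it reaches a point where the annotation at that position has a start beyond the annot's end position, or
- while moving backwards, it reaches a position that goes past the starting position of the iterator.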
Ignoring strict and ambiguous for a moment, this is equivalent to returning annotations b such that:
- annot < b using the standard annotation comparator, and
- annot.getEnd() >= b.getBegin(), and also bounded by the index itself.
A strict subiterator skips annotations where annot.getEnd() < b.getEnd().
An ambiguous = false specification produces an unambiguous iterator, which computes a subset of the annotations, going forward, such that annotations whose begin is contained within the span of the previously returned annotation are skipped.
For annotations x, y, x < y here is to be interpreted as "x comes before y in the index", according to the rules defined in the description of this class.
If strict = true, then annotations whose end is > annot.getEnd() are skipped.
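The effect of the ambiguous and strict flags can be sketched by adding two optional filters to a forward scan. This is a hypothetical model, not UIMA code: Span and the type names are stand-ins, the list is assumed to be already in index order, and annot is assumed to be one of its elements.

```java
import java.util.ArrayList;
import java.util.List;

final class FlaggedSubiteratorSketch {
    // Hypothetical stand-in for an annotation: a type name and a [begin, end) span.
    record Span(String type, int begin, int end) { }

    // Models subiterator(annot, ambiguous, strict) over a list already in index order.
    // Simplification: annot must be an element of that list.
    static List<Span> subiterator(List<Span> indexOrder, Span annot,
                                  boolean ambiguous, boolean strict) {
        int pos = indexOrder.lastIndexOf(annot);
        if (pos < 0) {
            throw new IllegalArgumentException("sketch assumes annot is in the index");
        }
        List<Span> result = new ArrayList<>();
        Span last = null;                           // previously returned annotation
        for (int i = pos + 1; i < indexOrder.size(); i++) {
            Span b = indexOrder.get(i);
            if (b.begin() > annot.end()) {
                break;                              // past annot's end: iteration stops
            }
            if (strict && b.end() > annot.end()) {
                continue;                           // strict: skip right-overlapping b
            }
            if (!ambiguous && last != null && b.begin() < last.end()) {
                continue;                           // unambiguous: b starts inside last result
            }
            result.add(b);
            last = b;
        }
        return result;
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 20);
        List<Span> index = List.of(sentence,
                new Span("Chunk", 0, 9), new Span("Token", 0, 3), new Span("Token", 4, 9),
                new Span("Token", 10, 15), new Span("Token", 18, 25),
                new Span("Sentence", 21, 30));
        for (boolean ambiguous : new boolean[] {true, false}) {
            for (boolean strict : new boolean[] {true, false}) {
                System.out.printf("ambiguous=%b strict=%b -> %s%n", ambiguous, strict,
                        subiterator(index, sentence, ambiguous, strict));
            }
        }
    }
}
```

With this sample index, the strict, unambiguous combination yields only the Chunk and the Token spanning 10 to 15, while turning strict off additionally admits the right-overlapping Token at 18..25.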
These definitions imply that annotations b that have the same span as annot may or may not be returned by the subiterator. This is determined by the type priorities; the subiterator will only return such an annotation b if the type of annot precedes the type of b in the type priorities definition. If you have not specified the priority, or if annot and b are of the same type, then the behavior is undefined.
For example, if you have an annotation S of type Sentence and an annotation P of type Paragraph that have the same span, and you have defined Paragraph before Sentence in your type priorities, then subiterator(P) will give you an iterator that will return S, but subiterator(S) will give you an iterator that will NOT return P. The intuition is that a Paragraph is conceptually larger than a Sentence, as defined by the type priorities.
annot - Annotation setting boundary conditions for the subiterator.
ambiguous - If set to false, the resulting iterator will be unambiguous.
strict - Controls whether annotations that overlap to the right are considered in or out.
AnnotationTree<T> tree(T annot)
Create an annotation tree with annot as root node. The tree is defined as follows: for each node in the tree, the children are the sequence of annotations that would be obtained from a strict, unambiguous subiterator of the node's annotation.
annot - The annotation at the root of the tree. This must be of type T or a subtype of T.
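The tree definition is recursive: each node's children come from a strict, unambiguous subiterator of that node's annotation. A minimal sketch, assuming a hypothetical Span annotation stand-in and a hypothetical Node record rather than UIMA's own AnnotationTree classes, over a list that is already in index order and contains the root annotation:

```java
import java.util.ArrayList;
import java.util.List;

final class AnnotationTreeSketch {
    // Hypothetical stand-in for an annotation: a type name and a [begin, end) span.
    record Span(String type, int begin, int end) { }

    // Hypothetical tree node; not the UIMA AnnotationTree/AnnotationTreeNode API.
    record Node(Span annot, List<Node> children) { }

    // Strict, unambiguous subiterator over a list in index order (annot must be in it).
    static List<Span> strictUnambiguous(List<Span> indexOrder, Span annot) {
        int pos = indexOrder.lastIndexOf(annot);
        if (pos < 0) {
            throw new IllegalArgumentException("sketch assumes annot is in the index");
        }
        List<Span> result = new ArrayList<>();
        Span last = null;
        for (int i = pos + 1; i < indexOrder.size(); i++) {
            Span b = indexOrder.get(i);
            if (b.begin() > annot.end()) {
                break;                                  // past annot's end
            }
            if (b.end() > annot.end() || (last != null && b.begin() < last.end())) {
                continue;                               // strict or unambiguous skip
            }
            result.add(b);
            last = b;
        }
        return result;
    }

    // Each child's position is after its parent's, so the recursion terminates.
    static Node tree(List<Span> indexOrder, Span root) {
        List<Node> children = new ArrayList<>();
        for (Span child : strictUnambiguous(indexOrder, root)) {
            children.add(tree(indexOrder, child));
        }
        return new Node(root, children);
    }

    static void print(Node node, String indent) {
        System.out.println(indent + node.annot());
        for (Node child : node.children()) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        Span paragraph = new Span("Paragraph", 0, 30);
        List<Span> index = List.of(paragraph,
                new Span("Sentence", 0, 12), new Span("Token", 0, 5), new Span("Token", 6, 12),
                new Span("Sentence", 13, 30), new Span("Token", 13, 20), new Span("Token", 21, 30));
        // Paragraph -> two Sentences -> two Tokens each
        print(tree(index, paragraph), "");
    }
}
```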
Copyright © 2006–2022 The Apache Software Foundation. All rights reserved.