Title: Query Integration

[TOC]

# OpenCMIS Query Integration

The CMIS standard defines a powerful query language, modeled on a subset
of SQL, that supports full-text and relational metadata queries. Many
repositories need to integrate with this query interface, and OpenCMIS
provides support to make that integration easier. This article explains the
hooks that are available for this purpose. They offer different trade-offs
between convenience and flexibility. OpenCMIS includes a query parser that
uses ANTLR as its parsing engine. However, there is no hard dependency on
ANTLR: if you prefer a different parsing tool, you can use it instead.

There are four levels at which you can integrate query support:

1. Implement query in the discovery service
1. Use the built-in ANTLR integration and the ANTLR CMISQL grammar
1. Use the OpenCMIS CMISQL grammar and integrate into the ANTLR query walker
1. Use the predefined query walker and implement the interface `PredicateWalker`

## Implement query in the discovery service

The first option is to implement the `query()` method yourself, like any
other service method. This gives you maximum flexibility: you can use a
parser tool of your choice and extend the query grammar as you like. It is
also the option with the highest implementation effort.

## Use built-in ANTLR and ANTLR CMISQL grammar

OpenCMIS comes with a built-in integration of ANTLR and provides a grammar
file for CMISQL. You can reuse this grammar file, modify or extend it, and
integrate query by using the ANTLR mechanisms for parsing and walking the
abstract syntax tree. Please refer to the ANTLR documentation for further
information. This is the right level to use if you need custom parse tree
transformations or would like to extend the grammar with your own
constructs. For demonstration purposes, OpenCMIS provides an extended
grammar as an example.

## Use OpenCMIS CMISQL grammar and integrate into ANTLR query walker

If the standard CMISQL grammar is sufficient for you, there is another level
of integration. Many repositories share common tasks when processing
queries: the columns of the SELECT part need to be evaluated and mapped to
type and property definitions, the FROM part needs to be mapped to type
definitions, and parts of the WHERE clause again refer to properties of
types. In addition, all aliases defined in the statement need to be resolved
and many validations have to be performed. OpenCMIS provides a class that
performs these common tasks. You can use the resolved types, properties and
aliases and walk the resulting abstract syntax tree (AST) to evaluate the
query. You are free to walk the AST as many times as you need and in the
order you prefer. The basic idea is that OpenCMIS processes the SELECT and
FROM parts and you are responsible for the WHERE part. The InMemory server
provides an example of this level of integration: for each object in the
repository, the tree is traversed to check whether the object matches the
current query. You can take the InMemory code as a starting point if you
decide to use this integration level.
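
The per-object matching described above can be sketched as follows. This is
not code from OpenCMIS or the InMemory server: `WhereNode`,
`PerObjectMatcher` and the node kinds are hypothetical stand-ins for the
ANTLR tree, and an object is reduced to a plain map of property values. The
sketch only shows the shape of the approach, a recursive walk of the WHERE
tree that is repeated for every candidate object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node of the parsed WHERE clause (not an ANTLR class).
final class WhereNode {
    enum Kind { AND, OR, EQ, LT, COLUMN, LITERAL }

    final Kind kind;
    final Object value;          // column name or literal value; null for operators
    final WhereNode left, right; // operands; null for leaves

    private WhereNode(Kind kind, Object value, WhereNode left, WhereNode right) {
        this.kind = kind;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    static WhereNode col(String name) { return new WhereNode(Kind.COLUMN, name, null, null); }
    static WhereNode lit(Object v) { return new WhereNode(Kind.LITERAL, v, null, null); }
    static WhereNode op(Kind k, WhereNode l, WhereNode r) { return new WhereNode(k, null, l, r); }
}

final class PerObjectMatcher {
    /** Walks the tree for one object and reports whether the object matches. */
    static boolean matches(WhereNode n, Map<String, Object> props) {
        switch (n.kind) {
            case AND: return matches(n.left, props) && matches(n.right, props);
            case OR:  return matches(n.left, props) || matches(n.right, props);
            case EQ:  { Integer c = compare(n, props); return c != null && c == 0; }
            case LT:  { Integer c = compare(n, props); return c != null && c < 0; }
            default:  throw new IllegalArgumentException("Not a predicate: " + n.kind);
        }
    }

    // Compares the column (left) with the literal (right); null if the property is unset.
    // Assumes both values are Comparable and of the same Java type.
    private static Integer compare(WhereNode n, Map<String, Object> props) {
        Object actual = props.get((String) n.left.value);
        if (actual == null) {
            return null;
        }
        @SuppressWarnings("unchecked")
        Comparable<Object> c = (Comparable<Object>) actual;
        return c.compareTo(n.right.value);
    }

    /** Tests every object against the tree, as a full repository scan would. */
    static List<Map<String, Object>> filter(WhereNode where, List<Map<String, Object>> objects) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> object : objects) {
            if (matches(where, object)) {
                result.add(object);
            }
        }
        return result;
    }
}
```

A full scan like this is simple but visits every object for every query, so
it is probably only appropriate for small or in-memory repositories.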

## Use predefined query walker

For some repositories a simple, one-pass query traversal is sufficient.
This can be the case, for example, if your query needs to be translated to
a SQL query statement. Because working with ANTLR directly adds complexity,
OpenCMIS provides a predefined walker that performs a simple one-pass,
depth-first traversal. If this is sufficient for you, this interface hides
most of the complexity of ANTLR. All you have to do is implement a Java
interface (`PredicateWalker`). You can refer to the InMemory server for
example code (`InMemoryWhereClauseWalker`).

`AbstractPredicateWalker` implements the interface `PredicateWalker` and
provides common functionality that is useful for traversing the tree. For
example, parsing literals like `"abc"` or `-123` into Java objects like
`String` and `Integer` is handled there.
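
To illustrate what a one-pass, depth-first translation to SQL can look like,
here is a minimal, self-contained sketch. It does not use the OpenCMIS
`PredicateWalker` interface or ANTLR: `PredicateNode`, `SqlWhereBuilder` and
the node type names are hypothetical, and the sketch assumes that column
names can be used unchanged in the target SQL.

```java
// Hypothetical, minimal tree node; the real AST is an ANTLR tree.
final class PredicateNode {
    final String type;              // "AND", "OR", "NOT", "LT", "EQ", "COL", "NUM", "STR"
    final String text;              // column name or literal text; null for operators
    final PredicateNode[] children; // operands in source order

    PredicateNode(String type, String text, PredicateNode... children) {
        this.type = type;
        this.text = text;
        this.children = children;
    }
}

final class SqlWhereBuilder {
    /** Single depth-first pass: operands are rendered before their operator is assembled. */
    static String toSql(PredicateNode n) {
        switch (n.type) {
            case "AND": return "(" + toSql(n.children[0]) + " AND " + toSql(n.children[1]) + ")";
            case "OR":  return "(" + toSql(n.children[0]) + " OR " + toSql(n.children[1]) + ")";
            case "NOT": return "NOT (" + toSql(n.children[0]) + ")";
            case "LT":  return toSql(n.children[0]) + " < " + toSql(n.children[1]);
            case "EQ":  return toSql(n.children[0]) + " = " + toSql(n.children[1]);
            case "COL": return n.text; // assumption: query name equals the SQL column name
            case "NUM": return n.text;
            case "STR": return "'" + n.text.replace("'", "''") + "'"; // double embedded quotes
            default:    throw new IllegalArgumentException("Unsupported node type: " + n.type);
        }
    }
}
```

Each node is visited exactly once, which is why this style maps well to a
callback interface with one method per node type. Production code would more
likely emit bind parameters than inline literal values into the SQL string.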

If the interface of the predefined walker `PredicateWalker` does not
fit your needs, you can define your own interface. The code generated
by ANTLR makes no assumptions about how you design the walking of
your tree. The only dependency is the interface `PredicateWalkerBase`,
which consists of a single method. If you define your own walker, you
have to implement or extend `PredicateWalkerBase`. The unit tests for the
InMemory server contain an example of this in the class
`QueryConditionProcessor`.

Note: There is currently no predefined walker for JOIN statements. If
you need to support JOINs you have to build your own walker for this part
as outlined in the previous section.

## Using QueryObject

The class `QueryObject` provides the basic functionality for resolving
types and properties and performs common validation tasks. It processes the
`SELECT` and `FROM` parts as well as all property references from the
`WHERE` part. It maintains a list of Java objects and an interface that you
can use to access the property and type definitions for your current
position in the statement. For an example, refer to the method `query()` of
the class `StoreManagerImpl` in the InMemory server. To do its work,
`QueryObject` needs access to the types contained in your repository. For
this purpose you pass an implementation of the interface `TypeManager` as
an input parameter. Your code will typically look like this:

    :::java
    // MyWalker may extend AbstractPredicateWalker (as here),
    // implement the interface PredicateWalker,
    // or implement the interface PredicateWalkerBase.
    public class MyWalker extends AbstractPredicateWalker {
        // . . .
    }

    TypeManager tm = new MyTypeManager(); // implements interface TypeManager
    MyWalker myWalker = new MyWalker();
    QueryObject queryObj = new QueryObject(tm);
    QueryUtil queryUtil = new QueryUtil();

    // Parses the statement, fills queryObj and calls back into myWalker.
    CmisQueryWalker queryProcessor =
        queryUtil.traverseStatementAndCatchExc(statement, queryObj, myWalker);


`queryUtil` then processes the statement and calls the interface methods of
your walker. The following code is part of OpenCMIS; you don't have to
implement it yourself:

    :::java
    try {
        walker = getWalker(statement);
        walker.query(queryObj, pw); // pw is the walker that was passed in
        return walker;
    } catch (RecognitionException e) {
        String errorMsg = queryObj.getErrorMessage();
        throw new CmisInvalidArgumentException(
                "Walking of statement failed with RecognitionException error: \n   " + errorMsg);
    } catch (CmisBaseException e) {
        throw e;
    } catch (Exception e) {
        throw new CmisInvalidArgumentException(
                "Walking of statement failed with exception: \n   " + e);
    }


After this method returns you may for example ask your walker object
`myWalker` for the generated SQL string.

## Processing a node and referencing types and properties

While traversing the tree you will often need to access the property and
type definitions that are referenced in the WHERE clause. The `QueryObject`
provides the necessary information for resolving these references. For
example, the statement

    ... WHERE x < 123


results in a call to the method `walkLessThan()` of your walker callback
implementation:

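    :::java
    public Boolean walkLessThan(Tree ltNode, Tree leftNode, Tree rightNode) {

        // Right operand: the literal (here the Integer 123).
        Object rVal = walkLiteral(rightNode);

        // Left operand: look up the selector by its token index.
        CmisSelector sel = queryObj.getColumnReference(leftNode
                .getTokenStartIndex());

        if (null == sel) {
            throw new CmisInvalidArgumentException("Unknown property query name "
                    + leftNode.getChild(0));
        } else if (!(sel instanceof ColumnReference)) {
            // sel is a function such as SCORE(); this example does not handle it.
            throw new CmisInvalidArgumentException("Unsupported left operand "
                    + leftNode.getChild(0));
        }
        ColumnReference colRef = (ColumnReference) sel;

        TypeDefinition td = colRef.getTypeDefinition();
        PropertyDefinition<?> pd =
            td.getPropertyDefinitions().get(colRef.getPropertyId());

        // process the statement, for example append it to a WHERE
        // in your generated SQL statement.

        return false; // only meaningful if you evaluate the predicate directly
    }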


The right child node is a literal and you will get an Integer object with
value 123. The left node is a reference to a property and
`getColumnReference()` will either give you a function (currently the only
supported function is `SCORE()`) or a reference to a property in a type of
your type system. The query object maintains several maps to resolve
references. The key to the map is always the token index in the incoming
token stream (an integer value). You can get the token index for each node
by calling `getTokenStartIndex()` on the node.
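
The idea of resolving references through a map keyed by token index can be
sketched independently of OpenCMIS. `ReferenceTable` and `Ref` below are
hypothetical names, not OpenCMIS classes, and the token indices are made up
for illustration: entries are recorded while the statement is parsed and
looked up again while the tree is walked.

```java
import java.util.HashMap;
import java.util.Map;

final class ReferenceTable {
    /** Hypothetical resolved reference: the type and property a column token points to. */
    static final class Ref {
        final String typeQueryName;
        final String propertyId;

        Ref(String typeQueryName, String propertyId) {
            this.typeQueryName = typeQueryName;
            this.propertyId = propertyId;
        }
    }

    private final Map<Integer, Ref> byTokenIndex = new HashMap<>();

    /** Called while parsing: remember what the token at this index refers to. */
    void register(int tokenIndex, String typeQueryName, String propertyId) {
        byTokenIndex.put(tokenIndex, new Ref(typeQueryName, propertyId));
    }

    /** Called while walking: returns the reference, or null if none was recorded. */
    Ref resolve(int tokenIndex) {
        return byTokenIndex.get(tokenIndex);
    }
}
```

One consequence of keying by position rather than by name is that the same
column name can occur several times in a statement, for example under
different type aliases, and each occurrence still resolves to its own entry.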

## Building the result list

After processing the query an `ObjectList` has to be returned containing the
requested properties and function results. You can ask the query object for
the requested information:

    :::java
    Map<String, String> props = queryObj.getRequestedProperties();
    Map<String, String> funcs = queryObj.getRequestedFuncs();


The key of each map is the query name. The value is the alias if an alias
was used in the statement, or the query name otherwise.
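
A minimal sketch of how such a map could drive the construction of one
result row is shown below. It assumes the map shape described above (query
name as key, alias or query name as value). `ResultRowBuilder` is a
hypothetical helper, and a plain `Map` stands in for both the stored object
and the result entry instead of the real CMIS `ObjectData` types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResultRowBuilder {
    /**
     * Copies only the requested properties of one object into a result row.
     *
     * @param requested   query name mapped to alias (or to the query name if no alias was given)
     * @param objectProps all property values of the object, keyed by query name
     */
    static Map<String, Object> buildRow(Map<String, String> requested,
                                        Map<String, Object> objectProps) {
        Map<String, Object> row = new LinkedHashMap<>(); // keeps the iteration order of 'requested'
        for (Map.Entry<String, String> entry : requested.entrySet()) {
            String queryName = entry.getKey();
            String alias = entry.getValue();
            // The row is keyed by the name the client asked for.
            row.put(alias, objectProps.get(queryName));
        }
        return row;
    }
}
```

Properties that were not requested are left out of the row, and a requested
property that is not set on the object appears with a `null` value.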

## Limitations

Currently the query parser does not include the full-text search part
of the grammar, and support for JOIN is limited. This will be enhanced in a
future version.