Status: Experimental

OpenCMIS Query Integration

The CMIS standard contains a powerful query language that supports full text and relational metadata query capabilities and is modeled along a subset of SQL. Many repositories will have the demand to integrate into this query interface. OpenCMIS provides support to make a query integration easier. This article explains the various hooks that are provided to integrate into the query interface. These hooks provide different levels of comfort and flexibility. OpenCMIS integrates a query parser that uses ANTLR as parsing engine. However there is no strong dependency on ANTLR. If you prefer a different language parsing tool it is possible to do this.

There are four different levels how you can integrate query: 1. Implement query in the discovery service 1. Use the built-in ANTLR and ANTLR CMISQL grammar 1. Use OpenCMIS CMISQL grammar and integrate into ANTLR query walker 1. Use predefined query walker and integrate into interface IQueryConditionProcessor.

Implement query in the discovery service

The first way is to implement the query() method like any other service method on your own. This gives you the maximum flexibility including using a parser tool of your choice and extensions of the query grammar as you like. This is also the method with the highest implementation effort.

Use built-in ANTLR and ANTLR CMISQL grammar

OpenCMIS comes with a build-in integration of ANTLR and provides a grammar file for CMISQL. You can reuse this grammar file, modify or extend it and integrate query by using the ANTLR mechanisms for parsing and walking the abstract syntax tree. Please refer to the ANTLR documentation for further information. This is the right level to use if you need custom parser tree transformations or would like to extend the grammar with your own constructs. For demonstration purposes OpenCMIS provides a sample extended grammar as an example.

Use OpenCMIS CMSIQL grammar and integrate into ANTLR query walker

If the standard CMISQL grammar is sufficient for you there is another level of integration. For many repositories there are common tasks for processing queries: The columns of the select part need to be evaluated and mapped to type and property definitions. The from area needs to be mapped to type definitions and some parts of the where part again refer to properties in types. In addition all aliases defined in the statement need to be resolved and many validations are performed. OpenCMIS provides a class that performs these common tasks. You can make use of the resolved types, properties and aliases and walk the resulting abstract syntax tree (AST) to evaluate the query. You are free to walk the AST as many times as you need and in the order you prefer. The basic idea is that the SELECT and FROM parts are processed by OpenCMIS and you are responsible for the WHERE part.  The CMIS InMemory server provides an example for this level of integration: For each object contained in the repository the tree is traversed and checked if it matches the current query. You can take the InMemory code as an example if you decide to use this integration point.

Use predefined query walker

For some repositories a simple and one-pass query traversal is sufficient. This can be the case if for example your query needs to be translated to a SQL query statement. Because ANTLR has some complexity OpenCMIS provides a predefined walker that does a simple one pass depth-first traversal. If this is sufficient this interface hides most of the complexity of ANTLR. All you have to do is to implement a Java interface (IQueryConditionProcessor). You can refer to the unit tests for example code. The class TestQueryProcessor nested in the unit test ProcessQueryTest provides an example of such a walker. Some utility methods like for example parsing literals like "abc", -123 to Java objects like String and Integer are common tasks. Therefore this is implemented in an abstract class AbstractQueryConditionProcessor. This declares all interface methods as abstract and provides default implementations for common tasks. In most cases you will derive your implementation from AbstractQueryConditionProcessor and not directly implement the interface.

Note: There is currently no predefined walker for the JOIN statements. If you need to support JOINS you have to build your own walker for this part as outlined in the previous section.

Using QueryObject

The class QueryObject provides all the basic functionality for resolving types and properties and performs common validation tasks. The QueryObject processes the SELECT and FROM parts as well as all property references from the WHERE part. It maintains a list of Java objects and interface that you can use to access the property and type definitions given your current position in the statement. For an example refer to the class StoreManagerImpl of the InMemory Server and method query(). To be able to use this object QueryObj needs to get access to the types contained in your repository. For this purpose you need to pass an interface to a TypeManager as input parameter. The second parameter is your query walker implementing IQueryConditionProcessor. Your code will typically look like this:

TypeManager tm = new MyTypeManager(); // implements interface TypeManager

IQueryConditionProcessor myWalker = new MyWalker(); // implements interface
    IQueryConditionProcessor or extends AbstractQueryConditionProcessor

queryObj = new QueryObject(tm, myWalker);

queryObj then will process the statement and call the interface methods of your walker:

try {

    CmisQueryWalker walker = QueryObject.getWalker(statement);
    walker.query(queryObj);

} catch (RecognitionException e) {

    throw new RuntimeException("Walking of statement failed with
         RecognitionException error: \n   " + e);

} catch (Exception e) {
    throw new RuntimeException("Walking of statement failed with exception:
          \n   " + e);
}

After this method returns you may for example ask your walker object myWalker for the generated SQL string.

Processing a node and referencing types and properties

While traversing the tree you often will need to access the property and type definitions that are referenced in the where clause. The QueryObject provides the necessary information for resolving the references. For example the statement

... WHERE x < 123

will result in calling the method onLessThan() in your walker callback implementation:

public void onLessThan(Tree ltNode, Tree leftNode, Tree rightNode) {

    Object rVal = onLiteral(rightChild);
    ColumnReference colRef;

    CmisSelector sel = queryObj.getColumnReference(columnNode
             .getTokenStartIndex());

    if (null == sel)
       throw new RuntimeException("Unknown property query name " +
              columnNode.getChild(0));
    else if (sel instanceof ColumnReference)
       colRef = (ColumnReference) sel;

   TypeDefinition td = colRef.getTypeDefinition();
   PropertyDefinition pd =
       td.getPropertyDefinitions().get(colRef.getPropertyId());

   //  process the statement, for example append it to a WHERE
   // in your generated SQL statement.
}

The right child node is a literal and you will get an Integer object with value 123. The left node is a reference to property and getColumnReference() will either give you a function (currently the only supported function is SCORE()) or a reference to a property in a type of your type system. The query object maintains several maps to resolve references. The key to the map is always the token index in the incoming token stream (an integer value). You can get the token index for each node by calling getTokenStartIndex() on the node.

Building the result list

After processing the query an ObjectList has to be returned containing the requested properties and function results. You can ask the query object for the requested information:

Map props = queryObj.getRequestedProperties();
Map funcs = queryObj.getRequestedFuncs();

key of the map is the query name and value is the alias if an alias was used in the statement or the query name otherwise.