This page describes Jena Fastpath query processing.
Each Jena Graph (the low level class behind Jana Models) provides a query handler. The purpose of the query handler is to provide a route for execution of filtered basic graph patterns (a conjunction of triple patterns and also value constraints).
The goal of Jena Fastpath query processing is to perform such filtered basic graph pattern matching for SPARQL (or RDQL) within the database engine. The standard ARQ query processing algorithm is to process each SPARQL query pattern and identify those part that can be dispatched to the graph storage as a single unit. For databases, the Fastpath algorithm generates SQL queries that match multiple patterns yet are portable across a wide range of SQL database engines.
The database-backed model stores RDF triples in a statements table but also in property tables (e.g. for reification statements). The database Fastpath algorithm works by partitioning the patterns and constraints for a query into one or more groups, referred to as stages, where each stage consists of patterns that can be processed together. Stages that consist of more than one pattern are processed by a single, dynamically-generated SQL query. Stages that consist of just one pattern are processed by a Find operation. The Fastpath algorithm also determines a partial execution order for the stages in an attempt to minimize execution time.
In this section, we describe the Fastpath algorithm for the most general case, a model that uses the Standard reification style (i.e., standard RDF treatment of reification, see the Reification-HowTo for a description of the styles). Models that use this style may may contain both asserted statements and reification triples. The Fastpath algorithm is summarized in the pseudo-code below.
The first step is to order the triple patterns by their estimated cost of evaluation. Currently, the cost estimate is a simple measure that ranks patterns with bound variables higher than patterns with unbound variables. Further, unbound variables over predicates are ranked lower than unbound variables over subjects or objects. The highest ranked pattern is then chosen as the first pattern in the next stage of patterns to process.
The next step is to all find patterns that can join with a staged pattern. The process iterates until no more patterns remain or no joins are found. In general, two patterns are considered joinable if they apply to the same database table, have a common variable and neither variable is bound to a predicate. However, in some cases joins of predicate variables are supported as described below. Joins across database tables are somewhat tricky and are deferred to a future release.
The final step is to generate code for the staged patterns. If there is a single pattern in the stage, it is considered more efficient to process the pattern as a find operation. For multiple patterns, a new SQL query is generated dynamically.
The final result of the Fastpath algorithm is a series of stages of query results where each stage is either a find operation or an SQL query. The stages are then evaluated in order, i.e., the output of one stage is used as input for variable bindings in the next stage. A final stage is used for processing the constraints by filtering the final results. Processing constraints within the database is deferred to a future release. Another limitation is that Fastpath processes patterns for different graph separately, i.e., it cannot generate SQL code for joins across graphs. This is also deferred to a future release.
If it is known that a Jena model contains only asserted statements or if a user does not wish to query over reified statements, then variables over predicates can be supported. This affects the algorithm that looks for joins (mentioned above) and should provide a performance benefit for queries that include predicate variables by reducing the number of stages. The RDB Query Handler supports a method, setQueryOnlyAsserted(true), for the user to declare that Fastpath should ignore reified statements in queries. Similarly, if a model contains only reified statements or the user only wishes to query reified statements, then setQueryOnlyReified(true), may be called. However, a current limitation of Fastpath is that if QueryOnlyReified is true then QueryFullReified must also be true.
If a model contains only fully reified statements or the user is only interested in querying fully reified statements (i.e., partially reified statements are to be ignored), then another optimization is possible that eliminates unnecessary joins for reified statements. For example, consider the following query over reified statements.
<Var1, rdf:subject, <Alice>> AND <Var1, rdf:predicate, V2> AND <Var1, rdf:object, V3>
This retrieves all reified statements about Alice. The normal Fastpath algorithm will generate a join for these two patterns. However, we note that fully reified statements are stored in a single database row. So, if the user is only interested in query results for fully reified statements, then the join can be replaced by a simple select. This optimization is enabled by calling setQueryFullReified(true).
To retrieve the current settings for any of these optimization settings, there are corresponding get methods, that return a boolean e.g., getQueryOnlyAsserted(). Note that the values of these options are not persisted in the database (see Options).
Convenient and Minimal reification styles hide reification triples from the Find operation. Consequently, using these reification styles has the same effect as specifying setQueryOnlyAsserted(true).
queryOnlyAsserted
is true. In the future, it should also be
possible to enable predicate variables for some types of filtered basic graph
pattern queries when queryOnlyReified
is true.