"CRIS is based on Apache Lucene [1] and provides a means to index RDF resources. It works by indexing the values of properties on a resource, which makes it possible to search for those property values using CRIS. The results that CRIS delivers are the corresponding RDF resources."^^ . . "

The core of CRIS is the GraphIndexer class. Note that GraphIndexer is not an OSGi service; the user has to instantiate it to provide an index. The GraphIndexer needs two graphs to work with. One graph contains the IndexDefinitions, that is, the specification of which resources and properties to index (see IndexDefinitionManager). The other graph contains the resources to index. Note that CRIS indexes RDF resources based on their rdf:type and that indexing works on a per-property basis. That means not all properties on a resource are indexed by default; the user has to specify which properties to index.

\n

GraphIndexer also provides the search interface through its findResources method. A search is specified using Conditions and, optionally, a SortSpecification and FacetCollectors. Overloaded variants of findResources allow the resource type and search query to be specified directly.

"^^ . . "GraphIndexer"^^ . . _:a5ac075bbf86e4094bbd63c689b9e65d1 . _:ef06e9181f30ac45dc70161fe27e4bf91 . . "[1] Apache Lucene - Overview\n

\n[2] Apache Lucene - Query Parser Syntax"^^ . . "References"^^ . . _:94a8a417ddf64bae28b1f4ab642f26771 . _:aae2cefa34e0e85c36b2d2b995b07a261 . . "

The IndexDefinitionManager helps to manage indexing specifications using the CRIS ontology in the index definition graph (see GraphIndexer). Indexing is enabled for resources according to their rdf:type. Additionally, the index definitions specify which properties of the resource are indexed.

\n\n

One can think of an index definition as specifying the keys (properties) that are mapped to the value (the resource URI) in the index.

"^^ . . "IndexDefinitionManager"^^ . . _:1a97d84af7413940e27729f9a455be231 . _:d0c5d58c3d389e58633762e0413fcf6c1 . . "

VirtualProperties represent RDF properties and add the functionality required to index them. There are several types of VirtualProperties that can be used for specific indexing requirements.

\n
    \n
  • PropertyHolder: This is a VirtualProperty-adapter for a single RDF property.
  • \n
  • JoinVirtualProperty: This creates a virtual property that represents several RDF properties. Its object is a single literal that concatenates the objects of the specified RDF properties, separated by a space. For example, it can represent a concatenation of foaf:firstName and foaf:lastName: if the first name is \u201CJohn\u201D and the last name is \u201CDoe\u201D, the value of the JoinVirtualProperty will be \u201CJohn Doe\u201D. Note that the order of concatenation matters; if it is switched, the value will be \u201CDoe John\u201D.
  • \n
  • PathVirtualProperty: This can be used to index properties that are not attached directly to the indexed resource. For example, one can index a property value that is attached to a blank node if the blank node is in turn attached to the indexed resource.
  • \n
"^^ . . "VirtualProperty"^^ . . _:79eba27988f5700159925f00d187a8a61 . _:916cbe9e1c35f5d35cb11061dfc8a5a51 . . "

Conditions represent the type of query to perform on the index. When searching for resources, a list of conditions can be supplied. These conditions are combined with a boolean AND. The following Conditions are currently implemented:

\n\n
    \n
  • WildcardCondition: Takes a string query as input. The query can contain wildcard characters (* for any number of characters, ? for a single character). A WildcardCondition has to be specified for a single property. When querying multiple properties, multiple WildcardConditions have to be used. Note that the property has to be indexed.
  • \n
  • TermRangeCondition: Returns results that lie between (according to String.compareTo) a lower and an upper term specified by the user. A TermRangeCondition has to be specified for a single property. When querying multiple properties, multiple TermRangeConditions have to be used. Note that the property has to be indexed.
  • \n
  • GenericCondition: This condition supports full Lucene query parser syntax [2]. It supports querying on multiple properties directly. Note that all supplied properties have to be indexed.
  • \n
\n"^^ . . "Conditions"^^ . . _:11a602df877d16de3f36f5de140dc6ff1 . _:2a1425ca3e32be21f2ffa4d7e5e470571 . . "

A SortSpecification contains an ordered list of VirtualProperties. The order of addition defines the significance for sorting (all search results are sorted according to the first property, then according to the second property, etc.). Note that the added properties have to be indexed.

Example:

\n\n
Unsorted results: {myuri#adam_berkley, myuri#berta_adams}
\n
SortSpecification: {foaf:lastName, foaf:firstName}
\n
Sorted results: {myuri#berta_adams, myuri#adam_berkley}
\n\n

To sort by indexing order (the order in which the resources have been added to the index) or by relevance (the document score computed by Lucene), there are two special objects: SortSpecification.INDEX_ORDER and SortSpecification.RELEVANCE. These objects can be added to the SortSpecification like VirtualProperties. One use is to add INDEX_ORDER as the last entry in the list to break ties in a well-defined manner: if two resources are sorted equally, it is otherwise undefined which resource is displayed first. With INDEX_ORDER as the final criterion, the order of such resources is determined by when they were indexed and is guaranteed to be the same each time.

\n\n

Besides the order of properties, the user has to specify how to interpret the value of each property in order to get the expected results. The supported value types are defined as static constants on SortSpecification. The most commonly used value type is STRING, which interprets the literal values as Strings for sorting and returns results according to their \u201Cnatural order\u201D. There is also a STRING_COMPARETO constant, which uses the String.compareTo method for sorting but is much more resource intensive; use it only if STRING does not return the expected results. Other useful types are INT and FLOAT.

\n\n

By default, sorting is in ascending order. When a property is added to the SortSpecification, it is possible to specify that its order should be reversed (descending).

"^^ . . "SortSpecification (sorting search results)"^^ . . _:3c9b8ac1955e215d24b36828f81e0db51 . _:ce66a41401c9882cbb0c8270b117bcdc1 . . "

A FacetCollector can be supplied to perform a faceted search. It works on a per-property basis. The user can add properties for which the FacetCollector collects facets, meaning that it groups certain information according to the values of the specified property. The information collected per facet depends on the FacetCollector implementation.

\n\n

An example of facets is:

\n
Property to collect facets for: foaf:firstName
\n
Search Results: {myuri#adam_berkley, myuri#berta_adams, myuri#adam_hawk}
\n
Facets: {\u201Cadam\u201D, \u201Cberta\u201D}
\n\n

A FacetCollector can collect facets for multiple properties. Facets are available as sets of entries. Each entry has a key (e.g. \u201Cberta\u201D) and a value (the information collected for the berta facet).

\n\n

Currently two basic FacetCollectors are implemented:

\n
    \n
  • CountFacetCollector: Counts the number of occurrences of a facet as the information associated with it (for the example above: {\u201Cadam\u201D, 2}, {\u201Cberta\u201D, 1}).
  • \n
  • SortedCountFacetCollector: The same as CountFacetCollector but it returns the facets ordered by value.
  • \n
"^^ . . "FacetCollector (faceted search)"^^ . . _:406a232fe30372afa30cc9fcd61eba681 . _:e4d6f3af0702f4cae0af1b8a614a04e61 . . "
\nMGraph definitions = new SimpleMGraph(); //indexing specifications\nMGraph dataGraph = new SimpleMGraph(); //the graph to index\n\n//adding index definitions:\nIndexDefinitionManager indexMgr = new IndexDefinitionManager(definitions);\n\n// index firstName + \u201C \u201C + lastName;\nList<VirtualProperty> predicates = new ArrayList<VirtualProperty>();\npredicates.add(new PropertyHolder(FOAF.firstName));\npredicates.add(new PropertyHolder(FOAF.lastName));\nJoinVirtualProperty name = new JoinVirtualProperty(predicates);\n\n// index the value on the path res/vcard:adr/vcard:street-address\nList<VirtualProperty> path = new ArrayList<VirtualProperty>();\npath.add(VCARD.adr);\npath.add(VCARD.street_address);\nPathVirtualProperty streetAddress = new PathVirtualProperty(path);\n\nList<VirtualProperty> properties = new ArrayList<VirtualProperty>();\nproperties.add(new PropertyHolder(FOAF.mbox)); //index the value of foaf:mbox\nproperties.add(name);\nproperties.add(streetAddress);\n\nindexMgr.addDefinition(FOAF.Person, properties); //index resources with rdf:type foaf:Person\n\n//create index\nGraphIndexer service = new GraphIndexer(definitions, dataGraph); //creates the index in memory\n\n//add more data to dataGraph if necessary\n\n//sort specification\nSortSpecification sortSpecification = new SortSpecification();\nsortSpecification.add(name, SortSpecification.STRING); //sort by name\n\n//faceted search\nSortedCountFacetCollector facetCollector = new SortedCountFacetCollector(); //count occurrences of the same value\nfacetCollector.addFacetProperty(streetAddress); //count street address values\n\n//search for name\nList<NonLiteral> results = service.findResources(name, \"John D*\", false, sortSpecification, facetCollector);\n//results contains resources with names such as \u201CJohn Dalton\u201D, \u201CJohn Doe\u201D, etc., sorted by name\n\nfacetCollector.getFacets(streetAddress); //contains counts of how often the same streetAddress occurs\n\nservice.closeLuceneIndex(); //release resources\n
"^^ . . "CRIS Usage"^^ . . _:8373cdf0524928d6cc7431a7dd7febf61 . _:c32ca2bd32a5b9c08004f749b4b7357c1 . . _:1dd84f70e52e470200dee5ffe4822dc71 . _:3389fb62506cc3bbddaa0e7c67958e841 . _:386f24369b154114296a2f27a92275f41 . _:4b61632098e9630978a63b9aca4d9ec31 . _:6d003ad2f3a82701b2ed0acb2f780ebd1 . _:82da6471ef665a035aed16f3b5c7b35e1 . _:aba225d8827982227f44fcf3ba95d0c01 . _:d268f287b117dc46ca94ee4f5726a45e1 . _:ea18fa9350854187dd5062c91c4e715d1 . . "Composite Resource Indexing Service (CRIS)"^^ . . _:6e16f54b18d53adc14e0d3bceadb17ed1 . _:7c86efa270e5b780df2f5beafe2013251 . . _:11a602df877d16de3f36f5de140dc6ff1 . _:11a602df877d16de3f36f5de140dc6ff1 "1" . _:11a602df877d16de3f36f5de140dc6ff1 . _:1a97d84af7413940e27729f9a455be231 . _:1a97d84af7413940e27729f9a455be231 "0" . _:1a97d84af7413940e27729f9a455be231 . _:1dd84f70e52e470200dee5ffe4822dc71 . _:1dd84f70e52e470200dee5ffe4822dc71 "8" . _:1dd84f70e52e470200dee5ffe4822dc71 . _:2a1425ca3e32be21f2ffa4d7e5e470571 . _:2a1425ca3e32be21f2ffa4d7e5e470571 "0" . _:2a1425ca3e32be21f2ffa4d7e5e470571 . _:3389fb62506cc3bbddaa0e7c67958e841 . _:3389fb62506cc3bbddaa0e7c67958e841 "5" . _:3389fb62506cc3bbddaa0e7c67958e841 . _:386f24369b154114296a2f27a92275f41 . _:386f24369b154114296a2f27a92275f41 "2" . _:386f24369b154114296a2f27a92275f41 . _:3c9b8ac1955e215d24b36828f81e0db51 . _:3c9b8ac1955e215d24b36828f81e0db51 "0" . _:3c9b8ac1955e215d24b36828f81e0db51 . _:406a232fe30372afa30cc9fcd61eba681 . _:406a232fe30372afa30cc9fcd61eba681 "0" . _:406a232fe30372afa30cc9fcd61eba681 . _:4b61632098e9630978a63b9aca4d9ec31 . _:4b61632098e9630978a63b9aca4d9ec31 "4" . _:4b61632098e9630978a63b9aca4d9ec31 . _:6d003ad2f3a82701b2ed0acb2f780ebd1 . _:6d003ad2f3a82701b2ed0acb2f780ebd1 "3" . _:6d003ad2f3a82701b2ed0acb2f780ebd1 . _:6e16f54b18d53adc14e0d3bceadb17ed1 . _:6e16f54b18d53adc14e0d3bceadb17ed1 "1" . _:6e16f54b18d53adc14e0d3bceadb17ed1 . _:79eba27988f5700159925f00d187a8a61 . _:79eba27988f5700159925f00d187a8a61 "0" . _:79eba27988f5700159925f00d187a8a61 . 
_:7c86efa270e5b780df2f5beafe2013251 . _:7c86efa270e5b780df2f5beafe2013251 "0" . _:7c86efa270e5b780df2f5beafe2013251 . _:82da6471ef665a035aed16f3b5c7b35e1 . _:82da6471ef665a035aed16f3b5c7b35e1 "7" . _:82da6471ef665a035aed16f3b5c7b35e1 . _:8373cdf0524928d6cc7431a7dd7febf61 . _:8373cdf0524928d6cc7431a7dd7febf61 "0" . _:8373cdf0524928d6cc7431a7dd7febf61 . _:916cbe9e1c35f5d35cb11061dfc8a5a51 . _:916cbe9e1c35f5d35cb11061dfc8a5a51 "1" . _:916cbe9e1c35f5d35cb11061dfc8a5a51 . _:94a8a417ddf64bae28b1f4ab642f26771 . _:94a8a417ddf64bae28b1f4ab642f26771 "1" . _:94a8a417ddf64bae28b1f4ab642f26771 . _:a5ac075bbf86e4094bbd63c689b9e65d1 . _:a5ac075bbf86e4094bbd63c689b9e65d1 "1" . _:a5ac075bbf86e4094bbd63c689b9e65d1 . _:aae2cefa34e0e85c36b2d2b995b07a261 . _:aae2cefa34e0e85c36b2d2b995b07a261 "0" . _:aae2cefa34e0e85c36b2d2b995b07a261 . _:aba225d8827982227f44fcf3ba95d0c01 . _:aba225d8827982227f44fcf3ba95d0c01 "0" . _:aba225d8827982227f44fcf3ba95d0c01 . _:c32ca2bd32a5b9c08004f749b4b7357c1 . _:c32ca2bd32a5b9c08004f749b4b7357c1 "1" . _:c32ca2bd32a5b9c08004f749b4b7357c1 . _:ce66a41401c9882cbb0c8270b117bcdc1 . _:ce66a41401c9882cbb0c8270b117bcdc1 "1" . _:ce66a41401c9882cbb0c8270b117bcdc1 . _:d0c5d58c3d389e58633762e0413fcf6c1 . _:d0c5d58c3d389e58633762e0413fcf6c1 "1" . _:d0c5d58c3d389e58633762e0413fcf6c1 . _:d268f287b117dc46ca94ee4f5726a45e1 . _:d268f287b117dc46ca94ee4f5726a45e1 "6" . _:d268f287b117dc46ca94ee4f5726a45e1 . _:e4d6f3af0702f4cae0af1b8a614a04e61 . _:e4d6f3af0702f4cae0af1b8a614a04e61 "1" . _:e4d6f3af0702f4cae0af1b8a614a04e61 . _:ea18fa9350854187dd5062c91c4e715d1 . _:ea18fa9350854187dd5062c91c4e715d1 "1" . _:ea18fa9350854187dd5062c91c4e715d1 . _:ef06e9181f30ac45dc70161fe27e4bf91 . _:ef06e9181f30ac45dc70161fe27e4bf91 "0" . _:ef06e9181f30ac45dc70161fe27e4bf91 .