eZ Component: Search, Design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Derick Rethans
:Revision: $Rev$
:Date: $Date$

.. contents::

Design description
==================
The search component provides an interface that allows for multiple search
backends. For this to work, abstraction is required on two levels: first, the
definition of document fields; second, the search query syntax.

The logic is very similar to that of PersistentObject, where a mapping is made
between class properties and database fields. For search, a mapping is needed
between class properties and search index fields.

Finding persistent objects is done through the Database component's SQL
abstraction, which allows for multiple SQL dialects. The Search component
requires something similar to allow for different search query dialects.
Therefore the use of the search component will mostly be modeled after the
design of the Database and PersistentObject components.

Classes
=======

ezcSearchSession
----------------
ezcSearchSession is the main runtime interface for indexing and searching
documents. Documents are indexed by calling index(), and searching for
documents is done through find(). Unlike with the PersistentObject component,
find() does not simply return an array of objects for each of the found
documents. Instead it returns an ezcSearchResult object containing information
about the search result. The find() method accepts as parameter an object of
class ezcSearchFindQuery (or one of its children). This query object is
created by calling the createFindQuery() method on this class. Besides
createFindQuery(), a method to create a query for deleting indexed documents
will be provided too.

The classes representing documents do need to implement an interface that
specifies getState() and setState(), something we forgot for PersistentObject.

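
The getState()/setState() contract for document classes can be sketched as
follows. This is only an illustration under stated assumptions: the design
does not name the interface, so ``ExampleDocumentState``, the
``ExampleArticle`` class and its properties are all hypothetical.

::

    <?php
    // Hypothetical interface name; the design does not specify one.
    interface ExampleDocumentState
    {
        public function getState();
        public function setState( array $state );
    }

    // Hypothetical document class with three example properties.
    class ExampleArticle implements ExampleDocumentState
    {
        private $id;
        private $title;
        private $body;

        public function __construct( $id = null, $title = null, $body = null )
        {
            $this->id = $id;
            $this->title = $title;
            $this->body = $body;
        }

        // Returns the properties as name => value pairs, e.g. for indexing.
        public function getState()
        {
            return array(
                'id' => $this->id,
                'title' => $this->title,
                'body' => $this->body,
            );
        }

        // Restores known properties from name => value pairs, e.g. when a
        // found document is turned back into an object.
        public function setState( array $state )
        {
            foreach ( $state as $name => $value )
            {
                if ( property_exists( $this, $name ) )
                {
                    $this->$name = $value;
                }
            }
        }
    }

    $article = new ExampleArticle( 1, 'Search design', 'Notes on backends' );
    $copy = new ExampleArticle();
    $copy->setState( $article->getState() );
    ?>

With such a contract, a session would not need to know the internals of a
document class: it could read properties through getState() when indexing and
restore them through setState() when building results.
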

ezcSearchSessionInstance
------------------------
Holds search session instances for global access throughout an application.

ezcSearchDefinitionManager
--------------------------
Loads definition files that describe document types with all their fields.
How those definitions are mapped to search-engine-specific fields and options
depends on the backend.

ezcSearchDocumentDefinition
---------------------------
Describes all the fields of one document type. It is loaded by the
ezcSearchDefinitionManager and used by the backends both to index and to find
documents. For each document field it stores an ezcSearchObjectProperty. It
also defines a field with which a document can be uniquely identified, as well
as a default search field. In future versions it could also group fields for
easier searching of multiple fields etc.

ezcSearchHandler
----------------
The base class that all search backends implement. The handlers know how to
communicate with the backends, how to generate correct search query strings,
and how to present results. Handlers can also accept search-backend specific
options. For the first version only ezcSearchSolrHandler is planned, while
later versions might also have backends for Google, Yahoo! etc.

A backend does not have to implement the index(), createDeleteQuery() and
delete() methods, as they are not available for every handler. Therefore the
search handlers can optionally implement the interface ezcSearchIndexHandler.

ezcSearchSolrHandler
--------------------
An implementation of ezcSearchHandler that communicates with Apache
Lucene/Solr. This will be the reference implementation.

ezcSearchQuery
--------------
Implements a fluent language to query the search index. Its methods closely
mirror those of ezcDbQuery. This class is inherited by ezcSearchFindQuery and
ezcSearchDeleteQuery for searching in, or deleting from, the search index.

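
How a fluent query object of this kind can work is sketched below. This is
not the component's implementation: ``ExampleFindQuery``, its ``build()``
method and the ``field:"value"`` output syntax are assumptions made for
illustration only. The point is the split between expression helpers such as
eq(), which return a condition, and chainable methods such as find() and
limit(), which return the query object itself.

::

    <?php
    // Hypothetical stand-in for a find query; not the component's real class.
    class ExampleFindQuery
    {
        private $where = array();
        private $limit = null;
        private $offset = 0;

        // Expression helper: returns a condition string, not the query.
        public function eq( $field, $value )
        {
            return $field . ':"' . addslashes( $value ) . '"';
        }

        // Stores a condition and returns $this so that calls can be chained.
        public function find( $condition )
        {
            $this->where[] = $condition;
            return $this;
        }

        // Stores the result window and returns $this for chaining.
        public function limit( $limit, $offset = 0 )
        {
            $this->limit = (int) $limit;
            $this->offset = (int) $offset;
            return $this;
        }

        // Renders an illustrative query string. A real handler would
        // translate the query into its backend's own dialect instead.
        public function build()
        {
            $query = implode( ' AND ', $this->where );
            if ( $this->limit !== null )
            {
                $query .= ' LIMIT ' . $this->limit . ' OFFSET ' . $this->offset;
            }
            return $query;
        }
    }

    $q = new ExampleFindQuery();
    $q->find( $q->eq( 'name', 'Derick' ) )->limit( 7, 10 );
    $built = $q->build();
    ?>

Keeping the query object free of backend syntax until a final build step is
what would let each handler produce its own query dialect from the same
fluent calls.
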

Data structures
===============

ezcSearchObjectProperty
  Defines the name of the document field, its type and a hint for the field
  name in the search index.

ezcSearchResult
  Provides metadata about the search (time, number of results, etc.) as well
  as an array of the found results. Depending on the search backend, the
  array of found documents can be of different classes, as the document types
  could be different.

Example Usage
=============

::

    <?php
    // indexing a document ($session is an ezcSearchSession instance)
    $session->index( $document );

    // finding documents where name = Derick
    $q = $session->createFindQuery();
    $q->find( $q->eq( 'name', "Derick" ) );
    $ret = $session->find( $q );

    // finding documents where any field contains Derick, starting at row 10
    // and returning 7 rows
    $q = $session->createFindQuery();
    $q->find( $q->eq( '*', "Derick" ) );
    $ret = $session->find( $q )->limit( 7, 10 );

    // finding documents where text contains Derick and Tiger, only
    // having name as returned field, and order by published date.
    $q = $session->createFindQuery();
    $q->select( 'name' )
      ->find( $q->and( $q->eq( 'text', "Derick" ), $q->eq( 'text', 'Tiger' ) ) )
      ->orderBy( 'published' );
    $ret = $session->find( $q );

    // finding documents where text contains Derick or Tiger
    $q = $session->createFindQuery();
    $q->find( $q->in( 'text', array( 'Derick', 'Tiger' ) ) );
    $ret = $session->find( $q );

    // finding documents containing 'Ramius' published between 2007-01-01 and
    // 2007-12-31
    $q = $session->createFindQuery();
    $q->find(
        $q->and(
            $q->eq( 'text', 'Ramius' ),
            $q->between(
                'published',
                new DateTime( '2007-01-01' ), // DateTime object
                strtotime( "2007-12-31" )     // timestamp
            )
        )
    );
    $ret = $session->find( $q );

    // finding documents containing 'plane' and putting facets on the
    // categories, limiting result set to 8 and facets to 4
    $q = $session->createFindQuery();
    $q->find( $q->eq( 'description', 'plane' ) )
      ->limit( 8 )
      ->facet( 'category' )->limit( 4 );
    ?>


..
   Local Variables:
   mode: rst
   fill-column: 78
   End:
   vim: et syn=rst tw=79