org.apache.nutch.clustering
Interface OnlineClusterer

All Superinterfaces:
Pluggable
All Known Implementing Classes:
Clusterer

public interface OnlineClusterer
extends Pluggable

An extension point interface for online search results clustering algorithms.

Online search results clustering here means a clusterer that works on a set of HitDetails retrieved for a query and produces a set of HitsCluster objects that can be displayed to help the user gain more insight into the topics found in the result.

Other clustering options include predefined categories and off-line preclustered groups, but those are not investigated further here.

Version:
$Id: OnlineClusterer.java 823614 2009-10-09 17:02:32Z ab $
Author:
Dawid Weiss

Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 HitsCluster[] clusterHits(HitDetails[] hitDetails, String[] descriptions)
          Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
The name of the extension point.

Method Detail

clusterHits

HitsCluster[] clusterHits(HitDetails[] hitDetails,
                          String[] descriptions)
Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).

The arguments to this method may seem very low-level, but they are by-products of the regular search process, so they are reused here rather than duplicating part of the usual Nutch functionality. Other ideas are welcome.
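As a rough illustration of the calling convention, the sketch below groups hits by the first word of each description. It does not use the real Nutch classes: DemoHit, DemoCluster, and FirstWordGrouping are hypothetical stand-ins for HitDetails, HitsCluster, and an implementing class. It also assumes that descriptions[i] is the summary for hitDetails[i], which the signature suggests but this page does not state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for HitDetails: only a URL is kept.
final class DemoHit {
    final String url;
    DemoHit(String url) { this.url = url; }
}

// Hypothetical stand-in for HitsCluster: a label plus its member hits.
final class DemoCluster {
    final String label;
    final List<DemoHit> hits = new ArrayList<>();
    DemoCluster(String label) { this.label = label; }
}

final class FirstWordGrouping {
    // Mirrors the shape of clusterHits: two parallel arrays in, an array of clusters out.
    // Assumption: descriptions[i] is the summary that belongs to hitDetails[i].
    static DemoCluster[] clusterHits(DemoHit[] hitDetails, String[] descriptions) {
        if (hitDetails.length != descriptions.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // LinkedHashMap keeps clusters in the order their labels first appear.
        Map<String, DemoCluster> byLabel = new LinkedHashMap<>();
        for (int i = 0; i < hitDetails.length; i++) {
            String trimmed = descriptions[i].trim();
            String label = trimmed.isEmpty()
                    ? "(other)"
                    : trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
            byLabel.computeIfAbsent(label, DemoCluster::new).hits.add(hitDetails[i]);
        }
        return byLabel.values().toArray(new DemoCluster[0]);
    }
}
```

A real implementation would replace the first-word rule with an actual clustering algorithm; the point here is only that both inputs are already available after a normal search, so nothing has to be fetched again.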

This method must be thread-safe (many threads may invoke it concurrently on the same instance of a clusterer).
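One common way to meet this requirement is to keep all working state in local variables, so that a single instance has nothing to share between calls. The sketch below shows that idea under stated assumptions: StatelessLengthClusterer and ConcurrentUseDemo are hypothetical names, the bucketing rule is arbitrary, and plain index arrays stand in for HitsCluster objects. It is not how any particular Nutch plugin is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical clusterer with no mutable fields, so one instance can be shared.
final class StatelessLengthClusterer {
    // Returns two buckets of indices: [0] short descriptions, [1] long ones.
    int[][] cluster(String[] descriptions) {
        List<Integer> shortOnes = new ArrayList<>(); // per-call state only
        List<Integer> longOnes = new ArrayList<>();
        for (int i = 0; i < descriptions.length; i++) {
            (descriptions[i].length() < 20 ? shortOnes : longOnes).add(i);
        }
        return new int[][] { toArray(shortOnes), toArray(longOnes) };
    }

    private static int[] toArray(List<Integer> xs) {
        int[] out = new int[xs.size()];
        for (int i = 0; i < out.length; i++) out[i] = xs.get(i);
        return out;
    }
}

final class ConcurrentUseDemo {
    // Calls one shared instance from several threads and collects every result.
    static List<int[][]> run(int threads, String[] descriptions) {
        StatelessLengthClusterer shared = new StatelessLengthClusterer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<int[][]>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> shared.cluster(descriptions));
            }
            List<int[][]> results = new ArrayList<>();
            for (Future<int[][]> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If an implementation does need shared state, such as a cached model or a reusable buffer, access to it would have to be synchronized or confined per thread; the stateless form simply avoids the question.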

Returns:
An array of HitsCluster objects.


Copyright © 2006 The Apache Software Foundation