org.apache.nutch.segment
Interface SegmentMergeFilter


public interface SegmentMergeFilter

Interface used to filter segment data during a segment merge. It allows filtering on more sophisticated criteria than just URLs; in particular, it allows filtering based on metadata collected while parsing the page.
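To illustrate filtering on parse metadata rather than URLs, here is a minimal standalone sketch. It uses simplified stand-ins for the Nutch types, so it can compile without Nutch or Hadoop on the classpath: the key is a plain String instead of a WritableComparable, and ParseData exposes its parse metadata as a Map. The metadata key "lang" is hypothetical, because which keys exist depends on the parse plugins in use. The class LanguageMergeFilter is also hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Nutch types; the real classes live in
// org.apache.nutch.crawl, org.apache.nutch.protocol and org.apache.nutch.parse.
class CrawlDatum {}
class Content {}
class ParseText {}
class ParseData {
  private final Map<String, String> parseMeta = new HashMap<>();
  Map<String, String> getParseMeta() { return parseMeta; }
}

// Mirrors the filter(...) signature, with a String key for simplicity.
interface SegmentMergeFilter {
  boolean filter(String key, CrawlDatum generateData, CrawlDatum fetchData,
                 CrawlDatum sigData, Content content, ParseData parseData,
                 ParseText parseText, Collection<CrawlDatum> linked);
}

// Keeps only entries whose parse metadata records the allowed language.
class LanguageMergeFilter implements SegmentMergeFilter {
  private final String allowedLang;

  LanguageMergeFilter(String allowedLang) { this.allowedLang = allowedLang; }

  @Override
  public boolean filter(String key, CrawlDatum generateData, CrawlDatum fetchData,
                        CrawlDatum sigData, Content content, ParseData parseData,
                        ParseText parseText, Collection<CrawlDatum> linked) {
    if (parseData == null) {
      return true; // nothing was parsed for this URL, so there is no metadata to judge
    }
    // "lang" is a hypothetical metadata key used only for this sketch
    return allowedLang.equals(parseData.getParseMeta().get("lang"));
  }
}
```

Entries without parse data are kept here because there is no metadata to judge them by; whether that is the right default depends on the purpose of the merge.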


Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 boolean filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
          The filtering method which gets all information being merged for a given key (URL).
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
The name of the extension point.
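Nutch discovers SegmentMergeFilter implementations through the plugin extension point named by X_POINT_ID. The sketch below leaves out the plugin lookup entirely and shows one plausible way a merge could consult several registered filters. It assumes that a key is kept only when every filter accepts it. MergeFilterChain and the stand-in types (including the String key) are hypothetical and are not the Nutch API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical empty stand-ins for the Nutch types.
class CrawlDatum {}
class Content {}
class ParseData {}
class ParseText {}

interface SegmentMergeFilter {
  boolean filter(String key, CrawlDatum generateData, CrawlDatum fetchData,
                 CrawlDatum sigData, Content content, ParseData parseData,
                 ParseText parseText, Collection<CrawlDatum> linked);
}

// Hypothetical aggregator: a key survives the merge only if every filter accepts it.
class MergeFilterChain {
  private final List<SegmentMergeFilter> filters = new ArrayList<>();

  void add(SegmentMergeFilter filter) { filters.add(filter); }

  boolean accept(String key, CrawlDatum generateData, CrawlDatum fetchData,
                 CrawlDatum sigData, Content content, ParseData parseData,
                 ParseText parseText, Collection<CrawlDatum> linked) {
    for (SegmentMergeFilter filter : filters) {
      if (!filter.filter(key, generateData, fetchData, sigData, content,
                         parseData, parseText, linked)) {
        return false; // the first rejection wins; later filters are not consulted
      }
    }
    return true; // no filters registered, or all of them accepted the key
  }
}
```

Short-circuiting on the first rejection avoids running expensive filters on entries that are already being dropped.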

Method Detail

filter

boolean filter(WritableComparable key,
               CrawlDatum generateData,
               CrawlDatum fetchData,
               CrawlDatum sigData,
               Content content,
               ParseData parseData,
               ParseText parseText,
               Collection<CrawlDatum> linked)
The filtering method which gets all information being merged for a given key (URL).

Returns:
true if the values for this key (URL) should be merged into the new segment; false otherwise.
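The return value is the whole contract: true keeps the key's values in the merged segment, and false drops them. The following minimal sketch of a filter built on that contract drops entries whose extracted text is missing or blank. The stand-in types (a String key, and a ParseText exposing getText()) and the class name NonEmptyTextMergeFilter are assumptions made so the example runs standalone.

```java
import java.util.Collection;

// Hypothetical stand-ins; only ParseText carries data in this sketch.
class CrawlDatum {}
class Content {}
class ParseData {}
class ParseText {
  private final String text;
  ParseText(String text) { this.text = text; }
  String getText() { return text; }
}

interface SegmentMergeFilter {
  boolean filter(String key, CrawlDatum generateData, CrawlDatum fetchData,
                 CrawlDatum sigData, Content content, ParseData parseData,
                 ParseText parseText, Collection<CrawlDatum> linked);
}

// Drops entries whose extracted text is missing or blank.
class NonEmptyTextMergeFilter implements SegmentMergeFilter {
  @Override
  public boolean filter(String key, CrawlDatum generateData, CrawlDatum fetchData,
                        CrawlDatum sigData, Content content, ParseData parseData,
                        ParseText parseText, Collection<CrawlDatum> linked) {
    String text = (parseText == null) ? null : parseText.getText();
    // true = merge this key's values into the new segment; false = drop them
    return text != null && !text.trim().isEmpty();
  }
}
```

This rule also drops keys that were generated but never fetched or parsed, because their ParseText is null. A real filter would likely inspect the other arguments before rejecting such entries.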


Copyright © 2011 The Apache Software Foundation