org.apache.nutch.segment
Interface SegmentMergeFilter
public interface SegmentMergeFilter
Interface used to filter segment entries during a segment merge. It allows
filtering on more sophisticated criteria than just URLs. In particular, it
allows filtering based on metadata collected while parsing the page.
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
boolean filter(WritableComparable key,
CrawlDatum generateData,
CrawlDatum fetchData,
CrawlDatum sigData,
Content content,
ParseData parseData,
ParseText parseText,
Collection<CrawlDatum> linked)
- The filtering method. It receives all the information being merged for a
given key (URL) and decides whether that information is kept.
- Returns:
- true if the values for this key (URL) should be merged
into the new segment; false otherwise.
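As an illustration of filtering on parse metadata, the following is a minimal sketch of an implementation that keeps only pages whose parse metadata reports a wanted language. It is not taken from Nutch itself. So that it compiles on its own, the Nutch and Hadoop types are replaced by reduced, hypothetical stand-ins; UrlKey, LanguageMergeFilter, the setMeta helper, and the "lang" metadata name are all assumptions made for the example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Nutch and Hadoop types, reduced to what the
// example needs so that it compiles without any Nutch jars.
interface WritableComparable {}
class CrawlDatum {}
class Content {}
class ParseText {}

// Stand-in key type that simply wraps the URL string.
class UrlKey implements WritableComparable {
    private final String url;
    UrlKey(String url) { this.url = url; }
    @Override public String toString() { return url; }
}

// Stand-in for ParseData: only a name/value metadata map.
class ParseData {
    private final Map<String, String> parseMeta = new HashMap<>();
    void setMeta(String name, String value) { parseMeta.put(name, value); }
    String getMeta(String name) { return parseMeta.get(name); }
}

// Same method shape as the interface documented above.
interface SegmentMergeFilter {
    boolean filter(WritableComparable key,
                   CrawlDatum generateData,
                   CrawlDatum fetchData,
                   CrawlDatum sigData,
                   Content content,
                   ParseData parseData,
                   ParseText parseText,
                   Collection<CrawlDatum> linked);
}

// Keeps an entry only if its parse metadata names the wanted language.
class LanguageMergeFilter implements SegmentMergeFilter {
    // Assumed metadata name; a real crawl may store the language elsewhere.
    static final String LANG_KEY = "lang";

    private final String wantedLanguage;

    LanguageMergeFilter(String wantedLanguage) {
        this.wantedLanguage = wantedLanguage;
    }

    @Override
    public boolean filter(WritableComparable key,
                          CrawlDatum generateData,
                          CrawlDatum fetchData,
                          CrawlDatum sigData,
                          Content content,
                          ParseData parseData,
                          ParseText parseText,
                          Collection<CrawlDatum> linked) {
        // No parse data for this key: nothing to judge, so keep the entry.
        if (parseData == null) {
            return true;
        }
        // A missing language value yields false, so the entry is dropped.
        return wantedLanguage.equals(parseData.getMeta(LANG_KEY));
    }
}
```

Keeping entries that have no parse data is a design choice of this sketch; a stricter filter could return false in that case instead.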
Copyright © 2011 The Apache Software Foundation