AnchorIndexingFilter (apache-nutch 1.6 API)

org.apache.nutch.indexer.anchor
Class AnchorIndexingFilter

java.lang.Object
  org.apache.nutch.indexer.anchor.AnchorIndexingFilter

All Implemented Interfaces: Configurable, IndexingFilter, Pluggable

public class AnchorIndexingFilter
extends Object
implements IndexingFilter

Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors. Deduplication does have its cons.

See Also: anchorIndexingFilter.deduplicate in nutch-default.xml.

Field Summary
`static org.slf4j.Logger`	`LOG`

Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
`X_POINT_ID`

Constructor Summary
`AnchorIndexingFilter()`

Method Summary
`NutchDocument`	`filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)` Indexes the inbound anchor text for a document; a boolean configuration setting controls whether anchors are deduplicated.
`Configuration`	`getConf()` Get the `Configuration` object
`void`	`setConf(Configuration conf)` Set the `Configuration` object

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

AnchorIndexingFilter

public AnchorIndexingFilter()

Method Detail

setConf

public void setConf(Configuration conf)

Set the Configuration object

Specified by: setConf in interface Configurable

getConf

public Configuration getConf()

Get the Configuration object

Specified by: getConf in interface Configurable
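
The following is a minimal sketch of how a filter might cache the deduplication flag when its configuration is set. It is not the Nutch source: `java.util.Properties` stands in for the `Configuration` object so the example runs without Nutch or Hadoop on the classpath, the class name `DedupFlagSketch` and the accessor `isDeduplicate()` are hypothetical, and the default of `false` for a missing property is an assumption.

```java
import java.util.Properties;

// Hypothetical stand-alone sketch; Properties stands in for Configuration.
class DedupFlagSketch {

    // Property name taken from the class description.
    static final String DEDUP_KEY = "anchorIndexingFilter.deduplicate";

    private Properties conf;      // stand-in for the Configuration object
    private boolean deduplicate;  // flag cached when the configuration is set

    // Stores the configuration and reads the boolean flag once.
    void setConf(Properties conf) {
        this.conf = conf;
        // Assumption: a missing property means "index all anchors".
        this.deduplicate =
            Boolean.parseBoolean(conf.getProperty(DEDUP_KEY, "false"));
    }

    // Returns the configuration passed to setConf.
    Properties getConf() {
        return conf;
    }

    // Hypothetical accessor, added only so the flag can be inspected.
    boolean isDeduplicate() {
        return deduplicate;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DEDUP_KEY, "true");

        DedupFlagSketch filter = new DedupFlagSketch();
        filter.setConf(props);
        System.out.println("deduplicate = " + filter.isDeduplicate());
    }
}
```

Reading the flag once in `setConf` keeps the per-document `filter` call free of configuration lookups; whether the real class does the same is not stated on this page.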

filter

public NutchDocument filter(NutchDocument doc,
                            Parse parse,
                            Text url,
                            CrawlDatum datum,
                            Inlinks inlinks)
                     throws IndexingException

Indexes the inbound anchor text for a document. A boolean configuration setting controls whether anchors are deduplicated. See anchorIndexingFilter.deduplicate in nutch-default.xml.

Specified by: filter in interface IndexingFilter

Parameters:
  doc - The NutchDocument object
  parse - The relevant Parse object passing through the filter
  url - URL to be filtered for anchor text
  datum - The CrawlDatum entry
  inlinks - The Inlinks containing anchor text
Returns: filtered NutchDocument
Throws: IndexingException
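
The choice between indexing every inbound anchor and deduplicating them can be sketched with plain Java collections. This is an illustration under stated assumptions, not the Nutch implementation: the class `AnchorDedupSketch` and method `selectAnchors` are hypothetical, anchors are passed as a plain `String[]` rather than read from an `Inlinks` object, and the case-insensitive comparison is an assumption about what counts as a duplicate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone sketch; not the Nutch source.
class AnchorDedupSketch {

    // Returns the anchors to index: all of them, or only the first
    // occurrence of each anchor when deduplicate is true.
    static List<String> selectAnchors(String[] anchors, boolean deduplicate) {
        List<String> selected = new ArrayList<>();
        if (anchors == null) {
            return selected; // no inbound anchors, nothing to add
        }
        Set<String> seen = new HashSet<>();
        for (String anchor : anchors) {
            // Assumption: comparison ignores case, so "Home" and "home" collapse.
            String key = anchor.toLowerCase(Locale.ROOT);
            // Set.add returns false when the key was already present.
            if (!deduplicate || seen.add(key)) {
                selected.add(anchor);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String[] anchors = {"Apache Nutch", "apache nutch", "Crawler", "Apache Nutch"};
        System.out.println(selectAnchors(anchors, false));
        System.out.println(selectAnchors(anchors, true));
    }
}
```

With deduplication off, all four anchors are kept; with it on, only the first spelling of each distinct anchor survives, which discards how often an anchor text was used.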