BasicIndexingFilter (apache-nutch 2.2.1 API)

org.apache.nutch.indexer.basic
Class BasicIndexingFilter

java.lang.Object
  org.apache.nutch.indexer.basic.BasicIndexingFilter

All Implemented Interfaces: org.apache.hadoop.conf.Configurable, IndexingFilter, FieldPluggable, Pluggable

public class BasicIndexingFilter
extends Object
implements IndexingFilter

Adds basic searchable fields to a document. The fields are:

host - add host as un-stored, indexed and tokenized
url - url is both stored and indexed, so it's both searchable and returned. This is also a required field.
orig - also store original url as both stored and indexed
content - content is indexed, so that it's searchable, but not stored in index
title - title is stored and indexed
cache - add cached content/summary display policy, if available
tstamp - add timestamp when fetched, for deduplication
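The stored/indexed policies described above can be summarized in a small lookup structure. The following is a minimal sketch in plain Java, not the Nutch implementation: the class `BasicFieldPolicy`, its enum, and `storedFieldNames()` are hypothetical names introduced for illustration. Only the five fields whose policy is stated explicitly in the description are modeled; `cache` and `tstamp` are left out because their storage policy is not given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it does not exist in Nutch.
public class BasicFieldPolicy {

    // One entry per field whose stored/indexed policy is stated in the class description.
    enum Field {
        HOST("host", false, true),       // un-stored, indexed (and tokenized)
        URL("url", true, true),          // stored and indexed; required
        ORIG("orig", true, true),        // stored and indexed
        CONTENT("content", false, true), // indexed, not stored
        TITLE("title", true, true);      // stored and indexed

        final String fieldName;
        final boolean stored;
        final boolean indexed;

        Field(String fieldName, boolean stored, boolean indexed) {
            this.fieldName = fieldName;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Names of the fields that can be returned with a search hit (the stored ones).
    static List<String> storedFieldNames() {
        List<String> names = new ArrayList<>();
        for (Field f : Field.values()) {
            if (f.stored) {
                names.add(f.fieldName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(storedFieldNames()); // prints [url, orig, title]
    }
}
```

Fields that are indexed but not stored (`host`, `content`) remain searchable but are not returned with results, which is the distinction the description draws for `url`.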

Field Summary
`static org.slf4j.Logger`	`LOG`

Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
`X_POINT_ID`

Constructor Summary
`BasicIndexingFilter()`

Method Summary
`void`	`addIndexBackendOptions(org.apache.hadoop.conf.Configuration conf)`
`NutchDocument`	`filter(NutchDocument doc, String url, WebPage page)` Filters the document, with a configurable limit on the number of characters permitted in the title; see `indexer.max.title.length` in nutch-default.xml
`org.apache.hadoop.conf.Configuration`	`getConf()` Get the `Configuration` object
`Collection<WebPage.Field>`	`getFields()` Gets all the fields for a given `WebPage`. Many datastores need to set up the MapReduce job by specifying the fields needed.
`void`	`setConf(org.apache.hadoop.conf.Configuration conf)` Set the `Configuration` object
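The title-length limit mentioned for `filter` can be illustrated as follows. This is a minimal sketch under stated assumptions, not the actual Nutch code: `java.util.Properties` stands in for the Hadoop `Configuration` object, the class `TitleLengthSketch` and the method `truncateTitle` are hypothetical, and both the fallback value of 100 and the treatment of a negative value as "no limit" are assumptions. Only the property name `indexer.max.title.length` comes from the documentation above.

```java
import java.util.Properties;

// Hypothetical illustration class; it does not exist in Nutch.
public class TitleLengthSketch {

    // Property name taken from the filter() description.
    static final String MAX_TITLE_KEY = "indexer.max.title.length";

    // Assumed fallback when the property is not set.
    static final int ASSUMED_DEFAULT = 100;

    // Returns the title cut to the configured maximum number of characters.
    // A negative limit is treated as "no limit" (an assumption of this sketch).
    static String truncateTitle(String title, Properties conf) {
        int max = Integer.parseInt(
                conf.getProperty(MAX_TITLE_KEY, String.valueOf(ASSUMED_DEFAULT)));
        if (max >= 0 && title.length() > max) {
            return title.substring(0, max);
        }
        return title;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAX_TITLE_KEY, "5");
        System.out.println(truncateTitle("Apache Nutch", conf)); // prints Apach
    }
}
```

In a real plugin the limit would typically be read once from the configuration object when it is set, rather than on every call, so that filtering each page does not re-parse the property.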

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail