Packages that use NutchDocument | |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.indexer | Maintain Lucene full-text indexes. |
org.apache.nutch.indexer.anchor | An indexing plugin for inbound anchor text. |
org.apache.nutch.indexer.basic | A basic indexing plugin. |
org.apache.nutch.indexer.elastic | |
org.apache.nutch.indexer.feed | |
org.apache.nutch.indexer.more | A more indexing plugin. |
org.apache.nutch.indexer.solr | |
org.apache.nutch.indexer.subcollection | |
org.apache.nutch.indexer.tld | Top Level Domain Indexing plugin. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.scoring | |
org.apache.nutch.scoring.link | |
org.apache.nutch.scoring.opic | |
org.apache.nutch.scoring.tld | Top Level Domain Scoring plugin. |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Uses of NutchDocument in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang that return NutchDocument | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Methods in org.apache.nutch.analysis.lang with parameters of type NutchDocument | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Uses of NutchDocument in org.apache.nutch.indexer |
---|
Methods in org.apache.nutch.indexer that return NutchDocument | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
String url,
WebPage page)
Run all defined filters. |
NutchDocument |
IndexUtil.index(String key,
WebPage page)
Index a webpage. |
Methods in org.apache.nutch.indexer that return types with arguments of type NutchDocument | |
---|---|
RecordWriter<String,NutchDocument> |
IndexerOutputFormat.getRecordWriter(TaskAttemptContext job)
|
Methods in org.apache.nutch.indexer with parameters of type NutchDocument | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
String url,
WebPage page)
Run all defined filters. |
void |
NutchIndexWriter.write(NutchDocument doc)
|
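The filter methods listed above each take a document and return one, so they can be chained. Below is a minimal, self-contained sketch of that pattern. `SketchDocument`, `SketchFilter`, and `FilterChainSketch` are hypothetical stand-ins, not Nutch classes; the `WebPage` argument is omitted, and treating a `null` return as "drop the document" is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: named fields, each with a list of values.
class SketchDocument {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical stand-in for the filter(doc, url, page) contract, without the WebPage argument.
interface SketchFilter {
    SketchDocument filter(SketchDocument doc, String url);
}

class FilterChainSketch {
    // Runs every filter in order, feeding each one the previous filter's result.
    // Assumption: a filter that returns null vetoes the document, and the chain stops.
    static SketchDocument runAll(List<SketchFilter> filters, SketchDocument doc, String url) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<SketchFilter> filters = new ArrayList<>();
        // First filter adds a field; second filter rejects URLs ending in ".tmp".
        filters.add((d, u) -> { d.add("url", u); return d; });
        filters.add((d, u) -> u.endsWith(".tmp") ? null : d);

        SketchDocument kept = runAll(filters, new SketchDocument(), "http://example.com/");
        SketchDocument dropped = runAll(filters, new SketchDocument(), "http://example.com/a.tmp");
        System.out.println("kept fields: " + kept.fields);
        System.out.println("dropped is null: " + (dropped == null));
    }
}
```

Returning the document rather than mutating it in place lets a filter replace or reject the document without the caller needing a separate signal.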
Uses of NutchDocument in org.apache.nutch.indexer.anchor |
---|
Methods in org.apache.nutch.indexer.anchor that return NutchDocument | |
---|---|
NutchDocument |
AnchorIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The AnchorIndexingFilter filter object, which supports boolean
configuration settings for deduplicating anchors. |
Methods in org.apache.nutch.indexer.anchor with parameters of type NutchDocument | |
---|---|
NutchDocument |
AnchorIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The AnchorIndexingFilter filter object, which supports boolean
configuration settings for deduplicating anchors. |
Uses of NutchDocument in org.apache.nutch.indexer.basic |
---|
Methods in org.apache.nutch.indexer.basic that return NutchDocument | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The BasicIndexingFilter filter object, which supports a configurable
maximum title length in characters
(see indexer.max.title.length in nutch-default.xml). |
Methods in org.apache.nutch.indexer.basic with parameters of type NutchDocument | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The BasicIndexingFilter filter object, which supports a configurable
maximum title length in characters
(see indexer.max.title.length in nutch-default.xml). |
Uses of NutchDocument in org.apache.nutch.indexer.elastic |
---|
Methods in org.apache.nutch.indexer.elastic with parameters of type NutchDocument | |
---|---|
void |
ElasticWriter.write(NutchDocument doc)
|
Uses of NutchDocument in org.apache.nutch.indexer.feed |
---|
Methods in org.apache.nutch.indexer.feed that return NutchDocument | |
---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the indexer for indexing within the Nutch
index. |
Methods in org.apache.nutch.indexer.feed with parameters of type NutchDocument | |
---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the indexer for indexing within the Nutch
index. |
Uses of NutchDocument in org.apache.nutch.indexer.more |
---|
Methods in org.apache.nutch.indexer.more that return NutchDocument | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Methods in org.apache.nutch.indexer.more with parameters of type NutchDocument | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Uses of NutchDocument in org.apache.nutch.indexer.solr |
---|
Methods in org.apache.nutch.indexer.solr with parameters of type NutchDocument | |
---|---|
void |
SolrWriter.write(NutchDocument doc)
|
Uses of NutchDocument in org.apache.nutch.indexer.subcollection |
---|
Methods in org.apache.nutch.indexer.subcollection that return NutchDocument | |
---|---|
NutchDocument |
SubcollectionIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Methods in org.apache.nutch.indexer.subcollection with parameters of type NutchDocument | |
---|---|
NutchDocument |
SubcollectionIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Uses of NutchDocument in org.apache.nutch.indexer.tld |
---|
Methods in org.apache.nutch.indexer.tld that return NutchDocument | |
---|---|
NutchDocument |
TLDIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Methods in org.apache.nutch.indexer.tld with parameters of type NutchDocument | |
---|---|
NutchDocument |
TLDIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Uses of NutchDocument in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag that return NutchDocument | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The RelTagIndexingFilter filter object. |
Methods in org.apache.nutch.microformats.reltag with parameters of type NutchDocument | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
The RelTagIndexingFilter filter object. |
Uses of NutchDocument in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type NutchDocument | |
---|---|
float |
ScoringFilter.indexerScore(String url,
NutchDocument doc,
WebPage page,
float initScore)
This method calculates a Lucene document boost. |
float |
ScoringFilters.indexerScore(String url,
NutchDocument doc,
WebPage row,
float initScore)
|
Uses of NutchDocument in org.apache.nutch.scoring.link |
---|
Methods in org.apache.nutch.scoring.link with parameters of type NutchDocument | |
---|---|
float |
LinkAnalysisScoringFilter.indexerScore(String url,
NutchDocument doc,
WebPage page,
float initScore)
|
Uses of NutchDocument in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type NutchDocument | |
---|---|
float |
OPICScoringFilter.indexerScore(String url,
NutchDocument doc,
WebPage row,
float initScore)
Dampen the boost value by scorePower. |
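As a rough illustration of what "dampen the boost value by scorePower" could mean, the sketch below raises a page score to the power `scorePower` and multiplies by `initScore`. The formula, the `BoostSketch` class, and the `dampen` helper are assumptions for illustration only; this page confirms just the parameter names `initScore` and `scorePower`, not the exact computation.

```java
// Hypothetical helper, not a Nutch class.
class BoostSketch {
    // Assumed formula: boost = pageScore ^ scorePower * initScore.
    // With 0 < scorePower < 1, large scores are compressed toward 1.
    static float dampen(float pageScore, float scorePower, float initScore) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // A square-root style dampening (scorePower = 0.5) as an example setting.
        System.out.println(dampen(4.0f, 0.5f, 1.0f));
        System.out.println(dampen(100.0f, 0.5f, 1.0f));
    }
}
```

Under this assumption, a `scorePower` of 1 leaves the score unchanged and a `scorePower` of 0 gives every page the same boost of `initScore`.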
Uses of NutchDocument in org.apache.nutch.scoring.tld |
---|
Methods in org.apache.nutch.scoring.tld with parameters of type NutchDocument | |
---|---|
float |
TLDScoringFilter.indexerScore(String url,
NutchDocument doc,
WebPage page,
float initScore)
|
Uses of NutchDocument in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch that return NutchDocument | |
---|---|
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|
Methods in org.creativecommons.nutch with parameters of type NutchDocument | |
---|---|
void |
CCIndexingFilter.addUrlFeatures(NutchDocument doc,
String urlString)
Add the features represented by a license URL. |
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
String url,
WebPage page)
|