Uses of NutchDocument in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang that return NutchDocument | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.analysis.lang with parameters of type NutchDocument | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer |
---|
Methods in org.apache.nutch.indexer that return NutchDocument | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Runs all defined filters. |
Methods in org.apache.nutch.indexer with parameters of type NutchDocument | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Runs all defined filters. |
void |
NutchIndexWriter.write(NutchDocument doc)
|
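
The relationship between `IndexingFilter.filter` (modifies one document) and `IndexingFilters.filter` (runs all defined filters) can be illustrated with a small, self-contained sketch. It does not use the Nutch classes: `SketchDocument`, `SketchIndexingFilter`, and `SketchIndexingFilters` are hypothetical stand-ins, the signature is reduced to the document and a URL string, and treating a `null` result as "drop the document" is an assumption of this sketch rather than something this page states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: a bag of named, multi-valued fields.
class SketchDocument {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical stand-in for IndexingFilter, reduced to the document and URL.
interface SketchIndexingFilter {
    // Returns the (possibly modified) document, or null to drop it.
    SketchDocument filter(SketchDocument doc, String url);
}

// Hypothetical stand-in for IndexingFilters: applies every filter in order.
class SketchIndexingFilters {
    private final List<SketchIndexingFilter> filters;

    SketchIndexingFilters(List<SketchIndexingFilter> filters) {
        this.filters = filters;
    }

    SketchDocument filter(SketchDocument doc, String url) {
        for (SketchIndexingFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // assumption: a null result stops the chain
            }
        }
        return doc;
    }
}
```

Each filter receives the document produced by the previous one, so the order in which filters are defined determines which fields a later filter can see.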
Uses of NutchDocument in org.apache.nutch.indexer.anchor |
---|
Methods in org.apache.nutch.indexer.anchor that return NutchDocument | |
---|---|
NutchDocument |
AnchorIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
The AnchorIndexingFilter filter method, which supports boolean
configuration settings for the deduplication of anchors. |
Methods in org.apache.nutch.indexer.anchor with parameters of type NutchDocument | |
---|---|
NutchDocument |
AnchorIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
The AnchorIndexingFilter filter method, which supports boolean
configuration settings for the deduplication of anchors. |
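
As a rough illustration of what a boolean deduplication setting for anchors might do, the sketch below keeps only the first occurrence of each anchor text when the flag is on. The class `AnchorDedupSketch`, the method `selectAnchors`, and the choice to compare anchors case-insensitively are all assumptions made for this example. They are not taken from the AnchorIndexingFilter source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
class AnchorDedupSketch {
    // Returns the anchors to index. When deduplicate is true, anchors that
    // differ only by case are kept once (the first occurrence wins).
    static List<String> selectAnchors(String[] anchors, boolean deduplicate) {
        List<String> kept = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String anchor : anchors) {
            if (deduplicate) {
                String key = anchor.toLowerCase(Locale.ROOT);
                if (!seen.add(key)) {
                    continue; // already indexed an equivalent anchor
                }
            }
            kept.add(anchor);
        }
        return kept;
    }
}
```

With the flag off, every inlink anchor is kept, including repeats.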
Uses of NutchDocument in org.apache.nutch.indexer.basic |
---|
Methods in org.apache.nutch.indexer.basic that return NutchDocument | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.basic with parameters of type NutchDocument | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.feed |
---|
Methods in org.apache.nutch.indexer.feed that return NutchDocument | |
---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch
index. |
Methods in org.apache.nutch.indexer.feed with parameters of type NutchDocument | |
---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch
index. |
Uses of NutchDocument in org.apache.nutch.indexer.metadata |
---|
Methods in org.apache.nutch.indexer.metadata that return NutchDocument | |
---|---|
NutchDocument |
MetadataIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.metadata with parameters of type NutchDocument | |
---|---|
NutchDocument |
MetadataIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.more |
---|
Methods in org.apache.nutch.indexer.more that return NutchDocument | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.more with parameters of type NutchDocument | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.solr |
---|
Methods in org.apache.nutch.indexer.solr with parameters of type NutchDocument | |
---|---|
void |
SolrWriter.write(NutchDocument doc)
|
Uses of NutchDocument in org.apache.nutch.indexer.staticfield |
---|
Methods in org.apache.nutch.indexer.staticfield that return NutchDocument | |
---|---|
NutchDocument |
StaticFieldIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.staticfield with parameters of type NutchDocument | |
---|---|
NutchDocument |
StaticFieldIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.subcollection |
---|
Methods in org.apache.nutch.indexer.subcollection that return NutchDocument | |
---|---|
NutchDocument |
SubcollectionIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.subcollection with parameters of type NutchDocument | |
---|---|
NutchDocument |
SubcollectionIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.tld |
---|
Methods in org.apache.nutch.indexer.tld that return NutchDocument | |
---|---|
NutchDocument |
TLDIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text urlText,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.indexer.tld with parameters of type NutchDocument | |
---|---|
NutchDocument |
TLDIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text urlText,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.indexer.urlmeta |
---|
Methods in org.apache.nutch.indexer.urlmeta that return NutchDocument | |
---|---|
NutchDocument |
URLMetaIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object. |
Methods in org.apache.nutch.indexer.urlmeta with parameters of type NutchDocument | |
---|---|
NutchDocument |
URLMetaIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object. |
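
The lookup described for `URLMetaIndexingFilter.filter` can be sketched without the Nutch classes. In the example below a plain `Map` stands in for the metadata carried by the CrawlDatum, and the "urlmeta.tags" value is assumed to be a comma-separated list. Both choices, along with the names `UrlMetaSketch` and `copyTags`, are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch.
class UrlMetaSketch {
    // tagsProperty: assumed comma-separated value of "urlmeta.tags".
    // datumMetadata: stand-in for the metadata stored on the CrawlDatum.
    // Returns the tag/value pairs that would be added to the document.
    static Map<String, String> copyTags(String tagsProperty,
                                        Map<String, String> datumMetadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String raw : tagsProperty.split(",")) {
            String tag = raw.trim();
            if (tag.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            String value = datumMetadata.get(tag);
            if (value != null) {
                fields.put(tag, value); // only tags present on the datum are copied
            }
        }
        return fields;
    }
}
```

Tags that are listed in the property but absent from the datum are simply skipped in this sketch.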
Uses of NutchDocument in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag that return NutchDocument | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.apache.nutch.microformats.reltag with parameters of type NutchDocument | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of NutchDocument in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type NutchDocument | |
---|---|
float |
ScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
This method calculates a Lucene document boost. |
float |
ScoringFilters.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
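
Because `indexerScore` both takes an `initScore` and returns a score, one natural way for `ScoringFilters.indexerScore` to combine several filters is to feed each filter's result into the next. The sketch below shows that chaining with hypothetical stand-in types (`SketchScoringFilter`, `SketchScoringFilters`) and a reduced signature. Whether Nutch chains filters exactly this way is an assumption that this page does not confirm.

```java
import java.util.List;

// Hypothetical stand-in for ScoringFilter, reduced to the URL and initial score.
interface SketchScoringFilter {
    float indexerScore(String url, float initScore);
}

// Hypothetical stand-in for ScoringFilters.
class SketchScoringFilters {
    // Passes the score through every filter in order; each filter sees the
    // value produced by the previous one.
    static float indexerScore(List<SketchScoringFilter> filters,
                              String url, float initScore) {
        float score = initScore;
        for (SketchScoringFilter f : filters) {
            score = f.indexerScore(url, score);
        }
        return score;
    }
}
```

With no filters configured, this sketch returns `initScore` unchanged.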
Uses of NutchDocument in org.apache.nutch.scoring.link |
---|
Methods in org.apache.nutch.scoring.link with parameters of type NutchDocument | |
---|---|
float |
LinkAnalysisScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
Uses of NutchDocument in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type NutchDocument | |
---|---|
float |
OPICScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Dampens the boost value by scorePower. |
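
One plausible reading of "dampen the boost value by scorePower" is to raise the page score to the power `scorePower` and scale the result by `initScore`. The sketch below implements that reading. The class `OpicBoostSketch`, the parameter `pageScore`, and the exact formula are assumptions for illustration, not a copy of the OPICScoringFilter implementation.

```java
// Hypothetical helper, not part of Nutch.
class OpicBoostSketch {
    // pageScore: stand-in for the score stored with the page.
    // scorePower: exponent used for dampening (values below 1 compress large scores).
    // initScore: the boost computed so far.
    static float indexerScore(float pageScore, float scorePower, float initScore) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }
}
```

Under this formula an exponent of 0.5 turns a page score of 4.0 into a factor of 2.0, so very high scores influence the boost less than they would linearly.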
Uses of NutchDocument in org.apache.nutch.scoring.tld |
---|
Methods in org.apache.nutch.scoring.tld with parameters of type NutchDocument | |
---|---|
float |
TLDScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
Uses of NutchDocument in org.apache.nutch.scoring.urlmeta |
---|
Methods in org.apache.nutch.scoring.urlmeta with parameters of type NutchDocument | |
---|---|
float |
URLMetaScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Boilerplate |
Uses of NutchDocument in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch that return NutchDocument | |
---|---|
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Methods in org.creativecommons.nutch with parameters of type NutchDocument | |
---|---|
void |
CCIndexingFilter.addUrlFeatures(NutchDocument doc,
String urlString)
Add the features represented by a license URL. |
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|