Uses of Parse in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang with parameters of type Parse | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.crawl |
---|
Methods in org.apache.nutch.crawl with parameters of type Parse | |
---|---|
byte[] |
MD5Signature.calculate(Content content,
Parse parse)
|
byte[] |
TextProfileSignature.calculate(Content content,
Parse parse)
|
abstract byte[] |
Signature.calculate(Content content,
Parse parse)
|
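
The three `calculate(Content content, Parse parse)` methods above all return a `byte[]` signature. As a minimal sketch of that contract, the example below assumes a signature is an MD5 digest of the raw content bytes; the listing does not say which bytes `MD5Signature` actually hashes. `RawContent`, `ParsedPage`, `SignatureSketch`, and `Md5SignatureSketch` are hypothetical stand-ins, not the real Nutch classes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-ins for Nutch's Content and Parse; not the real classes.
class RawContent {
    final byte[] bytes;
    RawContent(byte[] bytes) { this.bytes = bytes; }
}

class ParsedPage {
    final String text;
    ParsedPage(String text) { this.text = text; }
}

// Mirrors the shape of Signature.calculate(Content, Parse) from the listing.
abstract class SignatureSketch {
    abstract byte[] calculate(RawContent content, ParsedPage parse);
}

// Assumption: hash the raw bytes and ignore the parse output.
class Md5SignatureSketch extends SignatureSketch {
    @Override
    byte[] calculate(RawContent content, ParsedPage parse) {
        try {
            return MessageDigest.getInstance("MD5").digest(content.bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Helper for printing a digest as lowercase hex.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

class SignatureDemo {
    public static void main(String[] args) {
        RawContent content = new RawContent("abc".getBytes(StandardCharsets.UTF_8));
        byte[] sig = new Md5SignatureSketch().calculate(content, new ParsedPage("abc"));
        System.out.println(Md5SignatureSketch.toHex(sig));
    }
}
```

Two pages with byte-identical content get the same signature under this assumption, which is what makes such a value usable for exact-duplicate detection.
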
Uses of Parse in org.apache.nutch.indexer |
---|
Methods in org.apache.nutch.indexer with parameters of type Parse | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Run all defined filters. |
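
The relationship between `IndexingFilter.filter` (one filter modifies the document) and `IndexingFilters.filter` (run all defined filters) can be sketched as a simple chain. This is an illustration only: `DocSketch`, `IndexingFilterSketch`, and `IndexingFiltersSketch` are hypothetical stand-ins, the `CrawlDatum` and `Inlinks` arguments are omitted, and treating a `null` result as "drop the document" is an assumption not stated in the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: field name -> values.
class DocSketch {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical, reduced form of IndexingFilter.filter; the real method
// also takes CrawlDatum and Inlinks arguments.
interface IndexingFilterSketch {
    DocSketch filter(DocSketch doc, String parseText, String url);
}

// Runs every configured filter in order.
class IndexingFiltersSketch {
    private final List<IndexingFilterSketch> filters;

    IndexingFiltersSketch(List<IndexingFilterSketch> filters) {
        this.filters = filters;
    }

    DocSketch filter(DocSketch doc, String parseText, String url) {
        for (IndexingFilterSketch f : filters) {
            doc = f.filter(doc, parseText, url);
            // Assumption: a null result means the document is dropped.
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }
}

class IndexingDemo {
    public static void main(String[] args) {
        IndexingFilterSketch addUrl = (doc, text, url) -> { doc.add("url", url); return doc; };
        IndexingFilterSketch addContent = (doc, text, url) -> { doc.add("content", text); return doc; };
        IndexingFiltersSketch chain = new IndexingFiltersSketch(Arrays.asList(addUrl, addContent));
        DocSketch doc = chain.filter(new DocSketch(), "hello", "http://example.com/");
        System.out.println(doc.fields);
    }
}
```
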
Uses of Parse in org.apache.nutch.indexer.anchor |
---|
Methods in org.apache.nutch.indexer.anchor with parameters of type Parse | |
---|---|
NutchDocument |
AnchorIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors. |
Uses of Parse in org.apache.nutch.indexer.basic |
---|
Methods in org.apache.nutch.indexer.basic with parameters of type Parse | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.feed |
---|
Methods in org.apache.nutch.indexer.feed with parameters of type Parse | |
---|---|
NutchDocument |
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
Uses of Parse in org.apache.nutch.indexer.metadata |
---|
Methods in org.apache.nutch.indexer.metadata with parameters of type Parse | |
---|---|
NutchDocument |
MetadataIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.more |
---|
Methods in org.apache.nutch.indexer.more with parameters of type Parse | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.staticfield |
---|
Methods in org.apache.nutch.indexer.staticfield with parameters of type Parse | |
---|---|
NutchDocument |
StaticFieldIndexer.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.subcollection |
---|
Methods in org.apache.nutch.indexer.subcollection with parameters of type Parse | |
---|---|
NutchDocument |
SubcollectionIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.tld |
---|
Methods in org.apache.nutch.indexer.tld with parameters of type Parse | |
---|---|
NutchDocument |
TLDIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text urlText,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.urlmeta |
---|
Methods in org.apache.nutch.indexer.urlmeta with parameters of type Parse | |
---|---|
NutchDocument |
URLMetaIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object. |
Uses of Parse in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag with parameters of type Parse | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.parse |
---|
Classes in org.apache.nutch.parse that implement Parse | |
---|---|
class |
ParseImpl
The result of parsing a page's raw content. |
Methods in org.apache.nutch.parse that return Parse | |
---|---|
Parse |
ParseResult.get(String key)
Retrieve a single parse output. |
Parse |
ParseResult.get(Text key)
Retrieve a single parse output. |
Parse |
ParseStatus.getEmptyParse(Configuration conf)
A convenience method. |
Methods in org.apache.nutch.parse that return types with arguments of type Parse | |
---|---|
RecordWriter<Text,Parse> |
ParseOutputFormat.getRecordWriter(FileSystem fs,
JobConf job,
String name,
Progressable progress)
|
Iterator<Map.Entry<Text,Parse>> |
ParseResult.iterator()
Iterate over all entries in the <url, Parse> map. |
Methods in org.apache.nutch.parse with parameters of type Parse | |
---|---|
static ParseResult |
ParseResult.createParseResult(String url,
Parse parse)
Convenience method for obtaining a ParseResult from a single Parse output. |
Constructors in org.apache.nutch.parse with parameters of type Parse | |
---|---|
ParseImpl(Parse parse)
|
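
The `ParseResult` entries above describe a map from URL to `Parse` that can be built from a single parse, queried by key, and iterated. A minimal sketch of that shape follows, using hypothetical stand-ins (`ParseSketch`, `ParseResultSketch`) and plain `String` keys in place of Hadoop's `Text`; the `put` helper is an assumption added for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Parse interface; carries only extracted text.
class ParseSketch {
    final String text;
    ParseSketch(String text) { this.text = text; }
}

// Hypothetical stand-in for ParseResult: a map from URL to parse output.
class ParseResultSketch implements Iterable<Map.Entry<String, ParseSketch>> {
    private final Map<String, ParseSketch> parses = new LinkedHashMap<>();

    // Mirrors ParseResult.createParseResult(String url, Parse parse).
    static ParseResultSketch createParseResult(String url, ParseSketch parse) {
        ParseResultSketch result = new ParseResultSketch();
        result.put(url, parse);
        return result;
    }

    // Hypothetical helper; not taken from the listing.
    void put(String url, ParseSketch parse) {
        parses.put(url, parse);
    }

    // Mirrors ParseResult.get(String key): retrieve a single parse output.
    ParseSketch get(String key) {
        return parses.get(key);
    }

    // Mirrors ParseResult.iterator(): iterate over all <url, Parse> entries.
    @Override
    public Iterator<Map.Entry<String, ParseSketch>> iterator() {
        return parses.entrySet().iterator();
    }
}

class ParseResultDemo {
    public static void main(String[] args) {
        ParseResultSketch result =
            ParseResultSketch.createParseResult("http://example.com/", new ParseSketch("hello"));
        for (Map.Entry<String, ParseSketch> entry : result) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().text);
        }
    }
}
```
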
Uses of Parse in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type Parse | |
---|---|
float |
ScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
This method calculates a Lucene document boost. |
float |
ScoringFilters.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
void |
ScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Currently a part of score distribution is performed using only data coming from the parsing process. |
void |
ScoringFilters.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
Uses of Parse in org.apache.nutch.scoring.link |
---|
Methods in org.apache.nutch.scoring.link with parameters of type Parse | |
---|---|
float |
LinkAnalysisScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
void |
LinkAnalysisScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
Uses of Parse in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type Parse | |
---|---|
float |
OPICScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Dampen the boost value by scorePower. |
void |
OPICScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData. |
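
The `OPICScoringFilter.indexerScore` entry says only that the boost is dampened by `scorePower`. One plausible reading, sketched below as an assumption rather than as the filter's actual source, is to raise the stored page score to the power `scorePower` and scale the incoming `initScore` by the result. `BoostDampeningSketch` and its parameter `pageScore` are hypothetical names.

```java
// Hypothetical sketch; not the OPICScoringFilter source.
class BoostDampeningSketch {
    // Assumption: "dampen by scorePower" means pageScore^scorePower,
    // multiplied into the incoming initScore.
    static float indexerScore(float pageScore, float scorePower, float initScore) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // With scorePower below 1, large scores are compressed.
        System.out.println(indexerScore(100.0f, 0.5f, 1.0f));
        // With scorePower equal to 1, the score passes through unchanged.
        System.out.println(indexerScore(100.0f, 1.0f, 1.0f));
    }
}
```

Under this reading, a `scorePower` between 0 and 1 narrows the gap between high-scoring and low-scoring pages, so link-based score has less influence on the final document boost.
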
Uses of Parse in org.apache.nutch.scoring.tld |
---|
Methods in org.apache.nutch.scoring.tld with parameters of type Parse | |
---|---|
float |
TLDScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
void |
TLDScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
Uses of Parse in org.apache.nutch.scoring.urlmeta |
---|
Methods in org.apache.nutch.scoring.urlmeta with parameters of type Parse | |
---|---|
float |
URLMetaScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Boilerplate |
void |
URLMetaScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Takes the metadata, which was lumped inside the content, and replicates it within your parse data. |
Uses of Parse in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch with parameters of type Parse | |
---|---|
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|