Packages that use Parse | |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.crawl | Crawl control code. |
org.apache.nutch.indexer | Maintain Lucene full-text indexes. |
org.apache.nutch.indexer.basic | A basic indexing plugin. |
org.apache.nutch.indexer.more | A more indexing plugin. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.parse | |
org.apache.nutch.scoring | |
org.apache.nutch.scoring.opic | |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Uses of Parse in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang with parameters of type Parse | |
---|---|
NutchDocument |
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.crawl |
---|
Methods in org.apache.nutch.crawl with parameters of type Parse | |
---|---|
byte[] |
MD5Signature.calculate(Content content,
Parse parse)
|
byte[] |
TextProfileSignature.calculate(Content content,
Parse parse)
|
abstract byte[] |
Signature.calculate(Content content,
Parse parse)
|
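The `calculate(Content content, Parse parse)` methods above each return a `byte[]` signature for a page. As a minimal sketch of the idea only, assuming an MD5-style signature is simply the digest of the raw content bytes, the following uses the JDK's `MessageDigest`. The class `Md5SignatureSketch` and its methods are hypothetical stand-ins and do not use the Nutch `Content` or `Parse` types.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not the Nutch MD5Signature class.
class Md5SignatureSketch {

    /** Returns the 16-byte MD5 digest of the given raw content bytes. */
    static byte[] calculate(byte[] rawContent) {
        try {
            return MessageDigest.getInstance("MD5").digest(rawContent);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Formats a digest as lowercase hexadecimal for display. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] signature = calculate("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(toHex(signature));
    }
}
```

Two pages with identical bytes produce identical signatures under this scheme, which is what makes a signature usable for duplicate detection. A text-profile signature would instead derive its input from the parsed text, so near-identical pages can collide on purpose.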
Uses of Parse in org.apache.nutch.indexer |
---|
Methods in org.apache.nutch.indexer with parameters of type Parse | |
---|---|
NutchDocument |
IndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
NutchDocument |
IndexingFilters.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
Run all defined filters. |
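The relationship between a single filter and "run all defined filters" can be sketched as a simple chain. The example below is an illustration under stated assumptions: `DocFilter` is a hypothetical stand-in for an indexing filter, a document is modeled as a `Map<String, String>`, and a filter returning `null` is assumed to mean "drop this document". None of these names are the Nutch types listed above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a filter chain; not the Nutch IndexingFilters class.
class FilterChainSketch {

    /** Stand-in for one indexing filter: adds or modifies fields, or returns null to reject. */
    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc, String parseText, String url);
    }

    /** Applies every filter in order; stops early if a filter rejects the document. */
    static Map<String, String> runAll(List<DocFilter> filters,
                                      Map<String, String> doc,
                                      String parseText,
                                      String url) {
        for (DocFilter f : filters) {
            doc = f.filter(doc, parseText, url);
            if (doc == null) {
                return null; // assumption: a null result drops the document
            }
        }
        return doc;
    }

    /** Two small example filters: one records the URL, one records the text length. */
    static List<DocFilter> exampleFilters() {
        DocFilter addUrl = (doc, text, url) -> {
            doc.put("url", url);
            return doc;
        };
        DocFilter addLength = (doc, text, url) -> {
            doc.put("contentLength", Integer.toString(text.length()));
            return doc;
        };
        return Arrays.asList(addUrl, addLength);
    }

    public static void main(String[] args) {
        Map<String, String> doc = runAll(exampleFilters(),
                new LinkedHashMap<>(), "Hello world", "http://example.com/");
        System.out.println(doc);
    }
}
```

Each filter receives the document produced by the previous one, so the order of filters matters whenever two filters touch the same field.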
Uses of Parse in org.apache.nutch.indexer.basic |
---|
Methods in org.apache.nutch.indexer.basic with parameters of type Parse | |
---|---|
NutchDocument |
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.indexer.more |
---|
Methods in org.apache.nutch.indexer.more with parameters of type Parse | |
---|---|
NutchDocument |
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag with parameters of type Parse | |
---|---|
NutchDocument |
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|
Uses of Parse in org.apache.nutch.parse |
---|
Classes in org.apache.nutch.parse that implement Parse | |
---|---|
class |
ParseImpl
The result of parsing a page's raw content. |
Methods in org.apache.nutch.parse that return Parse | |
---|---|
Parse |
ParseResult.get(String key)
Retrieve a single parse output. |
Parse |
ParseResult.get(Text key)
Retrieve a single parse output. |
Parse |
ParseStatus.getEmptyParse(Configuration conf)
A convenience method. |
Methods in org.apache.nutch.parse that return types with arguments of type Parse | |
---|---|
RecordWriter<Text,Parse> |
ParseOutputFormat.getRecordWriter(FileSystem fs,
JobConf job,
String name,
Progressable progress)
|
Iterator<Map.Entry<Text,Parse>> |
ParseResult.iterator()
Iterate over all entries in the <url, Parse> map. |
Methods in org.apache.nutch.parse with parameters of type Parse | |
---|---|
static ParseResult |
ParseResult.createParseResult(String url,
Parse parse)
Convenience method for obtaining a ParseResult from a single Parse output. |
Constructors in org.apache.nutch.parse with parameters of type Parse | |
---|---|
ParseImpl(Parse parse)
|
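The `ParseResult` methods listed above (`get`, `iterator`, `createParseResult`) describe a map from URL to parse output. A minimal sketch of that shape follows, assuming plain `String` keys and values in place of `Text` and `Parse`; the class `ParseResultSketch` and its method names are hypothetical and only mirror the documented behavior loosely.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a <url, parse> map; not the Nutch ParseResult class.
class ParseResultSketch implements Iterable<Map.Entry<String, String>> {

    // Keys are URLs; values stand in for parse outputs.
    private final Map<String, String> parses = new LinkedHashMap<>();

    /** Convenience factory: wraps a single parse output in a result. */
    static ParseResultSketch createSingle(String url, String parseText) {
        ParseResultSketch result = new ParseResultSketch();
        result.put(url, parseText);
        return result;
    }

    void put(String url, String parseText) {
        parses.put(url, parseText);
    }

    /** Retrieves a single parse output, or null if the key is absent. */
    String get(String url) {
        return parses.get(url);
    }

    /** Iterates over all entries in insertion order. */
    @Override
    public Iterator<Map.Entry<String, String>> iterator() {
        return parses.entrySet().iterator();
    }

    public static void main(String[] args) {
        ParseResultSketch result = createSingle("http://example.com/", "main page text");
        result.put("http://example.com/feed#item1", "first feed item");
        for (Map.Entry<String, String> entry : result) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Keeping the result keyed by URL lets one fetched resource yield several parse outputs, while the single-entry factory covers the common one-page, one-parse case.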
Uses of Parse in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type Parse | |
---|---|
float |
ScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
This method calculates a Lucene document boost. |
float |
ScoringFilters.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
|
void |
ScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Currently, part of the score distribution is performed using only data from the parsing process. |
void |
ScoringFilters.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
Uses of Parse in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type Parse | |
---|---|
float |
OPICScoringFilter.indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Dampen the boost value by scorePower. |
void |
OPICScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData. |
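"Dampen the boost value by scorePower" can be read as raising the stored page score to the power `scorePower` before using it as a boost. The sketch below shows that reading only; the formula, the class `OpicBoostSketch`, and the plain `float` parameters are assumptions made for illustration and are not taken from the `OPICScoringFilter` source.

```java
// Hypothetical illustration of score dampening; not the Nutch OPICScoringFilter class.
class OpicBoostSketch {

    /**
     * Assumed formula: boost = dbScore ^ scorePower * initScore.
     * With 0 < scorePower < 1, large scores are compressed toward 1.
     */
    static float indexerScore(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 4.0f, 100.0f};
        for (float score : scores) {
            System.out.println(score + " -> " + indexerScore(score, 0.5f, 1.0f));
        }
    }
}
```

Under this reading, a `scorePower` of 1 leaves the score unchanged, and smaller values reduce the gap between highly scored and ordinary pages so that link-based score does not swamp text relevance.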
Uses of Parse in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch with parameters of type Parse | |
---|---|
NutchDocument |
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
|