Uses of Interface org.apache.nutch.parse.Parse (apache-nutch 1.8 API)

Packages that use Parse
Package	Description
org.apache.nutch.analysis.lang	Text document language identifier.
org.apache.nutch.crawl	Crawl control code.
org.apache.nutch.indexer	Maintain Lucene full-text indexes.
org.apache.nutch.indexer.anchor	An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.basic	A basic indexing plugin.
org.apache.nutch.indexer.feed
org.apache.nutch.indexer.metadata
org.apache.nutch.indexer.more	A more indexing plugin.
org.apache.nutch.indexer.staticfield	A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection
org.apache.nutch.indexer.tld	Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta	URL Meta Tag Indexing Plugin
org.apache.nutch.microformats.reltag	A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.parse
org.apache.nutch.scoring
org.apache.nutch.scoring.link
org.apache.nutch.scoring.opic
org.apache.nutch.scoring.tld	Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta	URL Meta Tag Scoring Plugin
org.creativecommons.nutch	Sample plugins that parse and index Creative Commons metadata.

Uses of Parse in org.apache.nutch.analysis.lang

Methods in org.apache.nutch.analysis.lang with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	LanguageIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.crawl

Methods in org.apache.nutch.crawl with parameters of type Parse
Modifier and Type	Method and Description
`abstract byte[]`	Signature.`calculate(Content content, Parse parse)`
`byte[]`	MD5Signature.`calculate(Content content, Parse parse)`
`byte[]`	TextProfileSignature.`calculate(Content content, Parse parse)`
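The `calculate(Content content, Parse parse)` methods above return a page signature as a `byte[]`. The following is only a minimal sketch of that shape, not the Nutch source: `Content` and `Parse` here are hypothetical stand-in interfaces, `Md5OfBytes`, `contentOf` and `hex` are illustrative names, and the choice to hash the raw bytes and fall back to the URL when there are none is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {
    // Hypothetical stand-ins for the Nutch types; only the members used here.
    interface Content { byte[] getContent(); String getUrl(); }
    interface Parse { String getText(); }

    // Mirrors the shape of the abstract calculate(Content, Parse) method.
    abstract static class Signature {
        abstract byte[] calculate(Content content, Parse parse);
    }

    // Assumed behaviour: hash the raw bytes, or the URL when there are none.
    static class Md5OfBytes extends Signature {
        @Override
        byte[] calculate(Content content, Parse parse) {
            byte[] data = content.getContent();
            if (data == null) {
                data = content.getUrl().getBytes(StandardCharsets.UTF_8);
            }
            try {
                return MessageDigest.getInstance("MD5").digest(data);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    // Illustrative helper that builds a stand-in Content value.
    static Content contentOf(final byte[] bytes, final String url) {
        return new Content() {
            public byte[] getContent() { return bytes; }
            public String getUrl() { return url; }
        };
    }

    // Renders a digest as lowercase hexadecimal.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Content page = contentOf("hello".getBytes(StandardCharsets.UTF_8),
                                 "http://example.com/");
        // This sketch ignores the Parse argument, so null is passed.
        System.out.println(hex(new Md5OfBytes().calculate(page, null)));
    }
}
```

A signature that depends only on raw bytes changes whenever any byte of the page changes; a text-based variant would presumably read from the `Parse` argument instead, which is why both parameters appear in the signature.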

Uses of Parse in org.apache.nutch.indexer

Methods in org.apache.nutch.indexer with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	IndexingFilters.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` Run all defined filters.
`NutchDocument`	IndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` Adds fields or otherwise modifies the document that will be indexed for a parse.
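The two entries above describe a single filter that modifies the document and a wrapper that runs all defined filters. A rough sketch of that chaining idea follows. Everything in it is a stand-in: `Doc`, `Parse` and `Filter` are hypothetical simplified types, the `CrawlDatum` and `Inlinks` parameters are omitted for brevity, and treating a `null` return as "discard the document" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {
    // Hypothetical stand-in for the parse result; only the text is used.
    interface Parse { String getText(); }

    // Hypothetical stand-in for the document being indexed.
    static class Doc {
        final Map<String, List<Object>> fields = new LinkedHashMap<>();

        void add(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Simplified filter signature: the datum and inlinks arguments are left out.
    interface Filter { Doc filter(Doc doc, Parse parse, String url); }

    // Runs each filter in order; stops early if one returns null (assumed
    // here to mean the document should not be indexed).
    static Doc runAll(List<Filter> filters, Doc doc, Parse parse, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, parse, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Filter dropEmpty = (d, p, u) -> p.getText().isEmpty() ? null : d;
        Filter addLength = (d, p, u) -> {
            d.add("textLength", p.getText().length());
            return d;
        };
        Parse parse = () -> "hello nutch";
        Doc out = runAll(Arrays.asList(dropEmpty, addLength),
                         new Doc(), parse, "http://example.com/");
        System.out.println(out.fields);
    }
}
```

Each plugin package listed on this page contributes one such `filter` implementation, so the order in which they run can matter when one filter reads fields another has added.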

Uses of Parse in org.apache.nutch.indexer.anchor

Methods in org.apache.nutch.indexer.anchor with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	AnchorIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` The `AnchorIndexingFilter` filter object which supports boolean configuration settings for the deduplication of anchors.

Uses of Parse in org.apache.nutch.indexer.basic

Methods in org.apache.nutch.indexer.basic with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	BasicIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` The `BasicIndexingFilter` filter object which supports a few configuration settings for adding basic searchable fields.

Uses of Parse in org.apache.nutch.indexer.feed

Methods in org.apache.nutch.indexer.feed with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	FeedIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the `Indexer` for indexing within the Nutch index.

Uses of Parse in org.apache.nutch.indexer.metadata

Methods in org.apache.nutch.indexer.metadata with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	MetadataIndexer.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.indexer.more

Methods in org.apache.nutch.indexer.more with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	MoreIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.indexer.staticfield

Methods in org.apache.nutch.indexer.staticfield with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	StaticFieldIndexer.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` The `StaticFieldIndexer` filter object which adds fields as per configuration setting.

Uses of Parse in org.apache.nutch.indexer.subcollection

Methods in org.apache.nutch.indexer.subcollection with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	SubcollectionIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.indexer.tld

Methods in org.apache.nutch.indexer.tld with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	TLDIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text urlText, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.indexer.urlmeta

Methods in org.apache.nutch.indexer.urlmeta with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	URLMetaIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)` Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.

Uses of Parse in org.apache.nutch.microformats.reltag

Methods in org.apache.nutch.microformats.reltag with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	RelTagIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`

Uses of Parse in org.apache.nutch.parse

Classes in org.apache.nutch.parse that implement Parse
Modifier and Type	Class and Description
`class`	`ParseImpl` The result of parsing a page's raw content.

Methods in org.apache.nutch.parse that return Parse
Modifier and Type	Method and Description
`Parse`	ParseResult.`get(String key)` Retrieve a single parse output.
`Parse`	ParseResult.`get(org.apache.hadoop.io.Text key)` Retrieve a single parse output.
`Parse`	ParseStatus.`getEmptyParse(org.apache.hadoop.conf.Configuration conf)` A convenience method.

Methods in org.apache.nutch.parse that return types with arguments of type Parse
Modifier and Type	Method and Description
`org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Parse>`	ParseOutputFormat.`getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)`
`Iterator<Map.Entry<org.apache.hadoop.io.Text,Parse>>`	ParseResult.`iterator()` Iterate over all entries in the <url, Parse> map.

Methods in org.apache.nutch.parse with parameters of type Parse
Modifier and Type	Method and Description
`static ParseResult`	ParseResult.`createParseResult(String url, Parse parse)` Convenience method for obtaining `ParseResult` from a single `Parse` output.
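Taken together, the `get`, `iterator()` and `createParseResult` entries above suggest a container that maps URLs to `Parse` outputs. The sketch below illustrates that idea only; it is not the Nutch class. `ParseResultSketch`, its `put` and `single` methods, and the use of plain `String` keys instead of `org.apache.hadoop.io.Text` are all assumptions made to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseResultSketch
        implements Iterable<Map.Entry<String, ParseResultSketch.Parse>> {

    // Hypothetical stand-in for the Parse interface; only the text is used.
    interface Parse { String getText(); }

    // Insertion-ordered so that iteration order is predictable in the demo.
    private final Map<String, Parse> parses = new LinkedHashMap<>();

    void put(String url, Parse parse) {
        parses.put(url, parse);
    }

    // Retrieve a single parse output, or null if the URL is unknown.
    Parse get(String url) {
        return parses.get(url);
    }

    // Iterate over all entries in the <url, Parse> map.
    @Override
    public Iterator<Map.Entry<String, Parse>> iterator() {
        return parses.entrySet().iterator();
    }

    // Convenience factory for the common case of one parse per fetched URL.
    static ParseResultSketch single(String url, Parse parse) {
        ParseResultSketch result = new ParseResultSketch();
        result.put(url, parse);
        return result;
    }

    public static void main(String[] args) {
        ParseResultSketch result = single("http://example.com/", () -> "page text");
        result.put("http://example.com/feed#item1", () -> "item text");
        for (Map.Entry<String, Parse> e : result) {
            System.out.println(e.getKey() + " -> " + e.getValue().getText());
        }
    }
}
```

A map rather than a single value is useful when one fetched resource, such as a feed, yields several parsed documents, each under its own URL.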

Constructors in org.apache.nutch.parse with parameters of type Parse
Constructor and Description
`ParseImpl(Parse parse)`

Uses of Parse in org.apache.nutch.scoring

Methods in org.apache.nutch.scoring with parameters of type Parse
Modifier and Type	Method and Description
`float`	ScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)` This method calculates a Lucene document boost.
`float`	AbstractScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)`
`float`	ScoringFilters.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)`
`void`	ScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)` Currently a part of score distribution is performed using only data coming from the parsing process.
`void`	AbstractScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)`
`void`	ScoringFilters.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)`

Uses of Parse in org.apache.nutch.scoring.link

Methods in org.apache.nutch.scoring.link with parameters of type Parse
Modifier and Type	Method and Description
`float`	LinkAnalysisScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)`
`void`	LinkAnalysisScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)`

Uses of Parse in org.apache.nutch.scoring.opic

Methods in org.apache.nutch.scoring.opic with parameters of type Parse
Modifier and Type	Method and Description
`float`	OPICScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)` Dampen the boost value by scorePower.
`void`	OPICScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)` Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
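The `indexerScore` entry above says the boost is dampened by scorePower. One plausible reading, shown here purely as an illustration, is to raise the stored score to the power `scorePower` and multiply by the initial score. The formula, the class name `BoostDampeningSketch` and the sample values are assumptions, not taken from this page.

```java
public class BoostDampeningSketch {
    // Assumed formula: dbScore^scorePower * initScore.
    // With 0 < scorePower < 1 this compresses large scores toward 1.
    static float dampen(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // With scorePower = 0.5 the boost grows with the square root of the score.
        System.out.println(dampen(4.0f, 0.5f, 1.0f));
        System.out.println(dampen(100.0f, 0.5f, 1.0f));
    }
}
```

Under this reading, a page whose stored score is 25 times larger (100 versus 4) receives a boost only 5 times larger, so a few very high-scoring pages do not dominate ranking.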

Uses of Parse in org.apache.nutch.scoring.tld

Methods in org.apache.nutch.scoring.tld with parameters of type Parse
Modifier and Type	Method and Description
`float`	TLDScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)`
`void`	TLDScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)`

Uses of Parse in org.apache.nutch.scoring.urlmeta

Methods in org.apache.nutch.scoring.urlmeta with parameters of type Parse
Modifier and Type	Method and Description
`float`	URLMetaScoringFilter.`indexerScore(org.apache.hadoop.io.Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)` Boilerplate
`void`	URLMetaScoringFilter.`passScoreAfterParsing(org.apache.hadoop.io.Text url, Content content, Parse parse)` Takes the metadata, which was lumped inside the content, and replicates it within your parse data.

Uses of Parse in org.creativecommons.nutch

Methods in org.creativecommons.nutch with parameters of type Parse
Modifier and Type	Method and Description
`NutchDocument`	CCIndexingFilter.`filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)`
