Uses of Interface
org.apache.nutch.parse.Parse

Packages that use Parse
org.apache.nutch.analysis.lang Text document language identifier. 
org.apache.nutch.crawl Crawl control code. 
org.apache.nutch.indexer Maintain Lucene full-text indexes. 
org.apache.nutch.indexer.anchor An indexing plugin for inbound anchor text. 
org.apache.nutch.indexer.basic A basic indexing plugin. 
org.apache.nutch.indexer.feed   
org.apache.nutch.indexer.metadata   
org.apache.nutch.indexer.more A more indexing plugin. 
org.apache.nutch.indexer.staticfield A simple plugin called at indexing that adds fields with static data. 
org.apache.nutch.indexer.subcollection   
org.apache.nutch.indexer.tld Top Level Domain Indexing plugin. 
org.apache.nutch.indexer.urlmeta URL Meta Tag Indexing Plugin. 
org.apache.nutch.microformats.reltag A microformats Rel-Tag Parser/Indexer/Querier plugin. 
org.apache.nutch.parse   
org.apache.nutch.scoring   
org.apache.nutch.scoring.link   
org.apache.nutch.scoring.opic   
org.apache.nutch.scoring.tld Top Level Domain Scoring plugin. 
org.apache.nutch.scoring.urlmeta URL Meta Tag Scoring Plugin. 
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata. 
 

Uses of Parse in org.apache.nutch.analysis.lang
 

Methods in org.apache.nutch.analysis.lang with parameters of type Parse
 NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.crawl
 

Methods in org.apache.nutch.crawl with parameters of type Parse
 byte[] MD5Signature.calculate(Content content, Parse parse)
           
 byte[] TextProfileSignature.calculate(Content content, Parse parse)
           
abstract  byte[] Signature.calculate(Content content, Parse parse)
           
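Signature.calculate(Content, Parse) reduces a fetched page to a byte[] fingerprint that later stages can compare to detect duplicate pages. The minimal sketch below shows the MD5 variant using only java.security.MessageDigest. It assumes that MD5Signature digests the raw content bytes and hashes the URL instead when no content is available; Md5SignatureSketch and its String/byte[] parameters are hypothetical stand-ins for the Nutch Content and Parse types.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for an MD5-based Signature; not the Nutch API.
final class Md5SignatureSketch {

  private Md5SignatureSketch() {}

  /**
   * Returns the MD5 digest of the raw page bytes, falling back to the URL
   * bytes when there is no content (assumed behavior, see lead-in).
   */
  static byte[] calculate(String url, byte[] rawContent) {
    byte[] data = (rawContent != null) ? rawContent : url.getBytes(StandardCharsets.UTF_8);
    try {
      return MessageDigest.getInstance("MD5").digest(data);
    } catch (NoSuchAlgorithmException e) {
      // Every Java SE implementation is required to support MD5.
      throw new IllegalStateException(e);
    }
  }

  /** Lower-case hex encoding, convenient for logging or comparing signatures. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] sig = calculate("http://example.com/", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(toHex(sig)); // 5d41402abc4b2a76b9719d911017c592
  }
}
```

Because the digest covers the raw bytes, pages that differ in a single byte (a timestamp, say) get different signatures; TextProfileSignature is intended for catching such near-duplicates.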
 

Uses of Parse in org.apache.nutch.indexer
 

Methods in org.apache.nutch.indexer with parameters of type Parse
 NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
 NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Run all defined filters.
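IndexingFilters.filter runs every configured IndexingFilter in turn, handing each one the NutchDocument returned by the previous filter. The sketch below models that chain with hypothetical stand-in types (Doc, ParseStub and Filter are not Nutch classes). It assumes that a filter returning null discards the document and that the remaining filters are then skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for NutchDocument, Parse and IndexingFilter; not the Nutch API.
final class IndexingChainSketch {

  private IndexingChainSketch() {}

  /** Minimal multi-valued document: field name -> values. */
  static final class Doc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
  }

  /** Minimal parse output: a title and the extracted text. */
  static final class ParseStub {
    final String title;
    final String text;

    ParseStub(String title, String text) {
      this.title = title;
      this.text = text;
    }
  }

  /** Contract assumed in this sketch: return the (possibly modified) doc, or null to drop it. */
  interface Filter {
    Doc filter(Doc doc, ParseStub parse, String url);
  }

  /** Runs the filters in order, stopping as soon as one discards the document. */
  static Doc runAll(List<Filter> filters, Doc doc, ParseStub parse, String url) {
    for (Filter filter : filters) {
      doc = filter.filter(doc, parse, url);
      if (doc == null) {
        return null; // a filter vetoed indexing; later filters never run
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    Filter basic = (doc, parse, url) -> {
      doc.add("url", url);
      doc.add("title", parse.title);
      return doc;
    };
    Filter skipEmpty = (doc, parse, url) -> parse.text.isEmpty() ? null : doc;

    Doc kept = runAll(List.of(basic, skipEmpty), new Doc(), new ParseStub("Home", "welcome"), "http://example.com/");
    Doc dropped = runAll(List.of(basic, skipEmpty), new Doc(), new ParseStub("Empty", ""), "http://example.com/empty");
    System.out.println(kept.fields);     // {url=[http://example.com/], title=[Home]}
    System.out.println(dropped == null); // true
  }
}
```

In this sketch, returning null lets a filter veto a single page, such as one with an empty parse, without throwing and aborting the rest of the run.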
 

Uses of Parse in org.apache.nutch.indexer.anchor
 

Methods in org.apache.nutch.indexer.anchor with parameters of type Parse
 NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds inbound anchor text to the document; supports boolean configuration settings for deduplicating anchors.
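AnchorIndexingFilter adds the anchor text of a page's inbound links to the document. The following sketch shows one way to implement the optional deduplication step with a HashSet. The case-insensitive comparison and the single boolean switch (thought to correspond to the anchorIndexingFilter.deduplicate property) are assumptions; AnchorDedupSketch and anchorsToIndex are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of optional anchor deduplication; not the Nutch API.
final class AnchorDedupSketch {

  private AnchorDedupSketch() {}

  /**
   * Returns the anchors to index. When deduplicate is true, an anchor is
   * skipped if an earlier one matched it ignoring case (assumed rule);
   * the first spelling seen is the one kept.
   */
  static List<String> anchorsToIndex(List<String> inboundAnchors, boolean deduplicate) {
    List<String> out = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String anchor : inboundAnchors) {
      // Set.add returns false for a value already present, which marks a duplicate.
      if (!deduplicate || seen.add(anchor.toLowerCase(Locale.ROOT))) {
        out.add(anchor);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> anchors = List.of("Apache Nutch", "apache nutch", "Nutch crawler");
    System.out.println(anchorsToIndex(anchors, true));  // [Apache Nutch, Nutch crawler]
    System.out.println(anchorsToIndex(anchors, false)); // [Apache Nutch, apache nutch, Nutch crawler]
  }
}
```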
 

Uses of Parse in org.apache.nutch.indexer.basic
 

Methods in org.apache.nutch.indexer.basic with parameters of type Parse
 NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.feed
 

Methods in org.apache.nutch.indexer.feed with parameters of type Parse
 NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED and FEED) and sends them to the Indexer for indexing within the Nutch index.
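FeedIndexingFilter copies a fixed set of feed-specific fields from the parse into the document. A minimal sketch of that whitelist copy follows, modeling the parse metadata as a map from key to values. The key strings are placeholders for the FEED_* constants named in the description above, FeedFieldsSketch is a hypothetical class, and the real filter may also convert the published and updated values into date fields, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of copying feed fields from parse metadata; not the Nutch API.
final class FeedFieldsSketch {

  // Placeholder key strings standing in for the FEED_* constants.
  static final List<String> FEED_FIELDS =
      List.of("FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED");

  private FeedFieldsSketch() {}

  /** Returns the feed fields present in parseMeta, keeping every value of multi-valued fields. */
  static Map<String, List<String>> extract(Map<String, List<String>> parseMeta) {
    Map<String, List<String>> doc = new LinkedHashMap<>();
    for (String field : FEED_FIELDS) {
      List<String> values = parseMeta.get(field);
      if (values != null && !values.isEmpty()) {
        doc.put(field, new ArrayList<>(values)); // copy so the document owns its values
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    Map<String, List<String>> parseMeta = Map.of(
        "FEED_AUTHOR", List.of("alice"),
        "FEED_TAGS", List.of("nutch", "search"),
        "title", List.of("ignored"));
    System.out.println(extract(parseMeta)); // {FEED_AUTHOR=[alice], FEED_TAGS=[nutch, search]}
  }
}
```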
 

Uses of Parse in org.apache.nutch.indexer.metadata
 

Methods in org.apache.nutch.indexer.metadata with parameters of type Parse
 NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.more
 

Methods in org.apache.nutch.indexer.more with parameters of type Parse
 NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.staticfield
 

Methods in org.apache.nutch.indexer.staticfield with parameters of type Parse
 NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.subcollection
 

Methods in org.apache.nutch.indexer.subcollection with parameters of type Parse
 NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.tld
 

Methods in org.apache.nutch.indexer.tld with parameters of type Parse
 NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.indexer.urlmeta
 

Methods in org.apache.nutch.indexer.urlmeta with parameters of type Parse
 NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.
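URLMetaIndexingFilter is driven by the urlmeta.tags property: for each listed tag it looks up a value in the CrawlDatum metadata and, if one is found, adds it to the document. The sketch below assumes the property holds a comma-separated list of tag names and models the CrawlDatum metadata as a Map<String, String>; UrlMetaIndexSketch and fieldsFor are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of urlmeta.tags-driven field extraction; not the Nutch API.
final class UrlMetaIndexSketch {

  private UrlMetaIndexSketch() {}

  /**
   * Splits the (assumed comma-separated) urlmeta.tags value and returns the
   * tags that have a value in the CrawlDatum metadata, in configured order.
   */
  static Map<String, String> fieldsFor(String urlmetaTags, Map<String, String> datumMeta) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (urlmetaTags == null) {
      return fields; // property not set: nothing to add
    }
    for (String raw : urlmetaTags.split(",")) {
      String tag = raw.trim();
      if (tag.isEmpty()) {
        continue; // tolerate stray commas and blanks
      }
      String value = datumMeta.get(tag);
      if (value != null) {
        fields.put(tag, value);
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> datumMeta = Map.of("source", "seed-list", "lang", "en");
    System.out.println(fieldsFor("source, lang, missing", datumMeta)); // {source=seed-list, lang=en}
  }
}
```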
 

Uses of Parse in org.apache.nutch.microformats.reltag
 

Methods in org.apache.nutch.microformats.reltag with parameters of type Parse
 NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 

Uses of Parse in org.apache.nutch.parse
 

Classes in org.apache.nutch.parse that implement Parse
 class ParseImpl
          The result of parsing a page's raw content.
 

Methods in org.apache.nutch.parse that return Parse
 Parse ParseResult.get(String key)
          Retrieve a single parse output.
 Parse ParseResult.get(Text key)
          Retrieve a single parse output.
 Parse ParseStatus.getEmptyParse(Configuration conf)
          A convenience method that creates an empty Parse instance which returns this status.
 

Methods in org.apache.nutch.parse that return types with arguments of type Parse
 RecordWriter<Text,Parse> ParseOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress)
           
 Iterator<Map.Entry<Text,Parse>> ParseResult.iterator()
          Iterate over all entries in the <url, Parse> map.
 

Methods in org.apache.nutch.parse with parameters of type Parse
static ParseResult ParseResult.createParseResult(String url, Parse parse)
          Convenience method for obtaining ParseResult from a single Parse output.
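ParseResult holds the output of parsing one fetched document as a map from URL to Parse, which allows a parser to emit several parses for a single fetch, for instance one per entry of a feed. get(key) looks up a single entry, iterator() walks all entries, and createParseResult(url, parse) wraps a lone Parse. The generic class below is a simplified, hypothetical model of that container; it is not Nutch's ParseResult and omits its Text-keyed overloads and status handling.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified url -> parse container; not Nutch's ParseResult.
final class ParseResultSketch<P> implements Iterable<Map.Entry<String, P>> {

  private final Map<String, P> parses = new LinkedHashMap<>();

  /** Counterpart of createParseResult(url, parse): wraps a single parse output. */
  static <T> ParseResultSketch<T> of(String url, T parse) {
    ParseResultSketch<T> result = new ParseResultSketch<>();
    result.put(url, parse);
    return result;
  }

  /** Adds or replaces the parse stored for a URL. */
  void put(String url, P parse) {
    parses.put(url, parse);
  }

  /** Counterpart of get(key): the parse for one URL, or null if absent. */
  P get(String url) {
    return parses.get(url);
  }

  int size() {
    return parses.size();
  }

  /** Counterpart of iterator(): walks all url -> parse entries in insertion order. */
  @Override
  public Iterator<Map.Entry<String, P>> iterator() {
    return parses.entrySet().iterator();
  }

  public static void main(String[] args) {
    ParseResultSketch<String> result = ParseResultSketch.of("http://example.com/feed", "feed parse");
    result.put("http://example.com/post/1", "entry parse"); // e.g. an extra parse for one feed entry
    for (Map.Entry<String, String> e : result) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```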
 

Constructors in org.apache.nutch.parse with parameters of type Parse
ParseImpl(Parse parse)
           
 

Uses of Parse in org.apache.nutch.scoring
 

Methods in org.apache.nutch.scoring with parameters of type Parse
 float ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          This method calculates a Lucene document boost.
 float ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void ScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
          Currently, part of the score distribution is performed using only data coming from the parsing process.
 void ScoringFilters.passScoreAfterParsing(Text url, Content content, Parse parse)
           
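ScoringFilters.indexerScore has no description here, but its signature, which takes an initScore and returns a float, suggests that the configured filters are chained: each ScoringFilter receives the boost computed so far as its initScore. The sketch below assumes exactly that; ScoringChainSketch and BoostFilter are hypothetical, and BoostFilter keeps only the database score and initScore parameters.

```java
import java.util.List;

// Hypothetical model of chained index-time scoring; not the Nutch API.
final class ScoringChainSketch {

  private ScoringChainSketch() {}

  /** Reduced form of ScoringFilter.indexerScore keeping only two inputs. */
  interface BoostFilter {
    float indexerScore(float dbScore, float initScore);
  }

  /** Feeds each filter's result into the next filter as its initScore. */
  static float indexerScore(List<BoostFilter> filters, float dbScore, float initScore) {
    float score = initScore;
    for (BoostFilter filter : filters) {
      score = filter.indexerScore(dbScore, score);
    }
    return score;
  }

  public static void main(String[] args) {
    BoostFilter doubleIt = (db, s) -> s * 2f;
    BoostFilter addDb = (db, s) -> s + db;
    System.out.println(indexerScore(List.of(doubleIt, addDb), 0.5f, 1f)); // 2.5
  }
}
```

Under this assumption the order of the configured scoring filters matters, since each one sees the previous one's output.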
 

Uses of Parse in org.apache.nutch.scoring.link
 

Methods in org.apache.nutch.scoring.link with parameters of type Parse
 float LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void LinkAnalysisScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
           
 

Uses of Parse in org.apache.nutch.scoring.opic
 

Methods in org.apache.nutch.scoring.opic with parameters of type Parse
 float OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Dampen the boost value by scorePower.
 void OPICScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
          Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData.
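OPICScoringFilter dampens the index-time boost using scorePower. A minimal sketch, assuming the boost is the page's database score raised to scorePower and then multiplied by initScore, and that scorePower is read from configuration (believed to be the indexer.score.power property, with a default of 0.5); OpicBoostSketch is a hypothetical class.

```java
// Hypothetical sketch of OPIC-style boost dampening; not the Nutch API.
final class OpicBoostSketch {

  private OpicBoostSketch() {}

  /**
   * Raises the page score to scorePower and scales the result by initScore.
   * For 0 < scorePower < 1 this pulls scores toward 1.
   */
  static float indexerScore(float dbScore, float scorePower, float initScore) {
    return (float) Math.pow(dbScore, scorePower) * initScore;
  }

  public static void main(String[] args) {
    float scorePower = 0.5f; // assumed default
    for (float score : new float[] {0.25f, 1f, 4f, 100f}) {
      System.out.println(score + " -> " + indexerScore(score, scorePower, 1f));
    }
  }
}
```

With scorePower = 0.5 a score of 100 becomes 10 and a score of 4 becomes 2, while 0.25 rises to 0.5, so heavily linked pages dominate the boost less strongly than their raw OPIC scores would suggest.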
 

Uses of Parse in org.apache.nutch.scoring.tld
 

Methods in org.apache.nutch.scoring.tld with parameters of type Parse
 float TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
           
 void TLDScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
           
 

Uses of Parse in org.apache.nutch.scoring.urlmeta
 

Methods in org.apache.nutch.scoring.urlmeta with parameters of type Parse
 float URLMetaScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
          Boilerplate
 void URLMetaScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse)
          Takes the metadata, which was lumped inside the content, and replicates it within your parse data.
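URLMetaScoringFilter.passScoreAfterParsing moves URL metadata from the fetched Content into the parse data so that it travels on with the parse. The sketch below copies only the configured tag names and overwrites any existing value in the parse metadata; the overwrite rule and the map-based stand-ins for the Content and ParseData metadata are assumptions, and UrlMetaCopySketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of copying URL metadata from content to parse metadata; not the Nutch API.
final class UrlMetaCopySketch {

  private UrlMetaCopySketch() {}

  /** Copies each configured tag present in contentMeta into parseMeta, overwriting existing values. */
  static void passAfterParsing(List<String> tags, Map<String, String> contentMeta, Map<String, String> parseMeta) {
    if (tags == null || contentMeta == null || parseMeta == null) {
      return; // nothing configured or nothing to copy
    }
    for (String tag : tags) {
      String value = contentMeta.get(tag);
      if (value != null) {
        parseMeta.put(tag, value);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> contentMeta = Map.of("source", "seed-list", "Content-Type", "text/html");
    Map<String, String> parseMeta = new HashMap<>();
    passAfterParsing(List.of("source", "lang"), contentMeta, parseMeta);
    System.out.println(parseMeta); // {source=seed-list}
  }
}
```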
 

Uses of Parse in org.creativecommons.nutch
 

Methods in org.creativecommons.nutch with parameters of type Parse
 NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
           
 



Copyright © 2012 The Apache Software Foundation