Class MetaTagsParser

java.lang.Object
  extended by org.apache.nutch.parse.MetaTagsParser
All Implemented Interfaces:
Configurable, HtmlParseFilter, Pluggable

public class MetaTagsParser
extends Object
implements HtmlParseFilter

Parses HTML meta tags (keywords, description) and stores them in the parse metadata under the prefix 'metatag.', so that they can be indexed with the index-metadata plugin.
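
The key convention described above can be illustrated with a small, self-contained sketch. It does not use any Nutch classes: MetaTagKeySketch, toParseMetadata and the "wanted" set are hypothetical names introduced here, and lower-casing the tag name is an assumption of the sketch rather than documented behaviour of this class.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Nutch: illustrates the 'metatag.' key convention only.
class MetaTagKeySketch {
    static final String PREFIX = "metatag.";

    // Copies the selected meta tags into a new map under prefixed keys.
    // Assumption: tag names are compared and stored in lower case.
    static Map<String, String> toParseMetadata(Map<String, String> metaTags, Set<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metaTags.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (wanted.contains(name)) {
                out.put(PREFIX + name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("Keywords", "nutch, crawler");
        tags.put("description", "An example page");
        tags.put("viewport", "width=device-width");
        // Only the two requested tags are kept, each under a 'metatag.' key.
        System.out.println(toParseMetadata(tags, Set.of("keywords", "description")));
    }
}
```

An indexing step would then look values up by the prefixed key, for example metatag.keywords, rather than by the raw tag name.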

Field Summary
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
Constructor Summary
MetaTagsParser()
Method Summary
 ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 Configuration getConf()
 void setConf(Configuration conf)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public MetaTagsParser()
Method Detail


public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable


public Configuration getConf()
Specified by:
getConf in interface Configurable


public ParseResult filter(Content content,
                          ParseResult parseResult,
                          HTMLMetaTags metaTags,
                          DocumentFragment doc)
Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.

Specified by:
filter in interface HtmlParseFilter
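
As a rough illustration of what "given the DOM tree of a page" can mean in practice, the following sketch walks an org.w3c.dom.DocumentFragment and collects meta elements into a map. It uses only JDK classes. MetaTagDomSketch, collect and sampleFragment are hypothetical names, a plain Map stands in for the parse metadata, and the code is not the Nutch implementation of filter.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not the Nutch implementation of filter().
class MetaTagDomSketch {
    // Recursively collects <meta name="..." content="..."> pairs below the given node.
    static void collect(Node node, Map<String, String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element meta = (Element) node;
            // getAttribute returns an empty string when the attribute is absent.
            String name = meta.getAttribute("name");
            String content = meta.getAttribute("content");
            if (!name.isEmpty() && !content.isEmpty()) {
                // Assumption: keys are lower-cased and prefixed with 'metatag.'.
                out.put("metatag." + name.toLowerCase(Locale.ROOT), content);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Builds a tiny fragment by hand so the example needs no HTML parser.
    static DocumentFragment sampleFragment() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        DocumentFragment fragment = doc.createDocumentFragment();
        Element head = doc.createElement("head");
        Element meta = doc.createElement("meta");
        meta.setAttribute("name", "Description");
        meta.setAttribute("content", "An example page");
        head.appendChild(meta);
        fragment.appendChild(head);
        return fragment;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        collect(sampleFragment(), metadata);
        System.out.println(metadata);
    }
}
```

A real HtmlParseFilter would write into the metadata of the parse it receives and return the ParseResult; the map here only keeps the sketch independent of Nutch.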

Copyright © 2012 The Apache Software Foundation