org.creativecommons.nutch
Class CCParseFilter
java.lang.Object
org.creativecommons.nutch.CCParseFilter
- All Implemented Interfaces:
- Configurable, HtmlParseFilter, Pluggable
public class CCParseFilter
- extends Object
- implements HtmlParseFilter
Adds metadata identifying the Creative Commons license used, if any.
Nested Class Summary
static class CCParseFilter.Walker
- Walks DOM tree, looking for RDF in comments and licenses in anchors.
Field Summary
static org.slf4j.Logger LOG
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.slf4j.Logger LOG
CCParseFilter
public CCParseFilter()
filter
public ParseResult filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
- Specified by: filter in interface HtmlParseFilter
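The DOM walk that filter relies on (via CCParseFilter.Walker, which looks for licenses in anchors) can be illustrated with a minimal, JDK-only sketch. This is not the Nutch implementation: the class LicenseWalkSketch, the helpers findLicenseUrl and licenseOf, and the URL-prefix rule are all hypothetical, chosen only to show a depth-first search for a license anchor. The RDF-in-comments case and the writing of parse metadata are not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LicenseWalkSketch {
    // Assumption: a license link is an <a> whose href starts with this prefix.
    static final String CC_PREFIX = "http://creativecommons.org/licenses/";

    /** Depth-first walk; returns the first matching href, or null if none. */
    static String findLicenseUrl(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("a".equalsIgnoreCase(el.getTagName())) {
                // getAttribute returns "" when the attribute is absent.
                String href = el.getAttribute("href");
                // Require something after the prefix, so the bare index page is skipped.
                if (href.startsWith(CC_PREFIX) && href.length() > CC_PREFIX.length()) {
                    return href;
                }
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String found = findLicenseUrl(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    /** Parses well-formed XHTML and walks it; wraps checked exceptions for brevity. */
    static String licenseOf(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return findLicenseUrl(doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Text</p>"
                + "<a href=\"http://example.org/\">home</a>"
                + "<a rel=\"license\" href=\"http://creativecommons.org/licenses/by/3.0/\">CC BY</a>"
                + "</body></html>";
        System.out.println(licenseOf(page));
    }
}
```

Because findLicenseUrl accepts any org.w3c.dom.Node, the same walk would apply to the DocumentFragment that filter receives. A real filter would then record the result in the parse metadata rather than return it.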
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
Copyright © 2011 The Apache Software Foundation