org.apache.nutch.tools
Class DmozParser
java.lang.Object
org.apache.nutch.tools.DmozParser
public class DmozParser
extends Object
Utility that converts DMOZ RDF into a flat file of URLs to be injected.
Field Summary
static org.apache.commons.logging.Log LOG

Method Summary
static void main(String[] argv)
    Command-line access.
void parseDmozFile(File dmozFile, int subsetDenom, boolean includeAdult, int skew, Pattern topicPattern)
    Iterate through all the items in this structured DMOZ file.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
DmozParser
public DmozParser()
parseDmozFile
public void parseDmozFile(File dmozFile,
int subsetDenom,
boolean includeAdult,
int skew,
Pattern topicPattern)
throws IOException,
SAXException,
ParserConfigurationException
Iterate through all the items in this structured DMOZ file.
Add each URL to the web db.
Throws:
IOException
SAXException
ParserConfigurationException
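
The declared exceptions suggest that the DMOZ file is read with a SAX parser. The following is a minimal, self-contained sketch of that kind of iteration. It is not the Nutch implementation. The class name DmozUrlSketch, the method extractUrls, the element names ExternalPage and topic, the about attribute, and the "Top/Adult" prefix test are all assumptions about the DMOZ dump layout. The subsetDenom and skew parameters are left out because this page does not describe their semantics.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for illustration only; not part of Nutch. */
public class DmozUrlSketch {

  /**
   * Collects the "about" URL of every ExternalPage element whose topic
   * passes the filters. Element and attribute names are assumptions.
   */
  public static List<String> extractUrls(InputStream rdf,
                                         boolean includeAdult,
                                         Pattern topicPattern)
      throws IOException, SAXException, ParserConfigurationException {
    final List<String> urls = new ArrayList<>();
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

    parser.parse(rdf, new DefaultHandler() {
      private String url;            // URL of the page being read, or null
      private StringBuilder topic;   // text of the page's <topic> child
      private boolean inTopic;

      @Override
      public void startElement(String uri, String localName,
                               String qName, Attributes atts) {
        if ("ExternalPage".equals(qName)) {
          url = atts.getValue("about");
          topic = new StringBuilder();
        } else if ("topic".equals(qName) && url != null) {
          inTopic = true;
        }
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        if (inTopic) {
          topic.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        if ("topic".equals(qName)) {
          inTopic = false;
        } else if ("ExternalPage".equals(qName) && url != null) {
          String t = topic.toString().trim();
          // Assumed convention: adult categories live under "Top/Adult".
          boolean adult = t.startsWith("Top/Adult");
          // A null pattern means "accept every topic".
          boolean topicOk =
              topicPattern == null || topicPattern.matcher(t).lookingAt();
          if ((includeAdult || !adult) && topicOk) {
            urls.add(url);
          }
          url = null;
        }
      }
    });
    return urls;
  }
}
```

Streaming with SAX keeps memory use flat regardless of file size, which matters for a dump as large as the DMOZ content file. A caller could write the returned URLs one per line to obtain the flat file that the class description mentions.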
main
public static void main(String[] argv)
throws Exception
Command-line access. The user may add URLs via a flat text file
or the structured DMOZ file. By default, material that DMOZ
categorizes as Adult is ignored.
Throws:
Exception
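
This page does not list the command-line options, so the following sketch only illustrates how arguments might be mapped onto the parseDmozFile parameters, with Adult material excluded unless requested. The class DmozArgsSketch, the Options holder, the flag names (-subset, -includeAdultMaterial, -skew), and the default values are assumptions; check them against the usage message printed by the tool itself.

```java
import java.io.File;

/** Hypothetical argument parser for illustration only; not part of Nutch. */
public class DmozArgsSketch {

  /** Values that would be handed to parseDmozFile. */
  public static final class Options {
    public File dmozFile;
    public int subsetDenom = 1;          // assumed default
    public boolean includeAdult = false; // Adult material ignored by default
    public int skew = 0;                 // assumed default
  }

  public static Options parse(String[] argv) {
    if (argv.length < 1) {
      throw new IllegalArgumentException("expected a DMOZ file path");
    }
    Options o = new Options();
    o.dmozFile = new File(argv[0]);

    for (int i = 1; i < argv.length; i++) {
      switch (argv[i]) {
        case "-subset":                  // assumed flag name
          o.subsetDenom = Integer.parseInt(valueAfter(argv, i++));
          break;
        case "-skew":                    // assumed flag name
          o.skew = Integer.parseInt(valueAfter(argv, i++));
          break;
        case "-includeAdultMaterial":    // assumed flag name
          o.includeAdult = true;
          break;
        default:
          throw new IllegalArgumentException("unknown option: " + argv[i]);
      }
    }
    return o;
  }

  /** Returns the argument following position i, or fails with a clear message. */
  private static String valueAfter(String[] argv, int i) {
    if (i + 1 >= argv.length) {
      throw new IllegalArgumentException("missing value for " + argv[i]);
    }
    return argv[i + 1];
  }
}
```

With options parsed this way, a caller could invoke new DmozParser().parseDmozFile(o.dmozFile, o.subsetDenom, o.includeAdult, o.skew, topicPattern), where topicPattern is a java.util.regex.Pattern supplied by the caller.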
Copyright © 2011 The Apache Software Foundation