org.apache.nutch.tools
Class DmozParser

java.lang.Object
  extended by org.apache.nutch.tools.DmozParser

public class DmozParser
extends Object

Utility that converts DMOZ RDF into a flat file of URLs to be injected.


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
DmozParser()
           
 
Method Summary
static void main(String[] argv)
          Command-line access.
 void parseDmozFile(File dmozFile, int subsetDenom, boolean includeAdult, int skew, Pattern topicPattern)
          Iterate through all the items in this structured DMOZ file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

Constructor Detail

DmozParser

public DmozParser()

Method Detail

parseDmozFile

public void parseDmozFile(File dmozFile,
                          int subsetDenom,
                          boolean includeAdult,
                          int skew,
                          Pattern topicPattern)
                   throws IOException,
                          SAXException,
                          ParserConfigurationException
Iterate through all the items in this structured DMOZ file. Add each URL to the web db.

Throws:
IOException
SAXException
ParserConfigurationException
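
The declared exceptions suggest the DMOZ file is read with a SAX parser. The following is a minimal, self-contained sketch of that general technique using only the JDK; it is not the Nutch implementation. The class `DmozUrlSketch`, its handler, and the helper `inSubset` are hypothetical names. The parameter semantics are assumptions as well: `subsetDenom` is taken to mean "keep roughly one URL in `subsetDenom`", `skew` to shift which URLs fall into that subset, and `topicPattern` to filter on the text of each page's `<topic>` element. The sample input imitates the `ExternalPage` entries of a DMOZ content dump.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: NOT the Nutch DmozParser source.
public class DmozUrlSketch {

    // Hypothetical handler: remembers each ExternalPage URL and its <topic> text.
    static class PageHandler extends DefaultHandler {
        final List<String> urls = new ArrayList<>();
        private final int subsetDenom;
        private final int skew;
        private final Pattern topicPattern; // null accepts every topic
        private String url;                 // "about" attribute of the open ExternalPage
        private StringBuilder topic;        // non-null only while inside <topic>
        private String topicText = "";

        PageHandler(int subsetDenom, int skew, Pattern topicPattern) {
            this.subsetDenom = subsetDenom;
            this.skew = skew;
            this.topicPattern = topicPattern;
        }

        @Override
        public void startElement(String ns, String local, String qName, Attributes atts) {
            if ("ExternalPage".equals(qName)) {
                url = atts.getValue("about");
                topicText = "";
            } else if ("topic".equals(qName) && url != null) {
                topic = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (topic != null) {
                topic.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String ns, String local, String qName) {
            if ("topic".equals(qName) && topic != null) {
                topicText = topic.toString().trim();
                topic = null;
            } else if ("ExternalPage".equals(qName) && url != null) {
                boolean topicOk = topicPattern == null || topicPattern.matcher(topicText).find();
                if (topicOk && inSubset(url, subsetDenom, skew)) {
                    urls.add(url);
                }
                url = null;
            }
        }
    }

    // Assumed semantics: keep about one URL in subsetDenom; skew changes which ones.
    // Values below 2 keep everything.
    static boolean inSubset(String url, int subsetDenom, int skew) {
        return subsetDenom < 2 || Math.floorMod(url.hashCode() ^ skew, subsetDenom) == 0;
    }

    static List<String> extractUrls(String rdf, int subsetDenom, int skew, Pattern topicPattern)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PageHandler handler = new PageHandler(subsetDenom, skew, topicPattern);
        parser.parse(new ByteArrayInputStream(rdf.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.urls;
    }

    // Tiny stand-in for a DMOZ content dump (made-up entries).
    static final String SAMPLE =
        "<RDF>"
      + "<ExternalPage about=\"http://example.com/music\"><topic>Top/Arts/Music</topic></ExternalPage>"
      + "<ExternalPage about=\"http://example.org/go\"><topic>Top/Games/Board</topic></ExternalPage>"
      + "</RDF>";

    public static void main(String[] args) throws Exception {
        // Keep every page (subsetDenom = 1) whose topic starts with Top/Arts.
        for (String u : extractUrls(SAMPLE, 1, 0, Pattern.compile("^Top/Arts"))) {
            System.out.println(u);
        }
    }
}
```

Run as written, the sketch prints `http://example.com/music`. Passing `null` as the pattern returns both sample URLs. Selecting the subset by hashing the URL, rather than by drawing random numbers, makes the selection repeatable across runs, which is why the sketch takes that approach.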

main

public static void main(String[] argv)
                 throws Exception
Command-line access. Users may add URLs from a flat text file or from the structured DMOZ file. By default, adult material (as categorized by DMOZ) is ignored.

Throws:
Exception
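
The adult-material default can be pictured as a simple predicate on a page's topic path. This is a hedged sketch, not the Nutch source: the class `AdultFilterSketch` and the method `accept` are hypothetical, and the sketch assumes that adult listings are identified by a topic path beginning with `Top/Adult`, which is where DMOZ files that category.

```java
// Illustrative only: NOT the Nutch DmozParser source.
public class AdultFilterSketch {

    // Assumption: adult listings sit under the "Top/Adult" category tree.
    static final String ADULT_PREFIX = "Top/Adult";

    // Returns true when a page with this topic should be kept.
    static boolean accept(String topic, boolean includeAdult) {
        return includeAdult || !topic.startsWith(ADULT_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(accept("Top/Arts/Music", false)); // true
        System.out.println(accept("Top/Adult/Arts", false)); // false: excluded by default
        System.out.println(accept("Top/Adult/Arts", true));  // true: explicitly included
    }
}
```

With `includeAdult` set to `false`, matching the documented default, only topics outside the assumed adult tree pass the filter.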


Copyright © 2011 The Apache Software Foundation