org.apache.http.norobots
Class NoRobotClient

java.lang.Object
  extended by org.apache.http.norobots.NoRobotClient

public class NoRobotClient
extends Object

A client which may be used to decide which URLs on a website may be looked at, according to the norobots specification located at: http://www.robotstxt.org/wc/norobots-rfc.html
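A minimal usage sketch of the class (the user-agent name and site URL below are hypothetical examples; compiling it requires the norobots library on the classpath):

```java
import java.net.URL;
import org.apache.http.norobots.NoRobotClient;

public class RobotsCheckExample {
    public static void main(String[] args) throws Exception {
        // "my-crawler" is a hypothetical user-agent name.
        NoRobotClient client = new NoRobotClient("my-crawler");

        // Fetches and parses http://www.example.com/robots.txt.
        client.parse(new URL("http://www.example.com/"));

        // Ask whether a specific page may be crawled.
        boolean ok = client.isUrlAllowed(
                new URL("http://www.example.com/private/page.html"));
        System.out.println(ok ? "allowed" : "disallowed");
    }
}
```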


Constructor Summary
NoRobotClient(String userAgent)
          Create a Client for a particular user-agent name.
 
Method Summary
 boolean isUrlAllowed(URL url)
          Decide if the parsed website will allow this URL to be seen.
 void parse(URL baseUrl)
          Fetch and parse a website's robots.txt file.
 void parseText(String txt)
          Parse the contents of a robots.txt file passed in directly as a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NoRobotClient

public NoRobotClient(String userAgent)
Create a Client for a particular user-agent name.

Parameters:
userAgent - name for the robot
Method Detail

parse

public void parse(URL baseUrl)
           throws NoRobotException
Fetch and parse a website's robots.txt file. Note that the URL passed in is for the website itself and does not include the robots.txt file.

Parameters:
baseUrl - base URL of the site
Throws:
NoRobotException

parseText

public void parseText(String txt)
               throws NoRobotException
Parse the contents of a robots.txt file passed in directly as a string.

Parameters:
txt - contents of a robots.txt file
Throws:
NoRobotException

isUrlAllowed

public boolean isUrlAllowed(URL url)
                     throws IllegalStateException,
                            IllegalArgumentException
Decide if the parsed website will allow this URL to be seen. Note that parse(URL) must be called before this method is called.

Parameters:
url - in question
Returns:
true if the URL is allowed, false otherwise
Throws:
IllegalStateException - when parse has not been called
IllegalArgumentException
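The allow/deny check above can be illustrated with a small self-contained sketch. This is not the library's implementation, and the class and method names here are invented for illustration; it follows the norobots RFC's record format, where a path is disallowed when it begins with any Disallow prefix recorded for a matching user-agent (specific-agent precedence over "*" is omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of robots.txt prefix matching (not the library's code).
public class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    // Parse a robots.txt body, keeping Disallow prefixes from records
    // that apply to the given user-agent (or to the wildcard "*").
    void parseText(String txt, String userAgent) {
        boolean applies = false;
        for (String line : txt.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                applies = value.equals("*") || value.equalsIgnoreCase(userAgent);
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
    }

    // A path is allowed unless it starts with a recorded Disallow prefix.
    boolean isPathAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RobotsRules rules = new RobotsRules();
        rules.parseText("User-agent: *\nDisallow: /private/\n", "my-crawler");
        System.out.println(rules.isPathAllowed("/private/a.html")); // prints false
        System.out.println(rules.isPathAllowed("/public/a.html"));  // prints true
    }
}
```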


Copyright © 2008 The Apache Software Foundation