Apache JMeter

org.apache.jmeter.protocol.http.parser
Class HTMLParser

java.lang.Object
  extended by org.apache.jmeter.protocol.http.parser.HTMLParser
Direct Known Subclasses:
JsoupBasedHtmlParser, LagartoBasedHtmlParser

public abstract class HTMLParser
extends Object

HtmlParsers can parse HTML content to obtain URLs.


Field Summary
protected static String ATT_BACKGROUND
           
protected static String ATT_CODE
           
protected static String ATT_CODEBASE
           
protected static String ATT_DATA
           
protected static String ATT_HREF
           
protected static String ATT_IS_IMAGE
           
protected static String ATT_REL
           
protected static String ATT_SRC
           
protected static String ATT_STYLE
           
protected static String ATT_TYPE
           
static String DEFAULT_PARSER
           
static String PARSER_CLASSNAME
           
protected static String STYLESHEET
           
protected static String TAG_APPLET
           
protected static String TAG_BASE
           
protected static String TAG_BGSOUND
           
protected static String TAG_BODY
           
protected static String TAG_EMBED
           
protected static String TAG_FRAME
           
protected static String TAG_IFRAME
           
protected static String TAG_IMAGE
           
protected static String TAG_INPUT
           
protected static String TAG_LINK
           
protected static String TAG_OBJECT
           
protected static String TAG_SCRIPT
           
 
Constructor Summary
protected HTMLParser()
          Protected constructor to prevent instantiation except from within subclasses.
 
Method Summary
 Iterator<URL> getEmbeddedResourceURLs(byte[] html, URL baseUrl, Collection<URLString> coll, String encoding)
          Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
 Iterator<URL> getEmbeddedResourceURLs(byte[] html, URL baseUrl, String encoding)
          Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
abstract  Iterator<URL> getEmbeddedResourceURLs(byte[] html, URL baseUrl, URLCollection coll, String encoding)
          Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
static HTMLParser getParser()
           
static HTMLParser getParser(String htmlParserClassName)
           
protected  boolean isReusable()
          Parsers should override this method if the parser is reusable, in which case the parser will be cached for the next getParser() call.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ATT_BACKGROUND

protected static final String ATT_BACKGROUND
See Also:
Constant Field Values

ATT_CODE

protected static final String ATT_CODE
See Also:
Constant Field Values

ATT_CODEBASE

protected static final String ATT_CODEBASE
See Also:
Constant Field Values

ATT_DATA

protected static final String ATT_DATA
See Also:
Constant Field Values

ATT_HREF

protected static final String ATT_HREF
See Also:
Constant Field Values

ATT_REL

protected static final String ATT_REL
See Also:
Constant Field Values

ATT_SRC

protected static final String ATT_SRC
See Also:
Constant Field Values

ATT_STYLE

protected static final String ATT_STYLE
See Also:
Constant Field Values

ATT_TYPE

protected static final String ATT_TYPE
See Also:
Constant Field Values

ATT_IS_IMAGE

protected static final String ATT_IS_IMAGE
See Also:
Constant Field Values

TAG_APPLET

protected static final String TAG_APPLET
See Also:
Constant Field Values

TAG_BASE

protected static final String TAG_BASE
See Also:
Constant Field Values

TAG_BGSOUND

protected static final String TAG_BGSOUND
See Also:
Constant Field Values

TAG_BODY

protected static final String TAG_BODY
See Also:
Constant Field Values

TAG_EMBED

protected static final String TAG_EMBED
See Also:
Constant Field Values

TAG_FRAME

protected static final String TAG_FRAME
See Also:
Constant Field Values

TAG_IFRAME

protected static final String TAG_IFRAME
See Also:
Constant Field Values

TAG_IMAGE

protected static final String TAG_IMAGE
See Also:
Constant Field Values

TAG_INPUT

protected static final String TAG_INPUT
See Also:
Constant Field Values

TAG_LINK

protected static final String TAG_LINK
See Also:
Constant Field Values

TAG_OBJECT

protected static final String TAG_OBJECT
See Also:
Constant Field Values

TAG_SCRIPT

protected static final String TAG_SCRIPT
See Also:
Constant Field Values

STYLESHEET

protected static final String STYLESHEET
See Also:
Constant Field Values

PARSER_CLASSNAME

public static final String PARSER_CLASSNAME
See Also:
Constant Field Values

DEFAULT_PARSER

public static final String DEFAULT_PARSER
See Also:
Constant Field Values

Constructor Detail

HTMLParser

protected HTMLParser()
Protected constructor to prevent instantiation except from within subclasses.

Method Detail

getParser

public static final HTMLParser getParser()

getParser

public static final HTMLParser getParser(String htmlParserClassName)

getEmbeddedResourceURLs

public Iterator<URL> getEmbeddedResourceURLs(byte[] html,
                                             URL baseUrl,
                                             String encoding)
                                      throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

URLs should not appear twice in the returned iterator.

Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
encoding - Charset
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException
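
The contract above (resolve each embedded reference against the base URL, and return each URL only once) can be illustrated with a minimal, self-contained sketch. It does not use JMeter at all: the class EmbeddedResourceSketch, its url helper, and the regular expression are assumptions made for illustration, and the pattern only recognises src attributes on img and script tags. Unlike the documented method, this sketch silently skips references it cannot resolve rather than reporting them.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not a JMeter class.
class EmbeddedResourceSketch {

    // Deliberately simplified: only src="..." on <img> and <script> tags.
    private static final Pattern SRC = Pattern.compile(
            "<(?:img|script)\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Convenience helper (an assumption of this sketch): build a URL
    // without forcing callers to handle a checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static Iterator<URL> getEmbeddedResourceURLs(byte[] html, URL baseUrl, String encoding) {
        String text = new String(html, Charset.forName(encoding));
        URI base;
        try {
            base = baseUrl.toURI();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        // Keyed by the URL's string form so duplicates collapse without
        // relying on URL.equals, which may trigger host name resolution.
        Map<String, URL> unique = new LinkedHashMap<>();
        Matcher m = SRC.matcher(text);
        while (m.find()) {
            try {
                URL resolved = base.resolve(m.group(1)).toURL();
                unique.putIfAbsent(resolved.toExternalForm(), resolved);
            } catch (IllegalArgumentException | MalformedURLException e) {
                // This sketch skips references it cannot resolve.
            }
        }
        return unique.values().iterator();
    }

    public static void main(String[] args) {
        byte[] html = ("<html><img src=\"a.png\"><script src=\"/js/x.js\"></script>"
                + "<img src=\"a.png\"></html>").getBytes(Charset.forName("UTF-8"));
        Iterator<URL> it = getEmbeddedResourceURLs(
                html, url("http://example.com/app/index.html"), "UTF-8");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The insertion-ordered map keeps the first occurrence of each reference, so the repeated a.png in the sample input is returned a single time.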

getEmbeddedResourceURLs

public abstract Iterator<URL> getEmbeddedResourceURLs(byte[] html,
                                                      URL baseUrl,
                                                      URLCollection coll,
                                                      String encoding)
                                               throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

All URLs should be added to the Collection.

Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException. N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
encoding - Charset
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException
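
The note that the Iterator returns URLs while the Collection holds URLString objects can be pictured with the following sketch. UrlCollectionSketch and UrlKeySketch are hypothetical stand-ins written for this example; the real URLCollection and URLString classes may differ in detail. The idea shown is a wrapper that compares URLs by their string form, plus an adapter that wraps on add and unwraps on iteration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;

// Hypothetical adapter: stores wrapped keys, hands back plain URLs.
class UrlCollectionSketch {
    private final Collection<UrlKeySketch> backing;

    UrlCollectionSketch(Collection<UrlKeySketch> backing) {
        this.backing = backing;
    }

    // Convenience helper (an assumption of this sketch).
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Wraps the URL before storing it in the backing collection.
    boolean add(URL u) {
        return backing.add(new UrlKeySketch(u));
    }

    // Unwraps each stored key so that callers only ever see URLs.
    Iterator<URL> iterator() {
        final Iterator<UrlKeySketch> it = backing.iterator();
        return new Iterator<URL>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public URL next() { return it.next().getUrl(); }
        };
    }

    public static void main(String[] args) {
        Collection<UrlKeySketch> backing = new LinkedHashSet<>();
        UrlCollectionSketch coll = new UrlCollectionSketch(backing);
        coll.add(url("http://example.com/a.png"));
        coll.add(url("http://example.com/a.png")); // duplicate, rejected by the set
        System.out.println(backing.size());
        System.out.println(coll.iterator().next());
    }
}

// Hypothetical wrapper: equality and hashing use the URL's string form only.
final class UrlKeySketch {
    private final URL url;
    private final String key;

    UrlKeySketch(URL url) {
        this.url = url;
        this.key = url.toExternalForm();
    }

    URL getUrl() { return url; }

    @Override public boolean equals(Object o) {
        return o instanceof UrlKeySketch && key.equals(((UrlKeySketch) o).key);
    }

    @Override public int hashCode() { return key.hashCode(); }

    @Override public String toString() { return key; }
}
```

Comparing by string form is a design choice of this sketch: java.net.URL's own equals and hashCode may perform host name resolution, which is undesirable inside a set or map.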

getEmbeddedResourceURLs

public Iterator<URL> getEmbeddedResourceURLs(byte[] html,
                                             URL baseUrl,
                                             Collection<URLString> coll,
                                             String encoding)
                                      throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc. N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - Collection - will contain URLString objects, not URLs
encoding - Charset
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException

isReusable

protected boolean isReusable()
Parsers should override this method if the parser is reusable, in which case the parser will be cached for the next getParser() call.

Returns:
true if the Parser is reusable
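
How isReusable() can interact with a getParser(String) factory is sketched below. This is not the JMeter source: ParserSketch, ReusableParser, and FreshParser are hypothetical names, and the lookup and caching strategy is an assumption based only on the description above. The factory instantiates the named class reflectively and caches the instance only when it reports itself as reusable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an abstract parser base class; not JMeter code.
abstract class ParserSketch {
    // Cache of reusable parser instances, keyed by class name.
    private static final Map<String, ParserSketch> CACHE = new ConcurrentHashMap<>();

    // Protected so that only subclasses can be instantiated.
    protected ParserSketch() { }

    // Subclasses override this to opt in to caching.
    protected boolean isReusable() {
        return false;
    }

    static ParserSketch getParser(String className) {
        ParserSketch cached = CACHE.get(className);
        if (cached != null) {
            return cached;
        }
        try {
            ParserSketch parser = (ParserSketch) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            if (parser.isReusable()) {
                CACHE.put(className, parser); // returned again by later calls
            }
            return parser;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create parser " + className, e);
        }
    }

    public static void main(String[] args) {
        String reusable = ReusableParser.class.getName();
        String fresh = FreshParser.class.getName();
        System.out.println(getParser(reusable) == getParser(reusable));
        System.out.println(getParser(fresh) == getParser(fresh));
    }
}

// Opts in to caching: the same instance is handed out every time.
class ReusableParser extends ParserSketch {
    @Override protected boolean isReusable() { return true; }
}

// Keeps the default: a new instance is created on each call.
class FreshParser extends ParserSketch { }
```

A parser that keeps per-parse state in its fields would leave isReusable() returning false, so that each caller receives its own instance.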

Copyright © 1998-2013 Apache Software Foundation. All Rights Reserved.