Class IdentityHtmlMapper

java.lang.Object
org.apache.tika.parser.html.IdentityHtmlMapper
All Implemented Interfaces:
HtmlMapper

public class IdentityHtmlMapper extends Object implements HtmlMapper
Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
Since:
Apache Tika 0.8
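
The "pass as-is" behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the Tika source. `IdentityMapperSketch`, `Mapper`, and `IDENTITY` are hypothetical stand-ins that mirror the three methods documented on this page, and lower-casing names with `Locale.ENGLISH` is an assumption based on the "(lower case)" return values documented below.

```java
import java.util.Locale;

// Illustrative sketch only; every name here is a hypothetical stand-in,
// not part of the Tika API.
class IdentityMapperSketch {

    /** Local stand-in mirroring the three methods documented for HtmlMapper. */
    interface Mapper {
        boolean isDiscardElement(String name);
        String mapSafeAttribute(String elementName, String attributeName);
        String mapSafeElement(String name);
    }

    /** Identity rules: discard nothing, keep every element and attribute. */
    static final Mapper IDENTITY = new Mapper() {
        @Override
        public boolean isDiscardElement(String name) {
            return false; // no element's content is ever dropped
        }

        @Override
        public String mapSafeAttribute(String elementName, String attributeName) {
            // Assumption: the name is only normalised to lower case.
            return attributeName.toLowerCase(Locale.ENGLISH);
        }

        @Override
        public String mapSafeElement(String name) {
            // Assumption: the name is only normalised to lower case.
            return name.toLowerCase(Locale.ENGLISH);
        }
    };

    public static void main(String[] args) {
        System.out.println(IDENTITY.mapSafeElement("SCRIPT"));          // prints "script"
        System.out.println(IDENTITY.mapSafeAttribute("a", "onclick")); // prints "onclick"
        System.out.println(IDENTITY.isDiscardElement("STYLE"));        // prints "false"
    }
}
```

Because an identity mapping never returns null and never discards content, elements such as SCRIPT and attributes such as onclick survive in the output. In Tika itself, a mapper is normally supplied to the HTML parser through the parse context, for example `context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE)`.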
  • Field Details

    • INSTANCE

      public static final HtmlMapper INSTANCE
  • Constructor Details

    • IdentityHtmlMapper

      public IdentityHtmlMapper()
  • Method Details

    • isDiscardElement

      public boolean isDiscardElement(String name)
      Description copied from interface: HtmlMapper
      Checks whether all content within the given HTML element should be discarded rather than included in the parse output.
      Specified by:
      isDiscardElement in interface HtmlMapper
      Parameters:
      name - HTML element name (upper case)
      Returns:
      true if content inside the named element should be ignored, false otherwise
    • mapSafeAttribute

      public String mapSafeAttribute(String elementName, String attributeName)
      Description copied from interface: HtmlMapper
      Maps "safe" HTML attribute names to semantic XHTML equivalents. If the given attribute is unknown or deemed unsafe for inclusion in the parse output, then this method returns null and the attribute will be ignored. This method assumes that the element name is valid and normalised.
      Specified by:
      mapSafeAttribute in interface HtmlMapper
      Parameters:
      elementName - HTML element name (lower case)
      attributeName - HTML attribute name (lower case)
      Returns:
      XHTML attribute name (lower case), or null if the attribute is unknown or unsafe
    • mapSafeElement

      public String mapSafeElement(String name)
      Description copied from interface: HtmlMapper
      Maps "safe" HTML element names to semantic XHTML equivalents. If the given element is unknown or deemed unsafe for inclusion in the parse output, this method returns null and the element itself is ignored, but the content inside it is still processed. To discard the entire contents of an element, see the HtmlMapper.isDiscardElement(String) method.
      Specified by:
      mapSafeElement in interface HtmlMapper
      Parameters:
      name - HTML element name (upper case)
      Returns:
      XHTML element name (lower case), or null if the element is unsafe
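
The difference between a null result from mapSafeElement (tag dropped, content kept) and a true result from isDiscardElement (content dropped as well) can be sketched with a toy token filter. This is a simplified illustration of the contract described above, not how Tika's parser is implemented. `ElementFilterSketch`, `filter`, and the `SAFE` and `DISCARD` rule sets are all hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration of the HtmlMapper contract; all names are hypothetical.
class ElementFilterSketch {

    // Hypothetical rule sets chosen only for this demo.
    static final Set<String> SAFE = Set.of("P", "B");
    static final Set<String> DISCARD = Set.of("SCRIPT");

    /** Returns the lower-case name for "safe" elements, null otherwise. */
    static String mapSafeElement(String name) {
        return SAFE.contains(name) ? name.toLowerCase(Locale.ENGLISH) : null;
    }

    /** Returns true when everything inside the element should be dropped. */
    static boolean isDiscardElement(String name) {
        return DISCARD.contains(name);
    }

    /** Tokens: "<NAME>" opens, "</NAME>" closes, anything else is text. */
    static String filter(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        int discardDepth = 0; // > 0 while inside a discarded element
        for (String t : tokens) {
            boolean close = t.startsWith("</");
            boolean open = !close && t.startsWith("<");
            if (!open && !close) {
                if (discardDepth == 0) {
                    out.append(t); // text survives unless inside a discarded element
                }
                continue;
            }
            String name = t.substring(close ? 2 : 1, t.length() - 1);
            if (isDiscardElement(name)) {
                discardDepth += open ? 1 : -1;
                continue;
            }
            if (discardDepth > 0) {
                continue; // tags inside a discarded element are dropped too
            }
            String mapped = mapSafeElement(name);
            if (mapped != null) {
                out.append(close ? "</" : "<").append(mapped).append('>');
            }
            // mapped == null: the tag is skipped, but its content is still processed
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "<P>", "<FONT>", "kept", "</FONT>",
                "<SCRIPT>", "dropped", "</SCRIPT>", "</P>");
        System.out.println(filter(tokens)); // prints "<p>kept</p>"
    }
}
```

In this sketch FONT is unknown, so its tags vanish while its text remains, whereas SCRIPT is a discard element, so its text vanishes too. With identity-style rules neither case would apply, and every tag and text token would be emitted.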