public class EncodingDetector extends Object
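Detecting a character encoding broadly encompasses two distinct functions:

1. Auto-detecting a set of "clues" from the input content (see autoDetectClues).
2. Taking a set of clues and making a best guess at the actual encoding (see guessEncoding).

A caller will often have extra information about what the encoding might be, for example from the HTTP header or HTML meta tags. Such hints are often wrong but are still potentially useful clues, and the types of clues may differ from caller to caller. Thus a typical calling sequence is:

1. Run step 1 to generate a set of auto-detected clues.
2. Add the caller-specific extra clues (see addClue).
3. Run step 2 to guess the most probable encoding.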
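The calling sequence can be illustrated with a small, self-contained stand-in. This is only a sketch under stated assumptions, not the actual EncodingDetector: the class `ClueSketch`, its nested `Clue` type, and the `NO_CONFIDENCE` constant are hypothetical, it takes a raw `byte[]` instead of a `Content` instance, its auto-detection only recognizes a UTF-8 byte order mark, and its selection rule (highest scored clue above a threshold, then the first unscored hint, then the default) is a simplification chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a clue-based detector. Hypothetical; not the real EncodingDetector. */
class ClueSketch {
    static final int NO_CONFIDENCE = -1; // marks a clue added without a confidence value

    /** One hint: a charset name, where it came from, and how much it is trusted. */
    static final class Clue {
        final String value;
        final String source;
        final int confidence;

        Clue(String value, String source, int confidence) {
            this.value = value;
            this.source = source;
            this.confidence = confidence;
        }
    }

    private final List<Clue> clues = new ArrayList<>();
    private final int minConfidence;

    ClueSketch(int minConfidence) {
        this.minConfidence = minConfidence;
    }

    /** Step 1: derive clues from the raw bytes. Only a UTF-8 byte order mark is checked here. */
    void autoDetectClues(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            clues.add(new Clue("UTF-8", "detect", 100));
        }
    }

    /** Adds a caller-supplied hint, for example from an HTTP header. */
    void addClue(String value, String source) {
        if (value != null && !value.isEmpty()) {
            clues.add(new Clue(value, source, NO_CONFIDENCE));
        }
    }

    void clearClues() {
        clues.clear();
    }

    /** Step 2: pick the best clue, falling back to the caller's default. */
    String guessEncoding(String defaultValue) {
        Clue best = null;
        for (Clue c : clues) {
            boolean scored = c.confidence != NO_CONFIDENCE;
            if (scored && c.confidence >= minConfidence
                    && (best == null || c.confidence > best.confidence)) {
                best = c;
            }
        }
        if (best != null) {
            return best.value;
        }
        // No scored clue cleared the threshold: use the first unscored hint, if any.
        for (Clue c : clues) {
            if (c.confidence == NO_CONFIDENCE) {
                return c.value;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        ClueSketch detector = new ClueSketch(50);
        byte[] page = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        detector.autoDetectClues(page);            // auto-detected clues
        detector.addClue("ISO-8859-1", "header");  // caller's extra clue
        System.out.println(detector.guessEncoding("windows-1252")); // prints UTF-8
    }
}
```

In this sketch the auto-detected clue wins because it carries a confidence above the threshold. Without the byte order mark, the header hint would be returned instead, and with no clues at all the default is returned unchanged.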
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |
static String | MIN_CONFIDENCE_KEY |
static int | NO_THRESHOLD |
Constructor and Description |
---|
EncodingDetector(org.apache.hadoop.conf.Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | addClue(String value, String source) |
void | addClue(String value, String source, int confidence) |
void | autoDetectClues(Content content, boolean filter) |
void | clearClues(): Clears all clues. |
String | guessEncoding(Content content, String defaultValue): Guess the encoding with the previously specified list of clues. |
static void | main(String[] args) |
static String | parseCharacterEncoding(String contentType): Parse the character encoding from the specified content type header. |
static String | resolveEncodingAlias(String encoding) |
public static final org.slf4j.Logger LOG
public static final int NO_THRESHOLD
public static final String MIN_CONFIDENCE_KEY
public EncodingDetector(org.apache.hadoop.conf.Configuration conf)
public void autoDetectClues(Content content, boolean filter)
public String guessEncoding(Content content, String defaultValue)
Parameters:
content - Content instance
defaultValue - Default encoding to return if no encoding can be detected with enough confidence. Note that this will not be normalized with resolveEncodingAlias(java.lang.String).
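Because the default value is returned as given, a caller that wants a canonical charset name may prefer to normalize the default before passing it in. The sketch below shows one JDK-only way to do that with `java.nio.charset.Charset`. The class `DefaultCharsetName` and its method `canonicalOr` are hypothetical, and the JDK's alias table is not necessarily the same mapping that resolveEncodingAlias applies.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper: canonicalizes a charset label using the JDK's own alias table. */
class DefaultCharsetName {
    /** Returns the JDK canonical name for a label, or the fallback if the JDK does not know it. */
    static String canonicalOr(String label, String fallback) {
        if (label == null) {
            return fallback;
        }
        try {
            return Charset.forName(label).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Malformed or unknown label: keep the caller's fallback.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalOr("latin1", "UTF-8"));          // prints ISO-8859-1
        System.out.println(canonicalOr("no-such-charset", "UTF-8")); // prints UTF-8
    }
}
```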
public void clearClues()
public static String parseCharacterEncoding(String contentType)
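Parse the character encoding from the specified content type header. If the content type is null, or there is no explicit character encoding, null is returned.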
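The contract described here (a charset name when one is declared, otherwise null) can be sketched with a regular expression. This is an illustrative approximation rather than the method's actual implementation: the class `ContentTypeCharset`, the method name `parse`, and the pattern are assumptions, and the real method may treat quoting or malformed headers differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the charset parameter from a Content-Type value. */
class ContentTypeCharset {
    // Matches charset=value or charset="value", case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset, or null if the header is null or declares none. */
    static String parse(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("text/html; charset=UTF-8")); // prints UTF-8
        System.out.println(parse("text/html"));                // prints null
    }
}
```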
Parameters:
contentType - a content type header
public static void main(String[] args) throws IOException
Throws:
IOException
Copyright © 2014 The Apache Software Foundation