public interface ExtractingParams
Modifier and Type | Field and Description |
---|---|
static String | CAPTURE_ATTRIBUTES: Capture attributes separately according to the name of the element, instead of just adding them to the string buffer. |
static String | CAPTURE_ELEMENTS: Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. |
static String | DEFAULT_FIELD: Optional. |
static String | EXTRACT_FORMAT: Content output format if extractOnly is true. |
static String | EXTRACT_ONLY: Only extract and return the content, do not index it. |
static String | IGNORE_TIKA_EXCEPTION: If true, ignore TikaException (give up extracting text but still index metadata). |
static String | LITERALS_OVERRIDE: By default, literal field values override other values such as metadata and content. |
static String | LITERALS_PREFIX: Pass in literal values to be added to the document, as in literal.myField=Foo. |
static String | LOWERNAMES: Map all generated attribute names to field names with lowercase and underscores. |
static String | MAP_PREFIX: The param prefix for mapping Tika metadata to Solr fields. |
static String | PASSWORD_MAP_FILE: Optional. |
static String | RESOURCE_NAME: Optional. |
static String | RESOURCE_PASSWORD: Optional. |
static String | STREAM_TYPE: The type of the stream. |
static String | UNKNOWN_FIELD_PREFIX: Optional. |
static String | XPATH_EXPRESSION: Restrict the extracted parts of a document to be indexed by passing in an XPath expression. |
static final String LOWERNAMES
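The summary describes LOWERNAMES as mapping generated attribute names to field names "with lowercase and underscores". The following is a minimal sketch of such a normalization, not Solr's own code. The class and method names are hypothetical, and the rule that every non-alphanumeric character becomes an underscore is an assumption made for illustration.

```java
final class LowerNamesSketch {

    // Assumption: letters are lowercased and any character that is not a
    // letter or digit is replaced by '_'. Hypothetical helper, not a Solr API.
    static String toFieldName(String attributeName) {
        StringBuilder sb = new StringBuilder(attributeName.length());
        for (int i = 0; i < attributeName.length(); i++) {
            char ch = attributeName.charAt(i);
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFieldName("Content-Type"));   // content_type
        System.out.println(toFieldName("Last Modified"));  // last_modified
    }
}
```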
static final String IGNORE_TIKA_EXCEPTION
static final String MAP_PREFIX
To map a field, add a name like:
fmap.title=solr.title
In this example, the Tika "title" metadata value will be added to a Solr field named "solr.title".
static final String LITERALS_PREFIX
Pass in literal values to be added to the document, as in:
literal.myField=Foo
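To show how the fmap.title=solr.title and literal.myField=Foo examples fit together in one request, here is a small sketch that assembles them into a query string using only the JDK. The class and method names are hypothetical, and the prefix values "fmap." and "literal." are inferred from the two examples rather than read from the constants themselves.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ExtractParamsSketch {

    // Assumption: prefix values inferred from the documented examples.
    static final String MAP_PREFIX = "fmap.";
    static final String LITERALS_PREFIX = "literal.";

    // Builds "fmap.<tikaName>=<solrField>&literal.<field>=<value>" pairs.
    static String toQueryString(Map<String, String> fieldMappings,
                                Map<String, String> literals) {
        StringJoiner query = new StringJoiner("&");
        fieldMappings.forEach((tikaName, solrField) ->
                query.add(encode(MAP_PREFIX + tikaName) + "=" + encode(solrField)));
        literals.forEach((field, value) ->
                query.add(encode(LITERALS_PREFIX + field) + "=" + encode(value)));
        return query.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("title", "solr.title");
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("myField", "Foo");
        // prints: fmap.title=solr.title&literal.myField=Foo
        System.out.println(toQueryString(mappings, literals));
    }
}
```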
static final String XPATH_EXPRESSION
Content matching the XPath expression is passed to the SolrContentHandler.
See Tika's docs for what the extracted document looks like.
See also: CAPTURE_ELEMENTS
static final String EXTRACT_ONLY
static final String EXTRACT_FORMAT
static final String CAPTURE_ATTRIBUTES
static final String LITERALS_OVERRIDE
static final String CAPTURE_ELEMENTS
The capture field is based on the localName returned to the SolrContentHandler
by Tika, not to be confused with the mapped field. The field name can then
be mapped into the index schema.
For instance, a Tika document may look like:
<html> ... <body> <p>some text here. <div>more text</div></p> Some more text </body>
By passing in the p tag, you could capture all P tags separately from the rest of the text.
Thus, in the example, the capture of the P tag would be: "some text here. more text"
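The capture behaviour described above can be illustrated with a plain SAX handler from the JDK. This is a sketch of the idea only, not the SolrContentHandler implementation. The class and method names are hypothetical, the input is a well-formed variant of the example document, and the element is matched on qName because the default JDK parser is not namespace-aware.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class CaptureSketch {

    // Collects all character data inside the named element, including text
    // of nested child elements, and ignores everything outside it.
    static String capture(String xhtml, String element) throws Exception {
        StringBuilder captured = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // > 0 while inside a captured element

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (depth > 0 || qName.equals(element)) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth > 0) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    captured.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return captured.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><body><p>some text here. <div>more text</div></p>"
                + " Some more text </body></html>";
        System.out.println(capture(doc, "p")); // some text here. more text
    }
}
```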
static final String STREAM_TYPE
static final String RESOURCE_NAME
static final String RESOURCE_PASSWORD
static final String UNKNOWN_FIELD_PREFIX
static final String DEFAULT_FIELD
static final String PASSWORD_MAP_FILE
The file uses the Java properties format, with one key=value pair per line. The key is evaluated as a regex against the file name, and the value is the password. The rules are evaluated top to bottom, i.e. the first match is used. If you want a fallback password that is always used, supply a .*=<defaultmypassword> rule at the end.
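The first-match rule evaluation described above can be sketched as follows. This is an illustration under stated assumptions, not Solr's implementation: the class, method, and sample passwords are hypothetical, the regex is assumed to have to match the whole file name, and lines are parsed by hand because java.util.Properties does not preserve the top-to-bottom order the rules depend on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PasswordRulesSketch {

    // One rule: a file-name regex and the password to use when it matches.
    static final class Rule {
        final Pattern pattern;
        final String password;

        Rule(Pattern pattern, String password) {
            this.pattern = pattern;
            this.password = password;
        }
    }

    // Parses key=value lines in file order, splitting on the first '='.
    // Blank lines and lines starting with '#' are skipped.
    static List<Rule> parse(String rulesText) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : rulesText.split("\\R")) {
            String line = raw.trim();
            int eq = line.indexOf('=');
            if (line.isEmpty() || line.startsWith("#") || eq < 0) {
                continue;
            }
            rules.add(new Rule(Pattern.compile(line.substring(0, eq).trim()),
                    line.substring(eq + 1).trim()));
        }
        return rules;
    }

    // Returns the password of the first matching rule, or null if none match.
    // Assumption: the regex must match the entire file name.
    static String lookup(List<Rule> rules, String fileName) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(fileName).matches()) {
                return rule.password;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(".*\\.pdf=pdfSecret\nreport-.*=reportSecret\n.*=fallback");
        System.out.println(lookup(rules, "report-2019.pdf"));  // pdfSecret
        System.out.println(lookup(rules, "report-2019.docx")); // reportSecret
        System.out.println(lookup(rules, "notes.txt"));        // fallback
    }
}
```

Note that "report-2019.pdf" matches both of the first two rules, and the earlier one wins, which is why the catch-all .* rule belongs at the end.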
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.