org.apache.uima.java
org.apache.uima.tika.FileSystemCollectionReader
File System Collection Reader
Reads files from the filesystem. Uses Tika to convert markup into annotations.
1.0
Apache Software Foundation
InputDirectory
Directory containing input files
String
false
true
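InputDirectory is declared as a mandatory, single-valued String parameter. The sketch below shows one plausible way a reader could enumerate that directory. It is hypothetical and not taken from the component's source. The class `InputDirectoryLister` and its method are illustrative names. The choices to skip subdirectories and to sort by file name are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: enumerates the files named by the InputDirectory parameter. */
public class InputDirectoryLister {

    // Assumption: only regular files directly inside the directory are read (no recursion),
    // sorted by path so the processing order is stable across runs.
    static List<File> listInputFiles(String inputDirectory) {
        File dir = new File(inputDirectory);
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(inputDirectory + " is not a directory");
        }
        File[] entries = dir.listFiles(File::isFile); // null only on an I/O error
        List<File> files = new ArrayList<>(Arrays.asList(entries == null ? new File[0] : entries));
        files.sort(null); // File implements Comparable<File>
        return files;
    }

    public static void main(String[] args) {
        for (File f : listInputFiles(".")) {
            System.out.println(f.getName());
        }
    }
}
```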
Language
ISO language code for the documents
String
false
false
tikaConfigFile
String
false
false
MIME
The MIME type can be forced by the user instead of being detected automatically
String
false
false
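The optional MIME parameter lets a user-supplied type override automatic detection. This descriptor's own setting, for example, forces `text/html`. The precedence rule can be sketched as follows. `MimeResolver` is a hypothetical name. Treating a blank value as "not set" is an assumption, not documented behaviour.

```java
/** Hypothetical helper illustrating the precedence of the MIME parameter. */
public class MimeResolver {

    // A non-blank forced MIME type wins; otherwise the automatically detected type is used.
    // Assumption: an empty or whitespace-only value counts as "not set".
    static String resolveMime(String forcedMime, String detectedMime) {
        if (forcedMime != null && !forcedMime.isBlank()) {
            return forcedMime.trim();
        }
        return detectedMime;
    }

    public static void main(String[] args) {
        System.out.println(resolveMime("text/html", "application/pdf")); // text/html
        System.out.println(resolveMime(null, "application/pdf"));        // application/pdf
    }
}
```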
MIME
text/html
com.digitalpebble.uima.SourceDocumentAnnotation
com.digitalpebble.uima.MarkupAnnotation
com.digitalpebble.uima.FeatureValue
true
false
true
tikaConfigFile
XML configuration file for Tika
true
tikaConfigFile
XML configuration file for Tika. If the file is not found, the component relies on the default configuration.
tika_config.xml
tikaConfigFile
tikaConfigFile
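The tikaConfigFile resource is optional and points to `tika_config.xml`. If the file is missing, the component falls back to Tika's default configuration. A minimal sketch of that fallback decision follows, under stated assumptions. `TikaConfigLocator` is a hypothetical name. An empty result stands for "use the default configuration". The real component presumably resolves the file through the bound UIMA external resource rather than a plain path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: decides between a custom Tika config file and the default config. */
public class TikaConfigLocator {

    // Returns the config path only if it names an existing regular file;
    // an empty Optional means "fall back to Tika's default configuration".
    static Optional<Path> locate(String configFile) {
        if (configFile == null || configFile.isBlank()) {
            return Optional.empty();
        }
        Path path = Path.of(configFile);
        return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
    }

    public static void main(String[] args) {
        String choice = locate("tika_config.xml")
                .map(p -> "custom config: " + p)
                .orElse("default Tika config");
        System.out.println(choice);
    }
}
```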