org.apache.ctakes.padtermspotter.cr
Class RadiologyRecordsCollectionReader

java.lang.Object
  extended by org.apache.uima.resource.Resource_ImplBase
      extended by org.apache.uima.resource.ConfigurableResource_ImplBase
          extended by org.apache.uima.collection.CollectionReader_ImplBase
              extended by org.apache.ctakes.padtermspotter.cr.RadiologyRecordsCollectionReader
All Implemented Interfaces:
org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource

public class RadiologyRecordsCollectionReader
extends org.apache.uima.collection.CollectionReader_ImplBase

The original code was copied from org.apache.uima.examples.cpe.FileSystemCollectionReader and modified for Mayo use. This collection reader reads "documents" from a single file. Each line of the file is treated as a separate "document" to be analyzed by the CPE and gets its own CAS. Because every line is read into memory during initialization, extremely large files require correspondingly large amounts of memory. This was done to simplify the implementation.
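
The load-then-iterate behavior described above can be sketched roughly as follows. This is a minimal illustration and not the cTAKES source: the class LinePerDocumentSketch and its method fromFile are hypothetical names, and the real reader fills a CAS in getNext(CAS) rather than returning a String.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the reader's pattern: load all lines, then serve one per call. */
final class LinePerDocumentSketch {
    private final List<String> lines; // plays the role of iv_linesFromFile
    private int currentIndex = 0;     // plays the role of iv_currentIndex

    LinePerDocumentSketch(List<String> lines) {
        this.lines = new ArrayList<>(lines);
    }

    /** Reads the whole file eagerly, which is why very large inputs need a large heap. */
    static LinePerDocumentSketch fromFile(Path file) throws IOException {
        return new LinePerDocumentSketch(Files.readAllLines(file));
    }

    boolean hasNext() {
        return currentIndex < lines.size();
    }

    /** Returns the next line; the real reader would copy it into a CAS instead. */
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more lines");
        }
        return lines.get(currentIndex++);
    }

    int getNumberOfDocuments() {
        return lines.size();
    }
}
```

A streaming reader would avoid the memory cost, but it could not report the total number of documents up front without a second pass over the file.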

Author:
Mayo Clinic

Field Summary
(package private)  java.lang.Integer filterColunmNumber
           
(package private)  java.lang.String filterExamTypes
           
(package private)  int iv_currentIndex
           
(package private)  java.lang.String iv_delimeter
           
(package private)  java.lang.String iv_language
           
(package private)  java.util.List iv_linesFromFile
           
(package private)  java.lang.Integer numberOfColumns
           
static java.lang.String PARAM_COMMENT_STRING
          Optional parameter that specifies a comment string.
static java.lang.String PARAM_EXAM_COLUMN
          The number of the input-file column whose value is compared against the applicable exam types.
static java.lang.String PARAM_FILTER_EXAMS
          Specifies the file that contains the valid exam types to be processed by the pipeline.
static java.lang.String PARAM_ID_DELIMETER
          Name of optional configuration parameter that specifies a character (or string) that delimits the id of the document from the text of the document.
static java.lang.String PARAM_IGNORE_BLANK_LINES
          Optional parameter that determines whether a blank line is processed as a document or ignored.
static java.lang.String PARAM_INPUT_FILE_NAME
          This parameter is used in the descriptor file to specify the location of the file that will be run through this collection reader.
static java.lang.String PARAM_LANGUAGE
          Name of optional configuration parameter that contains the language of the documents in the input file.
static java.lang.String PARAM_TOTAL_COLUMNS
          Number of columns contained in the radiology record.
 
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
 
Constructor Summary
RadiologyRecordsCollectionReader()
           
 
Method Summary
 void close()
           
 void getNext(org.apache.uima.cas.CAS cas)
           
 int getNumberOfDocuments()
          Gets the total number of documents that will be returned by this collection reader.
 org.apache.uima.util.Progress[] getProgress()
           
 boolean hasNext()
           
 void initialize()
           
 
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
 
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
 
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
 

Field Detail

PARAM_INPUT_FILE_NAME

public static final java.lang.String PARAM_INPUT_FILE_NAME
This parameter is used in the descriptor file to specify the location of the file that will be run through this collection reader.

See Also:
Constant Field Values

PARAM_COMMENT_STRING

public static final java.lang.String PARAM_COMMENT_STRING
Optional parameter that specifies a comment string. Any line that begins with this string is ignored and is not added as a "document" to the CPE.

See Also:
Constant Field Values

PARAM_IGNORE_BLANK_LINES

public static final java.lang.String PARAM_IGNORE_BLANK_LINES
Optional parameter that determines whether a blank line is processed as a document or ignored. The default is 'true'.

See Also:
Constant Field Values
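
The combined effect of PARAM_COMMENT_STRING and PARAM_IGNORE_BLANK_LINES might look like the sketch below. The class LineFilterSketch and the method keepDocumentLines are hypothetical. Treating whitespace-only lines as blank, and treating an empty comment string as "not set", are assumptions that the documentation does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing how comment lines and blank lines could be skipped. */
final class LineFilterSketch {

    /**
     * Returns the lines that would become documents.
     *
     * @param commentString    value of the comment-string parameter, or null if not set
     * @param ignoreBlankLines value of the ignore-blank-lines parameter (documented default: true)
     */
    static List<String> keepDocumentLines(List<String> rawLines,
                                          String commentString,
                                          boolean ignoreBlankLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            // Assumption: an empty comment string is treated the same as "not set".
            boolean isComment = commentString != null
                    && !commentString.isEmpty()
                    && line.startsWith(commentString);
            // Assumption: a line containing only whitespace counts as blank.
            boolean isBlank = line.trim().isEmpty();
            if (isComment || (ignoreBlankLines && isBlank)) {
                continue;
            }
            kept.add(line);
        }
        return kept;
    }
}
```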

PARAM_LANGUAGE

public static final java.lang.String PARAM_LANGUAGE
Name of optional configuration parameter that contains the language of the documents in the input file. If specified, this information will be added to the CAS.

See Also:
Constant Field Values

PARAM_ID_DELIMETER

public static final java.lang.String PARAM_ID_DELIMETER
Name of optional configuration parameter that specifies a character (or string) that delimits the id of the document from the text of the document. For example, if the parameter is set to '|', then the line "1234|this is some text" would have an id of "1234" and the text "this is some text". If this parameter is not set, then the id of a document will be its line number in the file.

See Also:
Constant Field Values
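
The id/text split described for PARAM_ID_DELIMETER can be illustrated with a small sketch. The names IdDelimiterSketch and splitIdAndText are hypothetical. Falling back to the line number when the delimiter is set but missing from a particular line is an assumption; the documentation only covers the case where the parameter itself is not set.

```java
/** Hypothetical helper that separates a document id from its text. */
final class IdDelimiterSketch {

    /**
     * Returns a two-element array {id, text}.
     *
     * @param delimiter  value of the id-delimiter parameter, or null if not set
     * @param lineNumber line number used as the id when no delimiter applies
     */
    static String[] splitIdAndText(String line, String delimiter, int lineNumber) {
        if (delimiter != null && !delimiter.isEmpty()) {
            // indexOf treats the delimiter literally, so "|" needs no regex escaping.
            int at = line.indexOf(delimiter);
            if (at >= 0) {
                String id = line.substring(0, at);
                String text = line.substring(at + delimiter.length());
                return new String[] { id, text };
            }
        }
        // Assumption: a line without the delimiter also falls back to its line number.
        return new String[] { Integer.toString(lineNumber), line };
    }
}
```

With the delimiter "|" the line "1234|this is some text" splits into the id "1234" and the text "this is some text"; with a null delimiter the whole line is the text and the id is the supplied line number.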

PARAM_TOTAL_COLUMNS

public static final java.lang.String PARAM_TOTAL_COLUMNS
Number of columns contained in the radiology record. Typically, every column except the last is skipped during annotation; the last column contains the details of the examination.

See Also:
Constant Field Values
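
As a rough illustration of how PARAM_TOTAL_COLUMNS could be used to isolate the final column, consider the sketch below. The names RecordColumnsSketch and examDetails are hypothetical, and the assumption that columns are separated by the same delimiter string as the id is not stated in the documentation.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the last column of a delimited radiology record. */
final class RecordColumnsSketch {

    /**
     * Returns the final column of the record.
     *
     * @param delimiter    column separator (assumed; treated literally, not as a regex)
     * @param totalColumns expected number of columns, at least 1
     */
    static String examDetails(String line, String delimiter, int totalColumns) {
        if (totalColumns < 1) {
            throw new IllegalArgumentException("totalColumns must be at least 1");
        }
        // The limit keeps any delimiter characters inside the last column intact.
        String[] columns = line.split(Pattern.quote(delimiter), totalColumns);
        if (columns.length != totalColumns) {
            throw new IllegalArgumentException(
                    "expected " + totalColumns + " columns but found " + columns.length);
        }
        return columns[totalColumns - 1];
    }
}
```

Passing the column count as the split limit means free text in the last column may itself contain the delimiter without being broken apart.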

PARAM_FILTER_EXAMS

public static final java.lang.String PARAM_FILTER_EXAMS
Specifies the file that contains the valid exam types to be processed by the pipeline.

See Also:
Constant Field Values

PARAM_EXAM_COLUMN

public static final java.lang.String PARAM_EXAM_COLUMN
The number of the input-file column whose value is compared against the applicable exam types.

See Also:
Constant Field Values
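
One way PARAM_FILTER_EXAMS and PARAM_EXAM_COLUMN could work together is sketched below. Everything here is an assumption made for illustration: the names ExamFilterSketch and isAccepted are hypothetical, the column index is taken to be zero-based, the valid exam types are assumed to have been loaded into a Set beforehand (for example, one type per line of the filter file), and matching is assumed to be exact after trimming.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper that decides whether a record's exam type is in the valid set. */
final class ExamFilterSketch {

    /**
     * Returns true when the value in the exam column is one of the valid exam types.
     *
     * @param delimiter       column separator (assumed; treated literally)
     * @param examColumnIndex zero-based index of the exam-type column (assumed convention)
     * @param validExamTypes  exam types read from the filter file (assumed format)
     */
    static boolean isAccepted(String line,
                              String delimiter,
                              int examColumnIndex,
                              Set<String> validExamTypes) {
        // A negative limit keeps trailing empty columns so indexes stay stable.
        String[] columns = line.split(Pattern.quote(delimiter), -1);
        if (examColumnIndex < 0 || examColumnIndex >= columns.length) {
            return false; // record is too short to contain the exam column
        }
        return validExamTypes.contains(columns[examColumnIndex].trim());
    }
}
```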

iv_linesFromFile

java.util.List iv_linesFromFile

iv_currentIndex

int iv_currentIndex

iv_language

java.lang.String iv_language

iv_delimeter

java.lang.String iv_delimeter

numberOfColumns

java.lang.Integer numberOfColumns

filterExamTypes

java.lang.String filterExamTypes

filterColunmNumber

java.lang.Integer filterColunmNumber
Constructor Detail

RadiologyRecordsCollectionReader

public RadiologyRecordsCollectionReader()
Method Detail

initialize

public void initialize()
                throws org.apache.uima.resource.ResourceInitializationException
Overrides:
initialize in class org.apache.uima.collection.CollectionReader_ImplBase
Throws:
org.apache.uima.resource.ResourceInitializationException

getNext

public void getNext(org.apache.uima.cas.CAS cas)
             throws java.io.IOException,
                    org.apache.uima.collection.CollectionException
Throws:
java.io.IOException
org.apache.uima.collection.CollectionException

hasNext

public boolean hasNext()
                throws java.io.IOException,
                       org.apache.uima.collection.CollectionException
Throws:
java.io.IOException
org.apache.uima.collection.CollectionException

getProgress

public org.apache.uima.util.Progress[] getProgress()

getNumberOfDocuments

public int getNumberOfDocuments()
Gets the total number of documents that will be returned by this collection reader.

Returns:
the number of documents in the collection

close

public void close()
           throws java.io.IOException
Throws:
java.io.IOException