PAD term spotter

Overview: The ‘PAD term spotter’ pipeline processes text extracted from radiology notes that pertains to the diagnosis, treatment, etc. of lower limb Peripheral Artery Disease (PAD) (e.g. stenosis/occlusion paired with popliteal/femoral). Its main feature is classifying each document for the presence of PAD. Descriptive diagnosis and illness terms are paired with anatomical site terms to build a relational tie, which indicates a hit. The pipeline assesses the presence of phrases indicative of PAD in one or more sentences of radiology-related documents.

Note

Disclaimer: This should be considered a beta release of this annotator. See the Clinical Text Analysis and Knowledge Extraction System User Guide, located in the cTAKES <pipeline-root>/docs/userguide/cTAKES_userguide.htm, for detailed install and setup information pertaining to all the cTAKES components. For additional documentation pertaining to this pipeline, see <pipeline-root>/PAD term spotter/doc/NLP OF RADIOLOGY REPORTSv1.pdf. For prerequisite and installation steps, see <pipeline-root>/PAD term spotter/README.

Collection readers (annotator)

RadiologyRecordsCollectionReader.xml

The file desc/collection_reader/RadiologyRecordsCollectionReader.xml provides a descriptor for the Radiology Term Spotter analysis engine. Each line of the input file is considered a separate examination and is classified as such unless the record is filtered out.

Parameters

Input File Name (Required)

the name and path of the file which contains the records which make up the radiology notes being processed

Language (Optional)

will explicitly set a language if used

Comment String (Optional)

will filter/skip lines that begin with this case sensitive literal string

Ignore Blank Lines (Optional)

will prevent blank rows from being processed; blank rows may otherwise cause interruptions in subsequent processes

Id Delimiter (Optional)

specifies the character used to delimit the identification column (first column) or all fields, depending on whether values are specified for the remaining fields in this panel

Column Count (Optional)

indicates the number of columns, delimited using the value in ‘Id Delimiter’, that should be skipped to locate the actual contents of the radiology examination

Filter Exam Types (Optional)

provides a list of valid examination codes to act as a filter to eliminate the need to parse records not related to PAD

Filter Exam Column Number (Optional)

the column number of the radiology record, delimited using the value in ‘Id Delimiter’, whose value is compared against the ‘Filter Exam Types’ provided above
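The way these parameters interact can be sketched as follows. This is only an illustration in Python, not the collection reader's actual Java code. The function name `read_exam_records`, the 1-based column numbering, and the exam codes `US_LE` and `CXR` are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


def read_exam_records(
    lines: Iterable[str],
    comment_string: Optional[str] = None,
    ignore_blank_lines: bool = True,
    id_delimiter: str = "|",
    column_count: int = 0,
    filter_exam_types: Optional[Set[str]] = None,
    filter_exam_column: Optional[int] = None,  # assumed to be 1-based
) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (record_id, exam_text) for every line that survives the filters."""
    for raw in lines:
        line = raw.rstrip("\n")
        # Ignore Blank Lines: skip rows with no content.
        if ignore_blank_lines and not line.strip():
            continue
        # Comment String: case-sensitive literal prefix match.
        if comment_string and line.startswith(comment_string):
            continue
        # Column Count: skip this many delimited columns to reach the exam text.
        fields = line.split(id_delimiter, column_count)
        if len(fields) <= column_count:
            continue  # too few columns to locate the exam text
        # Filter Exam Types: keep only records whose exam code is in the list.
        if filter_exam_types is not None and filter_exam_column is not None:
            index = filter_exam_column - 1
            if index >= column_count:
                continue  # the filter column must be one of the skipped columns
            if fields[index].strip() not in filter_exam_types:
                continue
        record_id = fields[0] if column_count > 0 else None
        yield record_id, fields[column_count]


sample = [
    "# exported radiology records",
    "",
    "1001|US_LE|Occlusion of the left popliteal artery.",
    "1002|CXR|Clear lungs.",
]
records = list(
    read_exam_records(
        sample,
        comment_string="#",
        column_count=2,
        filter_exam_types={"US_LE"},
        filter_exam_column=2,
    )
)
print(records)
```

With these settings the comment line, the blank line, and the record with the non-matching exam code are all skipped, so only the first examination is passed on for classification.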

Analysis engines (annotator)

Radiology_TermSpotterAnnotatorTAE.xml

The file desc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml provides a working example of the PAD term spotter pipeline as an aggregate TAE composed of the following:

-SimpleSegmentAnnotator (core project),

-TokenizerAnnotator (core project),

-SentenceDetectorAnnotator (core project),

-SubSectionBoundaryAnnotator,

-ContextDependentTokenizerAnnotator (context dependent tokenizer),

-POS Tagger (POS Tagger project),

-Chunker (Chunker project),

-Radiology_DictionaryLookupCSVAnnotator,

-NegationAnnotator (NE contexts),

-PAD_Hit,

-DxStatusAnnotator,

-NegationDxAnnotator

Red text indicates shipped with this pipeline.

Parameters

ChunkerCreatorClass

the full class name of an implementation of the interface edu.mayo.bmi.uima.chunker.ChunkerCreator. See the documentation pertaining to the Chunker analysis engine.

SubSectionBoundaryAnnotator.xml

The file desc/analysis_engine/SubSectionBoundaryAnnotator.xml drives the java class org.apache.ctakes.padtermspotter.ae.SubSectionAnnotator, whose primary task is to identify sections within the text that indicate paragraphs, subsections, and groups of text that should be handled as one entity. The resource file lookup/radiology/ExamTitleWords.txt provides the site specific terms used to indicate the start of subsections. A subsection spans until either the beginning of another subsection is discovered or the end of the document is reached.
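The spanning rule can be illustrated with a minimal sketch. It is written in Python for brevity and is not the SubSectionAnnotator implementation. The function name `subsection_spans`, the case-insensitive matching, and the start term "abdomen" are assumptions for the example.

```python
from typing import List, Set, Tuple


def subsection_spans(tokens: List[str], start_terms: Set[str]) -> List[Tuple[int, int]]:
    """Return (begin, end) token index pairs, end exclusive, one per subsection."""
    # Every token that matches a start term opens a new subsection.
    starts = [i for i, tok in enumerate(tokens) if tok.lower() in start_terms]
    spans = []
    for n, begin in enumerate(starts):
        # A subsection runs until the next subsection start or the end of the document.
        end = starts[n + 1] if n + 1 < len(starts) else len(tokens)
        spans.append((begin, end))
    return spans


tokens = "Pelvis normal flow . Abdomen no stenosis".split()
spans = subsection_spans(tokens, {"pelvis", "abdomen"})
print(spans)
```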

Note

The class core/src/edu/mayo/bmi/fsm/machine/SubSectionPadIdFSM identifies section headers and classifies them as CONFIRMED_STATUS, NEGATED_STATUS, or PROBABLE_STATUS. However, only NEGATED_STATUS is currently used within the logic: ‘/PAD term spotter/src/main/java/org/apache/ctakes/pad/impl/PADConsumerImpl.java’ will not add hits in sections tagged with this status.

Note

The following classes and files have Mayo specific site and terminology terms that are being leveraged, especially as it pertains to the subsection handling:

1) ‘/PAD term spotter/src/edu/mayo/bmi/fsm/machine/SubSectionPadIdFSM.java’ - The terms "smh", "rmh", "gonda", and "romayo" are names of buildings on the Mayo campus and are used to mark the beginning and end of subsections. The terms "indications", "bleindications", "exam", and "showing" are special terms that often contain the PAD terms being screened for, but because they are titles of examinations, revision sections, and generalized screenings, they are ignored in the Mayo cohort.

2) ‘/PAD term spotter/src/main/java/org/apache/ctakes/pad/impl/PADConsumerImpl.java’ - The terms "indications:" and "showing" are special terms that often contain the PAD terms being screened for, but because they are titles of examinations, revision sections, and generalized screenings, they are ignored in the Mayo cohort. "maxSubsectionSize" limits the overall scope within which the subsection tokens are searched. It is hardcoded to 300 in the shipped class.

3) ‘/PAD term spotter/resources/lookup/radiology/ExamTitleWords.txt’ - Comma delimited terms which represent key values to distinguish the type of radiology examination being utilized: US_EXAM (ultrasound), LOWER_EXT (lower extremity), US_LOWER_EXT (ultrasound lower extremity), US_LOWER_SOLO (ultrasound lower extremity one side only), CT_EXAM (CAT scan), CT_EXAM_SOLO (CAT Scan one side only).

4) ‘/PAD term spotter/resources/lookup/radiology/ExamsForPAD.csv’ - Provides a list of valid examination codes to act as a filter to eliminate the need to parse records not related to PAD.

Radiology_DictionaryLookupCSVAnnotator.xml

The file desc/analysis_engine/Radiology_DictionaryLookupCSVAnnotator.xml drives the java class org.apache.ctakes.dictionary.lookup.ae.DictionaryLookupAnnotator, which resides in the dictionary lookup project. It is set up to access the /resources/lookup/radiology/LookupDesc_PAD.xml file, which specifies the resources to be accessed. In this case, these are the PAD term dictionaries. Specifically, the files \PAD term spotter\resources\lookup\radiology\pad_anatomical_sites.csv and \PAD term spotter\resources\lookup\radiology\pad_disorders.csv are loaded into memory.

PAD_Hit.xml

The file desc/analysis_engine/PAD_Hit.xml drives the java class org.apache.ctakes.padtermspotter.ae.PADHitAnnotator, which operates on the terms discovered by the dictionary annotator. Its parameters include WINDOW_SIZE, ANNOTATION_TYPE, ANNOTATION_PART_ONE_OF_PAIR, and ANNOTATION_PART_TWO_OF_PAIR, and there may be others.

a) If annotations of type part_one and part_two fall within window_size annotations of type annotation_type, the pair is considered a hit.

b) If an annotation is defined as stand alone, it does not need to be part of a pair to be considered a hit.
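A minimal sketch of this pairing logic is shown below. It is written in Python as an illustration and is not the PADHitAnnotator code. It assumes that the window annotation type is a sentence, that a window size of 1 means both terms share a sentence, and that "claudication" is a stand-alone term. The `Term` class and `find_hits` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Term:
    text: str
    kind: str                  # "disorder" (part one) or "site" (part two)
    sentence: int              # index of the window annotation containing the term
    stand_alone: bool = False  # True if the term is a hit without a partner


def find_hits(terms: List[Term], window_size: int = 1) -> List[Tuple[Term, ...]]:
    """Return stand-alone hits first, then disorder/site pairs within the window."""
    # Rule b: a stand-alone term is a hit by itself.
    hits: List[Tuple[Term, ...]] = [(t,) for t in terms if t.stand_alone]
    disorders = [t for t in terms if t.kind == "disorder" and not t.stand_alone]
    sites = [t for t in terms if t.kind == "site" and not t.stand_alone]
    # Rule a: a disorder and a site form a hit when they fall in the same window.
    for disorder in disorders:
        for site in sites:
            if abs(disorder.sentence - site.sentence) < window_size:
                hits.append((disorder, site))
    return hits


terms = [
    Term("occlusion", "disorder", sentence=0),
    Term("popliteal", "site", sentence=0),
    Term("stenosis", "disorder", sentence=2),  # no site nearby, so no hit
    Term("claudication", "disorder", sentence=3, stand_alone=True),
]
hits = find_hits(terms)
for hit in hits:
    print(" + ".join(t.text for t in hit))
```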

DxStatusAnnotator.xml

This is similar to the NE context (see Clinical Text Analysis and Knowledge Extraction System User Guide 4.8. NE contexts). Context terms relating to disease and illness are identified via PAD term spotter/src/edu/mayo/bmi/fsm/pad/machine/DxIndicatorFSM.java.

NegationDxAnnotator.xml

This is similar to the NE context (see Clinical Text Analysis and Knowledge Extraction System User Guide 4.8. NE contexts). Negation terms relating to disease and illness are identified via PAD term spotter/src/edu/mayo/bmi/fsm/pad/machine/NegDxIndicatorFSM.java.

CAS consumers

PADOffSetsRecord.xml

The CAS consumer in /desc/cas_consumer/PADOffSetsRecord.xml is provided as a means to post process the results. It provides the following features:

1 - Record by record level classification for PAD

2 - Site and disorder terms along with offset information (useful for debugging)

3 - Overall patient level classification based on record classification

Parameters

Output File Name (required)

Specifies the location of the detail and summary report.

UsingAlternateAlgorithm (false by default)

Boolean value which indicates whether an alternate algorithm should be used for the post processing (PAD>NEG>POS>UNK)
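One possible reading of the ordering PAD>NEG>POS>UNK is a precedence rule for rolling record level labels up to a patient level label. The Python sketch below assumes that reading and is not taken from the shipped consumer. The function name `classify_patient` and the rule that the highest-ranked label present wins are assumptions.

```python
from typing import Iterable

# Assumed precedence, highest first, taken from the parameter description.
PRECEDENCE = ["PAD", "NEG", "POS", "UNK"]


def classify_patient(record_labels: Iterable[str]) -> str:
    """Return the highest-precedence label found among a patient's records."""
    present = set(record_labels)
    for label in PRECEDENCE:
        if label in present:
            return label
    # No recognized label at all is treated as unknown.
    return "UNK"


print(classify_patient(["UNK", "POS", "NEG"]))
```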

Resources

ExamTitleWords.txt

The PAD term spotter config file resources/lookup/radiology/ExamTitleWords.txt consists of comma delimited terms which represent key values to distinguish the type of radiology examination being utilized:

-US_EXAM (ultrasound)

-LOWER_EXT (lower extremity)

-US_LOWER_EXT (ultrasound lower extremity)

-US_LOWER_SOLO (ultrasound lower extremity one side only)

-CT_EXAM (CAT scan)

-CT_EXAM_SOLO (CAT Scan one side only)

Example: The following entry searches a phrase for the token "Pelvis", starting at the 5th token and ending at the 30th token, and adds the key phrase "US_LOWER_SOLO" and the offset values to an array list: Pelvis<1>,US_LOWER_SOLO<2>,5<3>,30<4>

<1> The (Mayo specific) examination token string to look for within the document text

<2> The key phrase, mapped to one of the radiology terms listed above

<3> The offset, within the set of tokens being parsed, at which to begin searching for the examination token string

<4> The offset, within the set of tokens being parsed, at which to end searching for the examination token string
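The four fields can be read and applied as in the following sketch. It is an illustration in Python rather than the pipeline's own parser. It assumes that the offsets are 1-based and inclusive and that matching ignores case. The names `ExamTitleRule`, `parse_rule`, and `apply_rule` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ExamTitleRule(NamedTuple):
    token: str  # field 1: examination token string to look for
    key: str    # field 2: key phrase such as US_LOWER_SOLO
    begin: int  # field 3: token offset at which the search begins
    end: int    # field 4: token offset at which the search ends


def parse_rule(line: str) -> ExamTitleRule:
    """Parse one comma delimited line of ExamTitleWords.txt."""
    token, key, begin, end = [field.strip() for field in line.split(",")]
    return ExamTitleRule(token, key, int(begin), int(end))


def apply_rule(rule: ExamTitleRule, tokens: List[str]) -> List[Tuple[str, int]]:
    """Return (key, token position) for each match inside the search range."""
    matches = []
    first = max(rule.begin - 1, 0)      # convert the 1-based offset to an index
    last = min(rule.end, len(tokens))   # inclusive end offset, clipped to the text
    for index in range(first, last):
        if tokens[index].lower() == rule.token.lower():
            matches.append((rule.key, index + 1))
    return matches


rule = parse_rule("Pelvis,US_LOWER_SOLO,5,30")
tokens = ["x"] * 7 + ["pelvis"] + ["y"] * 3
print(apply_rule(rule, tokens))
```

A "Pelvis" token that appears before the 5th token or after the 30th token produces no match under these assumptions.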

ExamsForPAD.csv

The file resources/lookup/radiology/ExamsForPAD.csv provides a list of valid examination codes that acts as a filter, eliminating the need to parse records not related to PAD. This improves performance and reduces the probability of false positive mentions.