Contents
 - Introduction
 - Running the Drug NER pipeline
    - AggregateTAE.xml for CDA documents conforming to the provided DTD.
    - AggregatePlaintextProcessor.xml for plaintext documents.
 - Fixes
    - Version 1.2.1

############
Introduction
############

This project adds the ability to identify attributes of drug mentions, such as Dosage, Frequency, Frequency Unit, Route and Strength, in either plaintext or CDA documents. It also lets you specify which sections of a note contain drugs in a list format, as opposed to drug mentions within the narrative of the note. Different sections can then be processed differently, which generally improves the quality of the annotations.

This project uses various cTAKES components, so cTAKES must be installed before using this component.

############################################################################
Running the Drug NER pipeline
############################################################################

This project (or PEAR file), the "Drug NER", relies on other projects/PEAR files such as 'ctakes-clinical-pipeline', 'context dependent tokenizer', 'core', 'dictionary lookup', 'LVG' and 'NE contexts'.

The pipeline can process two types of documents:
 - plaintext files
 - Clinical Document Architecture (CDA) XML files that conform to the DTD provided

%%%%%%%%%%%%%%%%%%%%%%%%%
AggregateTAE.xml for CDA documents conforming to the provided DTD.

The file desc/analysis_engine/AggregateTAE.xml is the aggregate analysis engine to use to run the entire pipeline. It includes the CdaCasInitializer analysis engine, which reads CDA documents that conform to the DTD provided and creates Segment annotations based on the sections within the CDA document.

Open this file using the Component Descriptor Editor as described in the tutorial. Click on the tab labeled "Aggregate" to observe that the Component Engine Flow (pipeline) defined by this descriptor includes CdaCasInitializer as the first component.
Observe that part of speech tagging (POSTagger) comes before chunking (Chunker), etc.

Click on the tab labeled "Parameter Settings" to view the parameters set in this descriptor.

The 'medicationRelatedSection' parameter is *not* set in the default implementation (for the Mayo Corpus it is generally set to 20104, 20133, 20147). If this parameter is left blank, all sections are treated as narrative sections; if any of those sections actually contain drugs in list format, the accuracy of identifying drug mentions and their attributes may not be acceptable. It is recommended to specify the ids of sections that contain drugs in a list format, if such sections are available.

A related parameter is 'sectionOverrideSet'. It specifies the section ids for which the DrugLookupWindow annotation spans the complete text of the section. 'sectionOverrideSet' is also *not* set in the default implementation (for the Mayo Corpus it is generally set to 20104, 20133, 20147). Leaving it blank has the same effect: all sections are treated as narrative sections, and accuracy may not be acceptable for sections that contain drugs in list format. Here too, it is recommended to specify the ids of such sections if they are available.

If you plan to use plain text documents rather than CDA documents as input, and you prefer that the entire document's contents be handled as lists rather than narrative, enter 'SIMPLE_SEGMENT' into 'medicationRelatedSection' or 'sectionOverrideSet' (see the tutorial for additional information on adding the 'SIMPLE_SEGMENT' to the Component Engine Flow (pipeline)).

The parameters are:
 DrugMentionAnnotator.xml
 - medicationRelatedSection - IDs of sections generated by your Segment Annotator where drug mentions appear in a list format.
 DrugCNP2LookupWindow.xml
 - sectionOverrideSet - IDs of sections (or segments) where the complete section is treated as a DrugLookupWindow, which is designed to process medications or drugs in 'list format'.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AggregatePlaintextProcessor.xml for plaintext documents.

The file desc/analysis_engine/AggregatePlaintextProcessor.xml is the aggregate analysis engine to use to run the entire pipeline. It includes the SimpleSegmentAnnotator analysis engine, which creates a Segment annotation that wraps the entire plaintext document. Other annotators in the pipeline require at least one Segment annotation.

Click on the tab labeled "Parameter Settings" to view the parameters set in this descriptor.

The 'medicationRelatedSection' parameter is set to 20104, 20133, 20147. These section ids are specific to Mayo's CDA documents and must be changed to match the ids generated by your Segment Annotator.

The parameters are:
 - SegmentID - the identifier or name to assign to the Segment annotation
 - medicationRelatedSection - IDs of sections generated by your Segment Annotator where drug mentions appear in a list format.

############
Fixes
############

Version 1.2.1
 - Fix problem where drug mentions were not aligned correctly with the named entity parent
 - Fix issue where the drug change statuses increase/decrease/change did not correctly create a noChange mention and/or assigned incorrect attributes.
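
Illustration of the 'medicationRelatedSection' setting

The following is a rough sketch of the list-versus-narrative behaviour described under "Running the Drug NER pipeline". It is not taken from the cTAKES source: the function name treated_as_list and the representation of the parameter as a list of strings are assumptions made only for illustration. It models a single rule from this README: a section is handled as a drug list only if its id appears in 'medicationRelatedSection', and a blank parameter means every section is handled as narrative.

```python
from typing import Iterable


def treated_as_list(segment_id: str, medication_related_sections: Iterable[str]) -> bool:
    """Hypothetical helper: True if the segment would be handled as a drug list.

    Models only the rule stated in this README, not the actual annotator code.
    """
    # Ignore empty entries so that a blank parameter yields an empty set.
    configured = {s.strip() for s in medication_related_sections if s.strip()}
    # With nothing configured, every section falls back to narrative handling.
    return segment_id in configured


# Section ids quoted in this README for Mayo's CDA documents.
mayo_ids = ["20104", "20133", "20147"]

print(treated_as_list("20104", mayo_ids))                     # configured section
print(treated_as_list("20104", []))                           # parameter left blank
print(treated_as_list("SIMPLE_SEGMENT", ["SIMPLE_SEGMENT"]))  # whole plaintext document
```

The last call corresponds to the plaintext case, where 'SIMPLE_SEGMENT' is entered so that the entire document is handled as a list.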