Contents
- Introduction
- Running the ctakes-clinical-pipeline 
	- AggregateCdaProcessor.xml  for CDA documents conforming to the provided DTD.
	- AggregatePlaintextProcessor.xml  for plaintext documents.


 
############
Introduction
############

This project is the top-level project for processing a clinical document
through the entire pipeline, including sentence detection, part-of-speech tagging,
chunking, named entity recognition, context detection, and negation detection.


############################################################################
Running the ctakes-clinical-pipeline
############################################################################

The pipeline can process two types of documents:
 - plaintext files
 - Clinical Document Architecture (CDA) XML files that conform to the provided DTD
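Which descriptor to run depends only on the input type. The following minimal sketch shows that choice. The `DescriptorChooser` class and its rule of treating any text that begins with `<` as CDA XML are hypothetical, and the rule does not check conformance to the DTD. The two paths are the descriptor files described in this README.

```java
/**
 * Hypothetical helper that picks the aggregate descriptor for a document.
 * Assumption: text starting with '<' (after leading whitespace) is CDA XML.
 */
final class DescriptorChooser {
    static final String CDA_DESCRIPTOR =
            "cTAKESdesc/analysis_engine/AggregateCdaProcessor.xml";
    static final String PLAINTEXT_DESCRIPTOR =
            "cTAKESdesc/analysis_engine/AggregatePlaintextProcessor.xml";

    private DescriptorChooser() {}

    static String descriptorFor(String documentText) {
        // Heuristic only: it does not verify conformance to the CDA DTD.
        boolean looksLikeXml = documentText.stripLeading().startsWith("<");
        return looksLikeXml ? CDA_DESCRIPTOR : PLAINTEXT_DESCRIPTOR;
    }
}
```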


%%%%%%%%%%%%%%%%%%%%%%%%%
AggregateCdaProcessor.xml  for CDA documents conforming to the provided DTD.

The file cTAKESdesc/analysis_engine/AggregateCdaProcessor.xml is the aggregate
analysis engine descriptor for running the entire pipeline on CDA documents. It
includes the CdaCasInitialzer analysis engine, which reads CDA documents that
conform to the provided DTD and creates Segment annotations based on the
sections within each CDA document.
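The section-to-Segment idea can be illustrated with plain JDK XML parsing. This is not the CdaCasInitialzer implementation. It assumes a hypothetical, flat layout where each section is a `<section id="...">` element containing only text. The section texts are concatenated into one document string, and one span is recorded per section. The names `SectionSegment` and `CdaSectionSegmenter` are made up for this sketch, and nested sections are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Simplified stand-in for a Segment annotation: offsets into the extracted text plus an id. */
record SectionSegment(int begin, int end, String id) {}

/** Hypothetical sketch: one segment per flat {@code <section id="...">} element. */
final class CdaSectionSegmenter {
    private CdaSectionSegmenter() {}

    /** Appends each section's text to {@code text} and returns one segment per section. */
    static List<SectionSegment> segment(String xml, StringBuilder text) {
        Document document;
        try {
            // This sketch does not validate against the DTD.
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse CDA XML", e);
        }
        NodeList sections = document.getElementsByTagName("section");
        List<SectionSegment> segments = new ArrayList<>();
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            int begin = text.length();
            text.append(section.getTextContent().strip());
            segments.add(new SectionSegment(begin, text.length(), section.getAttribute("id")));
            text.append('\n'); // keep sections apart in the extracted text
        }
        return segments;
    }
}
```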

Open this file in the Component Descriptor Editor as described in the tutorial.
Click on the tab labeled "Aggregate" to see that the Component Engine Flow (pipeline)
defined by this descriptor lists CdaCasInitialzer as the first component.
Note also that part-of-speech tagging (POSTagger) comes before chunking (Chunker);
the order matters because later components use annotations created by earlier ones.
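Because the flow is a fixed sequence, a component can only use annotations created by components that run before it. The chunker, for instance, needs part-of-speech tags. The following small sketch checks such ordering constraints, assuming the flow is available as an ordered list of component names. The `FlowOrder` helper is hypothetical, and a real flow contains more components than the example.

```java
import java.util.List;

/** Hypothetical helper for checking ordering constraints in a fixed flow. */
final class FlowOrder {
    private FlowOrder() {}

    /** True when both components are in the flow and {@code first} runs before {@code second}. */
    static boolean runsBefore(List<String> flow, String first, String second) {
        int firstIndex = flow.indexOf(first);
        int secondIndex = flow.indexOf(second);
        return firstIndex >= 0 && secondIndex >= 0 && firstIndex < secondIndex;
    }
}
```

For example, `FlowOrder.runsBefore(List.of("CdaCasInitialzer", "POSTagger", "Chunker"), "POSTagger", "Chunker")` returns `true`.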
 
Click on the tab labeled "Parameter Settings" to view the parameters set in this
descriptor. The ChunkCreatorClass parameter is set to org.apache.ctakes.chunker.ae.PhraseTypeChunkCreator
so that each phrase type gets its own annotation type, rather than every chunk
being annotated as type Chunk.
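In the descriptor XML itself, parameter values appear under `configurationParameterSettings` as `nameValuePair` elements, each with a `name` and a typed `value` such as `<string>`. The sketch below lists those settings with the JDK DOM parser rather than the UIMA API. `ParameterSettingsReader` is hypothetical and only reads string-valued parameters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical reader for string-valued parameter settings in a UIMA descriptor. */
final class ParameterSettingsReader {
    private ParameterSettingsReader() {}

    /** Maps each nameValuePair's name to its string value; other value kinds are skipped. */
    static Map<String, String> read(String descriptorXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        NodeList pairs = doc.getElementsByTagName("nameValuePair");
        for (int i = 0; i < pairs.getLength(); i++) {
            Element pair = (Element) pairs.item(i);
            NodeList names = pair.getElementsByTagName("name");
            NodeList strings = pair.getElementsByTagName("string");
            if (names.getLength() > 0 && strings.getLength() > 0) {
                settings.put(names.item(0).getTextContent().strip(),
                             strings.item(0).getTextContent().strip());
            }
        }
        return settings;
    }
}
```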

The parameters are:
- ChunkCreatorClass - the full class name of an implementation of the
                      interface org.apache.ctakes.chunker.ae.ChunkCreator
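A class-name parameter like this one follows a plug-in pattern. A component that takes such a parameter typically instantiates the named class by reflection and delegates part of its work to it, here the choice of annotation type. The sketch below illustrates that pattern with a simplified, hypothetical `ChunkTypeStrategy` interface. It is not the cTAKES interface, whose methods work on the CAS. `GenericChunkStrategy` mimics the all-`Chunk` behavior, and `PerPhraseTypeStrategy` mimics one type per phrase type.

```java
/** Hypothetical, simplified strategy: decides the annotation type for a chunk. */
interface ChunkTypeStrategy {
    /** Returns the annotation type name to use for a chunk with the given phrase type. */
    String annotationTypeFor(String phraseType);
}

/** Every chunk gets the same generic type, like plain Chunk annotations. */
final class GenericChunkStrategy implements ChunkTypeStrategy {
    @Override
    public String annotationTypeFor(String phraseType) {
        return "Chunk";
    }
}

/** Each phrase type (NP, VP, ...) gets its own annotation type. */
final class PerPhraseTypeStrategy implements ChunkTypeStrategy {
    @Override
    public String annotationTypeFor(String phraseType) {
        return phraseType;
    }
}

/** Instantiates a strategy from a fully qualified class name, as a descriptor parameter would supply it. */
final class StrategyLoader {
    private StrategyLoader() {}

    static ChunkTypeStrategy load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass throws ClassCastException if the class does not implement the interface.
            return cls.asSubclass(ChunkTypeStrategy.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Taking a class name as a parameter means a different strategy can be swapped in by editing only the descriptor.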

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AggregatePlaintextProcessor.xml  for plaintext documents.

The file cTAKESdesc/analysis_engine/AggregatePlaintextProcessor.xml is the aggregate
analysis engine descriptor for running the entire pipeline on plaintext documents.
It includes the SimpleSegmentAnnotator analysis engine, which creates a Segment
annotation that wraps the entire plaintext document. Other annotators in the
pipeline require at least one Segment annotation.
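In effect, SimpleSegmentAnnotator produces one Segment whose span is the whole document. The minimal sketch below illustrates that idea with a hypothetical `WholeDocSegment` record standing in for the UIMA Segment type. The `segmentId` argument plays the role of the SegmentID parameter listed below.

```java
/** Simplified stand-in for a Segment annotation: character offsets plus an identifier. */
record WholeDocSegment(int begin, int end, String id) {}

/** Hypothetical sketch of wrapping a whole plaintext document in a single segment. */
final class WholeDocumentSegmenter {
    private WholeDocumentSegmenter() {}

    /** Returns one segment covering [0, documentText.length()), labelled with segmentId. */
    static WholeDocSegment segment(String documentText, String segmentId) {
        return new WholeDocSegment(0, documentText.length(), segmentId);
    }
}
```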

Click on the tab labeled "Parameter Settings" to view the parameters set in this
descriptor. The ChunkCreatorClass parameter is set to org.apache.ctakes.chunker.ae.PhraseTypeChunkCreator
so that each phrase type gets its own annotation type, rather than every chunk
being annotated as type Chunk.

The parameters are:
- SegmentID - the identifier or name to assign to the Segment annotation 
- ChunkCreatorClass - the full class name of an implementation of the
                      interface org.apache.ctakes.chunker.ae.ChunkCreator