Title:     Apache cTAKES components
Notice:    Licensed to the Apache Software Foundation (ASF) under one
           or more contributor license agreements. See the NOTICE file
           distributed with this work for additional information
           regarding copyright ownership. The ASF licenses this file
           to you under the Apache License, Version 2.0 (the
           "License"); you may not use this file except in compliance
           with the License. You may obtain a copy of the License at
           .
             http://www.apache.org/licenses/LICENSE-2.0
           .
           Unless required by applicable law or agreed to in writing,
           software distributed under the License is distributed on an
           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
           KIND, either express or implied. See the License for the
           specific language governing permissions and limitations
           under the License.

# Apache cTAKES Components

### Sentence boundary detection

Apache OpenNLP technology with a model trained on manually annotated clinical data (see Savova et al, 2010)

### Tokenization

Rule-based (see Savova et al, 2010)

### Morphologic normalization

National Library of Medicine's Lexical Variant Generation tool: [http://www.nlm.nih.gov/research/umls/new_users/online_learning/LEX_004.htm](http://www.nlm.nih.gov/research/umls/new_users/online_learning/LEX_004.htm)

### POS tagging

Apache OpenNLP technology with a model trained on manually annotated clinical data (see Savova et al, 2010; upcoming 2013 publication)

### Shallow parsing

Apache OpenNLP technology with a model trained on manually annotated clinical data (see Savova et al, 2010)

### Named Entity Recognition

(see Savova et al, 2010)

- Dictionary mapping (lookup algorithm)
- Semantic typing is based on these UMLS semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications

### Assertion module

Discovers negation, degree of certainty, and the subject/experiencer of the clinical event (upcoming 2013 publication)

### Dependency parser

Detects dependency relations between words, using machine learning with a model trained on manually annotated clinical data (see Choi and Palmer, 2011a; Choi and Palmer, 2011b; upcoming 2013 publication)

### Constituency parser

Apache OpenNLP technology with a model trained on manually annotated clinical data (see Zheng et al, 2011)

### Semantic Role Labeler

Assigns the predicate-argument structure of the sentence: who did what to whom, when, and where (see Choi and Palmer, 2011a; Choi and Palmer, 2011b; upcoming 2013 publication)

### Co-reference resolver

Resolves co-referring entities, using machine learning with a model trained on manually annotated clinical data (see Zheng et al, 2011)

### Relation extractor

Discovers attributes such as the location and the severity of a clinical condition, using machine learning with a model trained on manually annotated clinical data (upcoming 2013 publication)

### Drug Profile module

Discovers drug-specific attributes such as dosage, duration, form, frequency, route, and strength (see Sohn et al, 2010; Savova et al, 2011)

### Smoking status classifier

Classifies a document/patient as past smoker, current smoker, non-smoker, smoker, or unknown (see Savova et al, 2008)

# Select Publications:

Choi J, Palmer M. Getting the most out of Transition-based Dependency Parsing. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011). Portland, OR, 2011a.

Choi J, Palmer M. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. RELMS 2011: Relational Models of Semantics, held in conjunction with ACL-HLT 2011. Portland, OR, 2011b.

Savova G, Olson J, Murphy S, Cafourek V, Couch F, Goetz M, Ingle J, Suman V, Chute C and Weinshilboum R. 2011. The electronic medical record and drug response research: automated discovery of drug treatment patterns for endocrine therapy of breast cancer. Journal of the American Medical Informatics Association.

Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC and Chute CG. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association. 2010;17(5):507-13.

Savova G, Ogren P, Duffy P, Buntrock J and Chute C. 2008. Mayo Clinic System for patient smoking status classification. Journal of the American Medical Informatics Association. 2008;15(1):25-8. PMID: 17947622

Sohn S, Murphy SP, Masanz JJ, Kocher JP and Savova GK. 2010. Classification of medication status change in clinical narratives. AMIA Annual Symposium Proceedings. 2010;2010:762-6.

Zheng J, Chapman W, Miller T, Lin C, Crowley R and Savova G. 2012. A system for coreference resolution for the clinical narrative. Journal of the American Medical Informatics Association. doi:10.1136/amiajnl-2011-000599
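
# Illustrative sketch: dictionary mapping

The Named Entity Recognition component above is described as a dictionary mapping (lookup algorithm) with semantic typing. The following is a minimal Python sketch of that general idea, not the cTAKES implementation: it assumes a longest-match-first lookup over already tokenized text, and the dictionary entries, the `lookup_entities` function, and the type labels are hypothetical stand-ins rather than real UMLS content or cTAKES APIs.

```python
# Toy dictionary mapping token sequences to a semantic type.
# Entries are hypothetical examples, not taken from cTAKES or the UMLS.
TOY_DICTIONARY = {
    ("chest", "pain"): "sign/symptom",
    ("chest",): "anatomical site",
    ("aspirin",): "medication",
    ("diabetes", "mellitus"): "disease/disorder",
    ("appendectomy",): "procedure",
}

# Longest dictionary term, in tokens; bounds the lookup window.
MAX_TERM_LENGTH = max(len(term) for term in TOY_DICTIONARY)


def lookup_entities(tokens):
    """Return (start, end, text, semantic_type) tuples for dictionary matches.

    At each position the longest matching term wins, and matched tokens
    are not reused for another entity.
    """
    lowered = [token.lower() for token in tokens]
    matches = []
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest window first so "chest pain" wins over "chest".
        for length in range(min(MAX_TERM_LENGTH, len(lowered) - i), 0, -1):
            window = tuple(lowered[i:i + length])
            if window in TOY_DICTIONARY:
                text = " ".join(tokens[i:i + length])
                matches.append((i, i + length, text, TOY_DICTIONARY[window]))
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return matches


if __name__ == "__main__":
    sentence = "Patient denies chest pain and takes aspirin".split()
    for start, end, text, semantic_type in lookup_entities(sentence):
        print(start, end, text, semantic_type)
```

In this sketch, tokenization is a plain whitespace split, and negation ("denies") is ignored; in the pipeline described above those concerns belong to the tokenization and assertion components.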