Contents
- Introduction
- Description of resources
    - lvg.properties
    - LVG database
- Running the LVG annotator
    - LvgAnnotator.xml
    - AggregateAE.xml

############
Introduction
############

This annotator wraps the National Library of Medicine (NLM) SPECIALIST lexical tools.

See the cTAKES Wiki for the latest information about this annotator:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

Documentation for the SPECIALIST lexical tools is at:
https://lsg3.nlm.nih.gov/Specialist/Home/index.html

Documentation for Lvg and Norm can be found at: 
https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html
  
This annotator generates a canonical form for each word, along with a list of lemma
entries tagged with Penn Treebank tags.  These tags could be useful to a part-of-speech (POS) tagger.

However, for the OpenNLP POS tagger, cTAKES uses a tag dictionary rather than lemma information.
See the documentation for the POS tagger annotator. 


########################
Description of resources
########################

%%%%%%%%%%%%%%%%
lvg.properties
%%%%%%%%%%%%%%%%
The LVG configuration file lvg.properties defines the location
and attributes of the LVG database and the JDBC driver used to access it.

%%%%%%%%%%%%%%%%
LVG database
%%%%%%%%%%%%%%%%

The database engine used is HSQLDB.

The full LVG database available from the NLM is hundreds of megabytes.  To keep this
project small, the database tables included with it contain only a relatively small
number of rows.

#########################
Running the LVG annotator
#########################

%%%%%%%%%%%%%%%%
LvgAnnotator.xml
%%%%%%%%%%%%%%%%

The parameters are:
  UseSegments - controls whether only certain sections will be annotated by this annotator
  SegmentsToSkip - list of sections not to be processed by this annotator 
  UseCmdCache - controls whether to look up information in a cache before using norm
  CmdCacheFileLocation - location of norm cache file
  CmdCacheFrequencyCutoff - 
  ExclusionSet - words for which canonicalForm is never set and Lemma entries are never posted
  XeroxTreebankMap - mapping of part-of-speech tags, used to convert POS tags from the lexical tools to Penn Treebank tags
  PostLemmas - controls whether any lemma entries are posted to the CAS  
  UseLemmaCache - controls whether to look up lemma information in a cache before using lvg
  LemmaCacheFileLocation - the location of the cache file 
  LemmaCacheFileFrequencyCutoff - 

Note: as distributed, PostLemmas is set to false.  This is done to reduce the size of the CAS.
Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma annotations added to the CAS.
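
As an illustration of the note above, here is a minimal sketch of how PostLemmas might be
switched on in LvgAnnotator.xml.  It assumes the descriptor follows the standard UIMA
layout, with parameter values listed as nameValuePair elements inside
configurationParameterSettings; check the descriptor shipped with your cTAKES version,
as its exact contents may differ.

```xml
<!-- Sketch only: assumes this pair sits inside the descriptor's
     <configurationParameterSettings> element, alongside the other parameters. -->
<nameValuePair>
  <name>PostLemmas</name>
  <value>
    <!-- Distributed value is false; true posts Lemma annotations to the CAS. -->
    <boolean>true</boolean>
  </value>
</nameValuePair>
```

With this change the CAS grows, since a Lemma annotation is added for each lemma entry
that is posted.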