org.apache.ctakes.dictionary.assertion
Class CreateAssertionLuceneIndexFromDelimitedFile

java.lang.Object
  extended by org.apache.ctakes.dictionary.assertion.CreateAssertionLuceneIndexFromDelimitedFile

public class CreateAssertionLuceneIndexFromDelimitedFile
extends Object

Driver that populates a Lucene index with assertion cue phrases, tokenizing each dictionary entry the same way clinical text is tokenized during pipeline processing. The pipeline can use a file of hyphenated words to decide which hyphenated words are treated as a single token; this driver can use the same kind of file so that dictionary entries and clinical text are tokenized consistently.
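
To illustrate why the dictionary and the clinical text should share one hyphenated-word list, here is a minimal sketch. It is not the TokenizerPTB implementation: the class name HyphenAwareTokenizerSketch, the whitespace-based splitting, and the rule of breaking unlisted hyphenated words into separate tokens are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not part of cTAKES and not how TokenizerPTB is implemented.
final class HyphenAwareTokenizerSketch {

    /**
     * Splits text on whitespace. A word containing a hyphen is kept whole only
     * if its lowercased form is a key of hyphMap; otherwise its parts and its
     * hyphens become separate tokens.
     */
    static List<String> tokenize(String text, Map<String, Integer> hyphMap) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            boolean listed = hyphMap.containsKey(word.toLowerCase(Locale.ROOT));
            if (listed || word.indexOf('-') < 0) {
                tokens.add(word);
            } else {
                // Split before and after each hyphen, keeping the hyphen as a token.
                for (String piece : word.split("(?<=-)|(?=-)")) {
                    tokens.add(piece);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> hyphMap = Collections.singletonMap("x-ray", 1);
        // With the list, "x-ray" stays one token.
        System.out.println(tokenize("no x-ray evidence", hyphMap));
        // Without it, the same phrase yields different tokens.
        System.out.println(tokenize("no x-ray evidence",
                Collections.<String, Integer>emptyMap()));
    }
}
```

In this sketch, a cue phrase indexed with the list contains the single token "x-ray", while text tokenized without the list contains "x", "-", "ray", so a token-by-token lookup would miss the entry. Using one list on both sides avoids that mismatch.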


Field Summary
static String CUE_PHRASE_CATEGORY_FIELD_NAME
           
static String CUE_PHRASE_FAMILY_FIELD_NAME
           
static String CUE_PHRASE_FIELD_NAME
           
static String CUE_PHRASE_FIRST_WORD_FIELD_NAME
           
 
Constructor Summary
CreateAssertionLuceneIndexFromDelimitedFile(TokenizerPTB tokenizer)
          Creates the driver with the tokenizer used to tokenize the dictionary entries.
 
Method Summary
static String getUsage()
           
static String load(String filename)
          Loads text from a file.
static Map loadHyphMap(String filename)
          Loads hyphenated words and a frequency value for each, from a file.
static void main(String[] args)
           
static void printResults(String text, List results)
          Prints out the tokenized results, for debug use.
 void writeToFile(String str)
           
protected  void writeToFormatLucene(String cuePhrase, String cuePhraseCategory, String cuePhraseFamily)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CUE_PHRASE_FIELD_NAME

public static final String CUE_PHRASE_FIELD_NAME
See Also:
Constant Field Values

CUE_PHRASE_CATEGORY_FIELD_NAME

public static final String CUE_PHRASE_CATEGORY_FIELD_NAME
See Also:
Constant Field Values

CUE_PHRASE_FAMILY_FIELD_NAME

public static final String CUE_PHRASE_FAMILY_FIELD_NAME
See Also:
Constant Field Values

CUE_PHRASE_FIRST_WORD_FIELD_NAME

public static final String CUE_PHRASE_FIRST_WORD_FIELD_NAME
See Also:
Constant Field Values
Constructor Detail

CreateAssertionLuceneIndexFromDelimitedFile

public CreateAssertionLuceneIndexFromDelimitedFile(TokenizerPTB tokenizer)
                                            throws Exception
Creates the driver with the given tokenizer.

Parameters:
tokenizer - Used to tokenize the dictionary entries
Throws:
Exception
Method Detail

main

public static void main(String[] args)

load

public static String load(String filename)
                   throws FileNotFoundException,
                          IOException
Loads text from a file.

Parameters:
filename - Path of the file to read
Returns:
The text read from the file
Throws:
FileNotFoundException
IOException
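
A method with this contract could look roughly like the sketch below. The class name LoadTextSketch and the UTF-8 encoding are assumptions; the encoding the real method uses is not documented here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch; not the cTAKES implementation of load(String).
final class LoadTextSketch {

    /**
     * Reads the whole file into a String, assuming UTF-8 content.
     * A missing file surfaces as java.nio.file.NoSuchFileException here,
     * which is an IOException but not a FileNotFoundException.
     */
    static String load(String filename) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```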

loadHyphMap

public static Map loadHyphMap(String filename)
                       throws FileNotFoundException,
                              IOException
Loads hyphenated words and a frequency value for each, from a file.

Parameters:
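filename - Path of the file listing hyphenated words and their frequency values
Returns:
A map from each hyphenated word to its frequency value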
Throws:
FileNotFoundException
IOException
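
As a rough illustration of loading such a map, the sketch below assumes one entry per line in the form word|frequency. That line format, the lowercasing of keys, and the class name HyphMapLoaderSketch are assumptions for this example; the actual file format is not specified on this page.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the real loadHyphMap may parse a different format.
final class HyphMapLoaderSketch {

    /** Parses lines of the assumed form "word|frequency" into a map. */
    static Map<String, Integer> parse(Reader source) throws IOException {
        Map<String, Integer> hyphMap = new HashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            int bar = line.lastIndexOf('|');
            if (line.isEmpty() || bar < 0) {
                continue; // skip blank lines and lines without a delimiter
            }
            String word = line.substring(0, bar).trim().toLowerCase(Locale.ROOT);
            // Throws NumberFormatException if the frequency is not an integer.
            int frequency = Integer.parseInt(line.substring(bar + 1).trim());
            hyphMap.put(word, frequency);
        }
        return hyphMap;
    }

    /** Opens the named file and parses it; closes the file when done. */
    static Map<String, Integer> loadFromFile(String filename) throws IOException {
        try (Reader reader = new FileReader(filename)) {
            return parse(reader);
        }
    }
}
```

Separating parse from loadFromFile keeps the parsing logic usable with any Reader, which makes it easy to try out on an in-memory string.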

printResults

public static void printResults(String text,
                                List results)
Prints out the tokenized results, for debug use.

Parameters:
text - The text that was tokenized
results - The tokens produced for that text

getUsage

public static String getUsage()
Returns:
A string showing a usage example (parameters)

writeToFormatLucene

protected void writeToFormatLucene(String cuePhrase,
                                   String cuePhraseCategory,
                                   String cuePhraseFamily)
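
This method is undocumented, but the four field-name constants suggest that each index entry stores the cue phrase, its category, its family, and the phrase's first word. The sketch below shows one way such a record could be assembled. The class name CuePhraseRecordSketch, the field-name values, and the rule for deriving the first word are all hypothetical; the real values are listed under Constant Field Values, and no Lucene API is used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; does not reproduce writeToFormatLucene.
final class CuePhraseRecordSketch {

    // Placeholder field names, invented for this example.
    static final String CUE_PHRASE = "cuePhrase";
    static final String CATEGORY = "cuePhraseCategory";
    static final String FAMILY = "cuePhraseFamily";
    static final String FIRST_WORD = "cuePhraseFirstWord";

    /** Builds a field-name to value record, deriving the first word from the phrase. */
    static Map<String, String> toRecord(String cuePhrase, String category, String family) {
        String trimmed = cuePhrase.trim();
        int space = trimmed.indexOf(' ');
        // Assumption: the first word is everything before the first space.
        String firstWord = space < 0 ? trimmed : trimmed.substring(0, space);

        Map<String, String> record = new LinkedHashMap<>();
        record.put(CUE_PHRASE, trimmed);
        record.put(CATEGORY, category);
        record.put(FAMILY, family);
        record.put(FIRST_WORD, firstWord);
        return record;
    }
}
```

Storing the first word separately would let a lookup retrieve every cue phrase that starts with a given token before comparing the remaining tokens, which is a common reason for such a field.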

writeToFile

public void writeToFile(String str)


Copyright © 2012-2013 The Apache Software Foundation. All Rights Reserved.