org.apache.ctakes.utils.wiki
Class WikiIndex

java.lang.Object
  extended by org.apache.ctakes.utils.wiki.WikiIndex

public class WikiIndex
extends java.lang.Object

A wrapper for a Wikipedia Lucene index.

Author:
Dmitriy Dligach

Field Summary
static java.lang.String defaultIndexPath
           
static int defaultMaxHits
           
static java.lang.String defaultSearchField
           
 
Constructor Summary
WikiIndex()
           
WikiIndex(int maxHits, java.lang.String indexPath, java.lang.String searchField)
           
WikiIndex(int maxHits, java.lang.String indexPath, java.lang.String searchField, boolean approximate)
           
 
Method Summary
 void close()
           
 double getCosineSimilarity(java.lang.String queryText1, java.lang.String queryText2)
          Send two queries to the index.
 java.util.ArrayList<org.apache.lucene.index.Terms> getTermFreqVectors(java.lang.String queryString)
           
 void initialize()
           
 java.util.ArrayList<SearchResult> search(java.lang.String queryText)
          Search the index.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultMaxHits

public static int defaultMaxHits

defaultIndexPath

public static java.lang.String defaultIndexPath

defaultSearchField

public static java.lang.String defaultSearchField
Constructor Detail

WikiIndex

public WikiIndex(int maxHits,
                 java.lang.String indexPath,
                 java.lang.String searchField,
                 boolean approximate)

WikiIndex

public WikiIndex(int maxHits,
                 java.lang.String indexPath,
                 java.lang.String searchField)

WikiIndex

public WikiIndex()
Method Detail

initialize

public void initialize()
                throws org.apache.lucene.index.CorruptIndexException,
                       java.io.IOException
Throws:
org.apache.lucene.index.CorruptIndexException
java.io.IOException

search

public java.util.ArrayList<SearchResult> search(java.lang.String queryText)
                                         throws org.apache.lucene.queryparser.classic.ParseException,
                                                java.io.IOException
Search the index. Return a list of article titles and their scores.

Throws:
org.apache.lucene.queryparser.classic.ParseException
java.io.IOException

getCosineSimilarity

public double getCosineSimilarity(java.lang.String queryText1,
                                  java.lang.String queryText2)
                           throws org.apache.lucene.queryparser.classic.ParseException,
                                  java.io.IOException
Send two queries to the index. For each query, form a tf-idf vector that represents the N top-matching documents. Return the cosine similarity between the two tf-idf vectors.

Throws:
org.apache.lucene.queryparser.classic.ParseException
java.io.IOException
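
The cosine step described for getCosineSimilarity can be illustrated independently of Lucene. The following is a minimal sketch, not the WikiIndex implementation: the class name TfIdfCosineSketch, its methods, and the term weights are hypothetical, and it assumes each query has already been reduced to a sparse map from term to tf-idf weight. How WikiIndex weights terms or combines the N top documents is not specified on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of org.apache.ctakes.utils.wiki.
class TfIdfCosineSketch {

    /** Cosine similarity between two sparse vectors keyed by term. */
    static double cosine(Map<String, Double> v1, Map<String, Double> v2) {
        // Dot product over the terms the two vectors share.
        double dot = 0.0;
        for (Map.Entry<String, Double> e : v1.entrySet()) {
            Double w = v2.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double n1 = norm(v1);
        double n2 = norm(v2);
        // Assumption: an all-zero or empty vector yields similarity 0.
        if (n1 == 0.0 || n2 == 0.0) {
            return 0.0;
        }
        return dot / (n1 * n2);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up weights standing in for the vectors built from two queries.
        Map<String, Double> a = new HashMap<>();
        a.put("aspirin", 2.0);
        a.put("pain", 1.0);

        Map<String, Double> b = new HashMap<>();
        b.put("aspirin", 1.0);
        b.put("fever", 3.0);

        System.out.println(cosine(a, b));
    }
}
```

With non-negative weights such as tf-idf, the result falls between 0 (no shared terms) and 1 (vectors pointing in the same direction).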

getTermFreqVectors

public java.util.ArrayList<org.apache.lucene.index.Terms> getTermFreqVectors(java.lang.String queryString)
                                                                      throws org.apache.lucene.queryparser.classic.ParseException,
                                                                             java.io.IOException
Throws:
org.apache.lucene.queryparser.classic.ParseException
java.io.IOException

close

public void close()
           throws java.io.IOException
Throws:
java.io.IOException