org.apache.ctakes.smokingstatus.MLutil
Class GenerateTrainingData

java.lang.Object
  extended by org.apache.ctakes.smokingstatus.MLutil.GenerateTrainingData

public class GenerateTrainingData
extends java.lang.Object


Field Summary
(package private)  java.util.List<java.util.List<java.lang.Comparable>> features
           
(package private)  java.util.Set<java.lang.String> keywords
           
(package private)  java.util.Set<java.lang.String> stopwords
           
 
Constructor Summary
GenerateTrainingData(java.lang.String keywordsFileName, java.lang.String stopwordsFileName)
           
 
Method Summary
static void main(java.lang.String[] args)
          keywordsFile and stopwordsFile must point to the files in the resources. dataFile is your own sentence-level data, given as records of the form sentence|class_label (class_label: C, P, or S).
 void makeFeatures(java.lang.String fname)
          Sets "features", the list of feature vectors.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

keywords

java.util.Set<java.lang.String> keywords

stopwords

java.util.Set<java.lang.String> stopwords

features

java.util.List<java.util.List<java.lang.Comparable>> features
Constructor Detail

GenerateTrainingData

GenerateTrainingData(java.lang.String keywordsFileName,
                     java.lang.String stopwordsFileName)
Method Detail

makeFeatures

public void makeFeatures(java.lang.String fname)
Sets "features", the list of feature vectors. Each inner list is one feature vector whose last element is the class label. The input file (fname) contains records of the form sentence|class_label, where class_label is C, P, or S.
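
The input format described above can be illustrated with a small parser. This is a minimal sketch and not the cTAKES implementation: the class LabeledSentenceParser and its methods are hypothetical names, and it assumes one record per line and that the class label follows the last '|' on the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES: parses "sentence|class_label" records.
class LabeledSentenceParser {

    /** One parsed record: the sentence text and its class label (C, P, or S). */
    static final class LabeledSentence {
        final String sentence;
        final String label;

        LabeledSentence(String sentence, String label) {
            this.sentence = sentence;
            this.label = label;
        }
    }

    /** Assumption: split on the last '|' so a sentence containing '|' stays intact. */
    static LabeledSentence parseLine(String line) {
        int sep = line.lastIndexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '|' separator: " + line);
        }
        String sentence = line.substring(0, sep).trim();
        String label = line.substring(sep + 1).trim();
        if (!label.equals("C") && !label.equals("P") && !label.equals("S")) {
            throw new IllegalArgumentException("Unknown class label: " + label);
        }
        return new LabeledSentence(sentence, label);
    }

    /** Assumption: one record per line; blank lines are skipped. */
    static List<LabeledSentence> parseAll(List<String> lines) {
        List<LabeledSentence> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(parseLine(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative record only; not taken from any cTAKES data set.
        LabeledSentence r = parseLine("Patient smokes one pack per day.|C");
        System.out.println(r.label + " <- " + r.sentence);
    }
}
```

Rejecting unknown labels early keeps malformed records out of the feature list; whether GenerateTrainingData itself validates labels is not stated in this documentation.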


main

public static void main(java.lang.String[] args)
keywordsFile and stopwordsFile must point to the files in the resources. dataFile is your own sentence-level data, given as records of the form sentence|class_label (class_label: C, P, or S). libsvmDataFile is the file to which the libsvm data is written.
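
To show what a line of libsvm data looks like, the sketch below formats one feature vector (last element is the class label, as in "features") as "label index:value ...". It is an illustration under stated assumptions, not the code used by main: the class LibsvmLineSketch is hypothetical, the numeric encoding of C, P, and S is invented for the example, and features are assumed to be numeric.

```java
import java.util.List;

// Hypothetical sketch, not the cTAKES implementation.
class LibsvmLineSketch {

    // Assumed label encoding; the numbering used by GenerateTrainingData is not documented here.
    static int labelToInt(String label) {
        switch (label) {
            case "C": return 1;
            case "P": return 2;
            case "S": return 3;
            default: throw new IllegalArgumentException("Unknown class label: " + label);
        }
    }

    /** Formats a vector whose last element is the class label as "label index:value ...". */
    static String toLibsvmLine(List<?> vector) {
        int last = vector.size() - 1;
        StringBuilder sb = new StringBuilder();
        sb.append(labelToInt(vector.get(last).toString()));
        for (int i = 0; i < last; i++) {
            // Assumption: every feature is numeric.
            double value = ((Number) vector.get(i)).doubleValue();
            if (value != 0.0) {
                // The libsvm format is sparse: zero-valued features may be omitted,
                // and feature indices start at 1.
                sb.append(' ').append(i + 1).append(':').append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "1 1:1.0 3:2.0"
        System.out.println(toLibsvmLine(java.util.Arrays.asList(1, 0, 2, "C")));
    }
}
```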