org.apache.ctakes.smokingstatus.MLutil
Class GenerateTrainingData
java.lang.Object
org.apache.ctakes.smokingstatus.MLutil.GenerateTrainingData
public class GenerateTrainingData
extends java.lang.Object
Field Summary
(package private) java.util.List<java.util.List<java.lang.Comparable>> features
(package private) java.util.Set<java.lang.String> keywords
(package private) java.util.Set<java.lang.String> stopwords
Constructor Summary
GenerateTrainingData(java.lang.String keywordsFileName, java.lang.String stopwordsFileName)
Method Summary
static void main(java.lang.String[] args)
    keywordsFile and stopwordsFile must point to the files in the resources. dataFile is your own sentence-level data in the format sentence|class_label (class_label: C, P, S), one record per sentence.
void makeFeatures(java.lang.String fname)
    Sets "features", a list of feature vectors.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
keywords
java.util.Set<java.lang.String> keywords
stopwords
java.util.Set<java.lang.String> stopwords
features
java.util.List<java.util.List<java.lang.Comparable>> features
GenerateTrainingData
GenerateTrainingData(java.lang.String keywordsFileName,
java.lang.String stopwordsFileName)
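The constructor receives only the names of the keyword and stopword files, which presumably fill the `keywords` and `stopwords` sets. The sketch below is a minimal illustration of one way to load such a file into a `Set<String>`. It assumes one entry per line and applies trimming and lower-casing; neither assumption is confirmed by this page. `WordListLoader`, `toWordSet`, and `loadWordList` are hypothetical names and not part of cTAKES.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the cTAKES implementation.
public class WordListLoader {
    // Assumption: one word per line. Entries are trimmed and lower-cased, and blank lines are skipped.
    static Set<String> toWordSet(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String w = line.trim().toLowerCase();
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Reads a UTF-8 word-list file (for example, a keywords or stopwords file) into a set.
    static Set<String> loadWordList(Path file) throws IOException {
        return toWordSet(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("keywords", ".txt");
        Files.write(tmp, List.of("smoker", "  Tobacco ", "", "cigarette"), StandardCharsets.UTF_8);
        System.out.println(loadWordList(tmp).size()); // 3
        Files.delete(tmp);
    }
}
```

A `HashSet` gives constant-time lookups, which is what a keyword or stopword membership test needs.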
makeFeatures
public void makeFeatures(java.lang.String fname)
Sets "features", a list of feature vectors. Each inner list is one feature vector, and its last element is the class label.
Input format (fname): sentence|class_label, where class_label is C, P, or S.
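This page gives only the input format and the shape of the output: one `List<Comparable>` per sentence, with the class label last. It does not document which features are computed. The sketch below illustrates that shape under a loudly stated assumption. Each feature is a hypothetical binary indicator that records whether a keyword occurs among the sentence's non-stopword tokens, with keywords taken in sorted order. Records are split at the last `|`. `FeatureSketch`, `parseRecord`, and `toFeatureVector` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the "features" shape; not the actual cTAKES feature set.
public class FeatureSketch {
    // Splits one "sentence|class_label" record at the last '|' and checks the label.
    static String[] parseRecord(String line) {
        int bar = line.lastIndexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("missing '|': " + line);
        }
        String label = line.substring(bar + 1).trim();
        if (!Set.of("C", "P", "S").contains(label)) {
            throw new IllegalArgumentException("unexpected label: " + label);
        }
        return new String[] { line.substring(0, bar).trim(), label };
    }

    // Assumed features: 1 if a keyword appears among the non-stopword tokens, else 0.
    // The class label is appended as the last element.
    @SuppressWarnings("rawtypes")
    static List<Comparable> toFeatureVector(String line, Set<String> keywords, Set<String> stopwords) {
        String[] rec = parseRecord(line);
        Set<String> tokens = new TreeSet<>();
        for (String t : rec[0].toLowerCase().split("\\W+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        List<Comparable> vector = new ArrayList<>();
        for (String k : new TreeSet<>(keywords)) { // fixed, sorted keyword order
            vector.add(tokens.contains(k) ? 1 : 0);
        }
        vector.add(rec[1]); // class label last
        return vector;
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("smoker", "quit", "cigarettes");
        Set<String> stopwords = Set.of("the", "a", "is");
        System.out.println(toFeatureVector("Patient quit cigarettes in 2005.|P", keywords, stopwords)); // [1, 1, 0, P]
    }
}
```

Sorting the keywords keeps feature positions identical across sentences, and a consistent order is required once the vectors are written out with numeric indices.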
main
public static void main(java.lang.String[] args)
keywordsFile and stopwordsFile must point to the files in the resources. dataFile is your own sentence-level data in the format sentence|class_label (class_label: C, P, S), one record per sentence. libsvmDataFile is the file to which the libsvm data is written.
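A libsvm data file holds one instance per line in the form `<label> <index1>:<value1> <index2>:<value2> ...`, with indices starting at 1 in ascending order and zero-valued features usually omitted. The sketch below converts one feature vector (class label last, as `makeFeatures` describes) into such a line. The numeric label encoding C=1, P=2, S=3 is a hypothetical choice, because this page does not say how labels are mapped. `LibsvmLineSketch`, `toLibsvmLine`, and `writeLibsvmFile` are illustrative names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of libsvm output; the label codes below are assumptions.
@SuppressWarnings("rawtypes")
public class LibsvmLineSketch {
    static final Map<String, Integer> LABELS = Map.of("C", 1, "P", 2, "S", 3);

    // Formats "<label> <index>:<value> ..." using 1-based indices and omitting zero values.
    static String toLibsvmLine(List<? extends Comparable> vector) {
        Object label = vector.get(vector.size() - 1);
        Integer code = LABELS.get(label.toString());
        if (code == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        StringBuilder sb = new StringBuilder().append(code);
        for (int i = 0; i < vector.size() - 1; i++) {
            double value = ((Number) vector.get(i)).doubleValue();
            if (value != 0.0) {
                String v = value == Math.rint(value) ? Long.toString((long) value) : Double.toString(value);
                sb.append(' ').append(i + 1).append(':').append(v);
            }
        }
        return sb.toString();
    }

    // Writes all vectors to a libsvm data file, one line per vector.
    static void writeLibsvmFile(Path out, List<List<Comparable>> features) throws IOException {
        List<String> lines = new ArrayList<>();
        for (List<Comparable> v : features) {
            lines.add(toLibsvmLine(v));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Comparable> v = List.of(1, 1, 0, "P");
        System.out.println(toLibsvmLine(v)); // 2 1:1 2:1
    }
}
```

Omitting zeros keeps the file sparse. libsvm treats any missing index as 0, so the output is equivalent to the dense vector.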