Class DatasetSplitter
java.lang.Object
    org.apache.lucene.classification.utils.DatasetSplitter
public class DatasetSplitter extends Object
Utility class for creating training / test / cross validation indexes from the original index.
Constructor Summary
Constructor: DatasetSplitter(double testRatio, double crossValidationRatio)
Description: Create a DatasetSplitter by giving the sizes of the test and cross validation indexes, as ratios of the original index
Method Summary
Modifier and Type: void
Method: split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames)
Description: Split a given index into 3 indexes for training, test and cross validation tasks respectively
Constructor Detail
DatasetSplitter
public DatasetSplitter(double testRatio, double crossValidationRatio)
Create a DatasetSplitter by giving the sizes of the test and cross validation indexes, as ratios of the original index.
Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
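The arithmetic implied by the two ratios can be sketched in plain Java. This is only an illustration, not Lucene's implementation: the class SplitSizes and its method of(...) are hypothetical names. The rounding rule (round down) and the requirement that the two ratios sum to at most 1.0 are assumptions that the documentation above does not state. Whatever is not assigned to the test or cross validation index is treated as training data.

```java
// Hypothetical helper, not part of Lucene: shows how two ratios
// could partition a document count into three index sizes.
final class SplitSizes {
    final int training;
    final int test;
    final int crossValidation;

    private SplitSizes(int training, int test, int crossValidation) {
        this.training = training;
        this.test = test;
        this.crossValidation = crossValidation;
    }

    static SplitSizes of(int totalDocs, double testRatio, double crossValidationRatio) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must be non-negative");
        }
        if (testRatio < 0.0 || testRatio > 1.0
                || crossValidationRatio < 0.0 || crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must be between 0.0 and 1.0");
        }
        // Assumption: the two ratios together may not exceed the whole index.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must sum to at most 1.0");
        }
        // Assumption: fractional documents are rounded down.
        int test = (int) Math.floor(totalDocs * testRatio);
        int crossValidation = (int) Math.floor(totalDocs * crossValidationRatio);
        // The remainder is what is left for training.
        int training = totalDocs - test - crossValidation;
        return new SplitSizes(training, test, crossValidation);
    }

    public static void main(String[] args) {
        SplitSizes s = SplitSizes.of(1000, 0.2, 0.1);
        System.out.println("training=" + s.training
                + " test=" + s.test
                + " crossValidation=" + s.crossValidation);
    }
}
```

Under these assumptions, a 1000-document index with testRatio 0.2 and crossValidationRatio 0.1 yields 200 test documents, 100 cross validation documents and 700 training documents. The actual per-document assignment performed by split may differ.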
Method Detail
split
public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames) throws IOException
Split a given index into 3 indexes for training, test and cross validation tasks respectively.
Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this must be indexed with sorted doc values
fieldNames - names of fields that need to be put in the new indexes, or null if all should be used
Throws:
IOException - if any writing operation fails on any of the indexes