SimpleNaiveBayesClassifier (Lucene 7.4.0 API)

java.lang.Object
- org.apache.lucene.classification.SimpleNaiveBayesClassifier

All Implemented Interfaces:

Classifier<BytesRef>

Direct Known Subclasses:

CachingNaiveBayesClassifier, SimpleNaiveBayesDocumentClassifier
```
public class SimpleNaiveBayesClassifier
extends Object
implements Classifier<BytesRef>
```
A simplistic Lucene-based Naive Bayes classifier; see http://en.wikipedia.org/wiki/Naive_Bayes_classifier

WARNING: This API is experimental and might change in incompatible ways in the next release.
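
As a rough illustration of the Naive Bayes idea behind this class, the following is a minimal sketch in plain Java that does not use Lucene at all. The class `NaiveBayesSketch` and its methods are hypothetical names introduced here, and the scoring uses textbook add-one smoothing over in-memory counts, which is an assumption and not necessarily the formula this classifier applies to index statistics.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy multinomial Naive Bayes over pre-tokenized text (hypothetical, not Lucene code). */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Records one training document with the given class label and tokens. */
    void train(String label, String... tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** log P(class) + sum of log P(token | class); the label must have been trained. */
    double logScore(String label, String... tokens) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = termCounts.get(label);
        int classTokens = tokensPerClass.getOrDefault(label, 0);
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            // Add-one (Laplace) smoothing so unseen tokens do not zero out the score.
            score += Math.log((count + 1.0) / (classTokens + vocabulary.size()));
        }
        return score;
    }

    /** Returns the trained label with the highest log score, or null if nothing was trained. */
    String classify(String... tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the scores in this sketch are negative numbers and not probabilities.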

Field Summary

Fields
Modifier and Type	Field and Description
`protected Analyzer`	`analyzer` `Analyzer` to be used for tokenizing unseen input text
`protected String`	`classFieldName` name of the field to be used as a class / category output
`protected IndexReader`	`indexReader` `IndexReader` used to access the `Classifier`'s index
`protected IndexSearcher`	`indexSearcher` `IndexSearcher` to run searches on the index for retrieving frequencies
`protected Query`	`query` `Query` used to optionally filter the set of documents used for classification
`protected String[]`	`textFieldNames` names of the fields to be used as input text

Constructor Summary

Constructors
Constructor and Description
`SimpleNaiveBayesClassifier(IndexReader indexReader, Analyzer analyzer, Query query, String classFieldName, String... textFieldNames)` Creates a new NaiveBayes classifier.

Method Summary

Modifier and Type	Method and Description
`ClassificationResult<BytesRef>`	`assignClass(String inputDocument)` Assign a class (with score) to the given text String
`protected List<ClassificationResult<BytesRef>>`	`assignClassNormalizedList(String inputDocument)` Calculate probabilities for all classes for a given input text
`protected int`	`countDocsWithClass()` Counts the documents in the index that have at least one value for the 'class' field
`List<ClassificationResult<BytesRef>>`	`getClasses(String text)` Get all the classes (sorted by score, descending) assigned to the given text String.
`List<ClassificationResult<BytesRef>>`	`getClasses(String text, int max)` Get the first `max` classes (sorted by score, descending) assigned to the given text String.
`protected ArrayList<ClassificationResult<BytesRef>>`	`normClassificationResults(List<ClassificationResult<BytesRef>> assignedClasses)` Normalize the classification results based on the max score available
`protected String[]`	`tokenize(String text)` tokenize a `String` on this classifier's text fields and analyzer

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - indexReader
```
protected final IndexReader indexReader
```
    IndexReader used to access the Classifier's index
  - textFieldNames
```
protected final String[] textFieldNames
```
    names of the fields to be used as input text
  - classFieldName
```
protected final String classFieldName
```
    name of the field to be used as a class / category output
  - analyzer
```
protected final Analyzer analyzer
```
    Analyzer to be used for tokenizing unseen input text
  - indexSearcher
```
protected final IndexSearcher indexSearcher
```
    IndexSearcher to run searches on the index for retrieving frequencies
  - query
```
protected final Query query
```
    Query used to optionally filter the set of documents used for classification
- Constructor Detail
  - SimpleNaiveBayesClassifier
```
public SimpleNaiveBayesClassifier(IndexReader indexReader,
                                  Analyzer analyzer,
                                  Query query,
                                  String classFieldName,
                                  String... textFieldNames)
```
    Creates a new NaiveBayes classifier.
    
    Parameters:
    
    indexReader - the reader on the index to be used for classification
    
    analyzer - an Analyzer used to analyze unseen text
    
    query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
    
    classFieldName - the name of the field used as the output for the classifier. NOTE: this field must not be heavily analyzed, because the returned class will be a token indexed for this field
    
    textFieldNames - the names of the fields used as the inputs for the classifier; per-field boosting is not supported
- Method Detail
  - assignClass
```
public ClassificationResult<BytesRef> assignClass(String inputDocument)
                                           throws IOException
```
    Description copied from interface: Classifier
    
    Assign a class (with score) to the given text String
    
    Specified by:
    
    assignClass in interface Classifier<BytesRef>
    
    Parameters:
    
    inputDocument - a String containing text to be classified
    
    Returns:
    
    a ClassificationResult holding the assigned class of type T and its score
    
    Throws:
    
    IOException - If there is a low-level I/O error.
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(String text)
                                                throws IOException
```
    Description copied from interface: Classifier
    
    Get all the classes (sorted by score, descending) assigned to the given text String.
    
    Specified by:
    
    getClasses in interface Classifier<BytesRef>
    
    Parameters:
    
    text - a String containing text to be classified
    
    Returns:
    
    the whole list of ClassificationResult (the classes and their scores). Returns null if the classifier can't make lists.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(String text,
                                                       int max)
                                                throws IOException
```
    Description copied from interface: Classifier
    
    Get the first max classes (sorted by score, descending) assigned to the given text String.
    
    Specified by:
    
    getClasses in interface Classifier<BytesRef>
    
    Parameters:
    
    text - a String containing text to be classified
    
    max - the number of return list elements
    
    Returns:
    
    the list of ClassificationResult (the classes and their scores), cut to at most max elements. Returns null if the classifier can't make lists.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
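    
    The "sorted by score, descending, cut to max" behavior described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `TopClassesSketch` and `topClasses` are hypothetical names, and a `Map<String, Double>` stands in for the list of ClassificationResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative helper (hypothetical, not part of Lucene). */
final class TopClassesSketch {
    /** Returns at most max entries of the given scores, highest score first. */
    static List<Map.Entry<String, Double>> topClasses(Map<String, Double> scores, int max) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(scores.entrySet());
        // Sort by score, descending.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        // Clamp max so that negative or oversized values do not throw.
        int limit = Math.min(Math.max(max, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```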
  - assignClassNormalizedList
```
protected List<ClassificationResult<BytesRef>> assignClassNormalizedList(String inputDocument)
                                                                  throws IOException
```
    Calculate probabilities for all classes for a given input text
    
    Parameters:
    
    inputDocument - the input text as a String
    
    Returns:
    
    a List of ClassificationResult, one for each existing class
    
    Throws:
    
    IOException - if assigning probabilities fails
  - countDocsWithClass
```
protected int countDocsWithClass()
                          throws IOException
```
    Counts the documents in the index that have at least one value for the 'class' field
    
    Returns:
    
    the number of documents having a value for the 'class' field
    
    Throws:
    
    IOException - if accessing term vectors or searching fails
  - tokenize
```
protected String[] tokenize(String text)
                     throws IOException
```
    tokenize a String on this classifier's text fields and analyzer
    
    Parameters:
    
    text - the String representing an input text (to be classified)
    
    Returns:
    
    a String array of the resulting tokens
    
    Throws:
    
    IOException - if tokenization fails
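    
    To show the shape of the result (text in, array of token strings out), here is a minimal stand-in in plain Java. It is not how this method works: the real method runs the configured Analyzer, whereas this sketch swaps in simple lower-casing and splitting on non-alphanumeric characters. `TokenizeSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in tokenizer (hypothetical); a real Analyzer may stem, drop stop words, etc. */
final class TokenizeSketch {
    /** Lower-cases the text and splits it on runs of non-letter, non-digit characters. */
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element when the text starts with a delimiter.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens.toArray(new String[0]);
    }
}
```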
  - normClassificationResults
```
protected ArrayList<ClassificationResult<BytesRef>> normClassificationResults(List<ClassificationResult<BytesRef>> assignedClasses)
```
    Normalize the classification results based on the max score available
    
    Parameters:
    
    assignedClasses - the list of assigned classes
    
    Returns:
    
    the normalized results
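    
    The description above says only that results are normalized based on the max score. One common way to do this, assuming the raw scores are log-probabilities, is the log-sum-exp trick sketched below in plain Java. The class `NormalizeSketch` is a hypothetical name, and the assumption about log-scale scores is not confirmed by this page.

```java
/** Log-sum-exp normalization sketch (hypothetical, assumes log-scale input scores). */
final class NormalizeSketch {
    /** Maps log scores to values in [0, 1] that sum to 1, preserving their order. */
    static double[] normalize(double[] logScores) {
        if (logScores.length == 0) {
            return new double[0];
        }
        // Subtracting the max keeps every exponent <= 0, which avoids underflow to all zeros.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max);
        }
        // log(sum_i exp(x_i)) = max + log(sum_i exp(x_i - max))
        double logTotal = max + Math.log(sum);
        double[] normalized = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            normalized[i] = Math.exp(logScores[i] - logTotal);
        }
        return normalized;
    }
}
```

    Anchoring on the max score matters for very negative inputs such as -1000 and -1001, where exponentiating the raw values directly would produce 0 for both.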
