Overloaded. Adds a document to this index, using the provided analyzer instead of the value of {@link #GetAnalyzer()}. If the document contains more than {@link #SetMaxFieldLength(int)} terms for a given field, the remainder are discarded.
Get the current setting of whether to use the compound file format. Note that this just returns the value you set with setUseCompoundFile(boolean) or the default. You cannot use this to query the status of an existing index.
The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accomodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError. By default, no more than 10,000 terms will be indexed for a field.
Expert: Set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms. This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost. In particular,
numUniqueTerms/interval
terms are read into memory by an IndexReader, and, on average,
interval/2
terms must be scanned for each random term access.
Setting to turn on usage of a compound file. When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use.