IndexWriter Class

An

CopyC#

IndexWriter

creates and maintains an index.

The

CopyC#

create

argument to the {@link #IndexWriter(Directory, Analyzer, boolean) constructor} determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with

CopyC#

create=true

even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also {@link #IndexWriter(Directory, Analyzer) constructors} with no

CopyC#

create

argument which will create a new index if there is not already an index at the provided path and otherwise open the existing index.

In either case, documents are added with {@link #AddDocument(Document) addDocument} and removed with {@link #DeleteDocuments(Term)} or {@link #DeleteDocuments(Query)}. A document can be updated with {@link #UpdateDocument(Term, Document) updateDocument} (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, {@link #Close() close} should be called.

These changes are buffered in memory and periodically flushed to the {@link Directory} (during the above method calls). A flush is triggered when there are enough buffered deletes (see {@link #setMaxBufferedDeleteTerms}) or enough added documents since the last flush, whichever is sooner. For the added documents, flushing is triggered either by RAM usage of the documents (see {@link #setRAMBufferSizeMB}) or the number of added documents. The default is to flush when RAM usage hits 16 MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either {@link #Commit()} or {@link #close} is called. A flush may also trigger one or more segment merges which by default run with a background thread so as not to block the addDocument calls (see below for changing the {@link MergeScheduler}).

The optional

CopyC#

autoCommit

argument to the {@link #IndexWriter(Directory, boolean, Analyzer) constructors} controls visibility of the changes to {@link IndexReader} instances reading the same index. When this is

CopyC#

false

, changes are not visible until {@link #Close()} or {@link #Commit()} is called. Note that changes will still be flushed to the {@link Directory} as new files, but are not committed (no new

CopyC#

segments_N

file is written referencing the new files, nor are the files sync'd to stable storage) until {@link #Close()} or {@link #Commit()} is called. If something goes terribly wrong (for example the JVM crashes), then the index will reflect none of the changes made since the last commit, or the starting state if commit was not called. You can also call {@link #Rollback()}, which closes the writer without committing any changes, and removes any index files that had been flushed but are now unreferenced. This mode is useful for preventing readers from refreshing at a bad time (for example after you've done all your deletes but before you've done your adds). It can also be used to implement simple single-writer transactional semantics ("all or none"). You can do a two-phase commit by calling {@link #PrepareCommit()} followed by {@link #Commit()}. This is necessary when Lucene is working with an external resource (for example, a database) and both must either commit or rollback the transaction.

When

CopyC#

autoCommit

is

CopyC#

true

then the writer will periodically commit on its own. [Deprecated: Note that in 3.0, IndexWriter will no longer accept autoCommit=true (it will be hardwired to false). You can always call {@link #Commit()} yourself when needed]. There is no guarantee when exactly an auto commit will occur (it used to be after every flush, but it is now after every completed merge, as of 2.4). If you want to force a commit, call {@link #Commit()}, or, close the writer. Once a commit has finished, newly opened {@link IndexReader} instances will see the changes to the index as of that commit. When running in this mode, be careful not to refresh your readers while optimize or segment merges are taking place as this can tie up substantial disk space.

Regardless of

CopyC#

autoCommit

, an {@link IndexReader} or {@link Lucene.Net.Search.IndexSearcher} will only see the index as of the "point in time" that it was opened. Any changes committed to the index after the reader was opened are not visible until the reader is re-opened.

If an index will not have more documents added for a while and optimal search performance is desired, then either the full {@link #Optimize() optimize} method or partial {@link #Optimize(int)} method should be called before the index is closed.

Opening an

CopyC#