An IndexWriter creates and maintains an index.
The create argument to the {@link
#IndexWriter(Directory, Analyzer, boolean) constructor} determines
whether a new index is created, or whether an existing index is
opened. Note that you can open an index with create=true
even while readers are using the index. The old readers will
continue to search the "point in time" snapshot they had opened,
and won't see the newly created index until they re-open. There are
also {@link #IndexWriter(Directory, Analyzer) constructors}
with no create argument which will create a new index
if there is not already an index at the provided path and otherwise
open the existing index.

In either case, documents are added with {@link #AddDocument(Document)
addDocument} and removed with {@link #DeleteDocuments(Term)} or {@link
#DeleteDocuments(Query)}. A document can be updated with {@link
#UpdateDocument(Term, Document) updateDocument} (which just deletes
and then adds the entire document). When finished adding, deleting
and updating documents, {@link #Close() close} should be called.

These changes are buffered in memory and periodically
flushed to the {@link Directory} (during the above method
calls). A flush is triggered when there are enough
buffered deletes (see {@link #setMaxBufferedDeleteTerms})
or enough added documents since the last flush, whichever
is sooner. For the added documents, flushing is triggered
either by RAM usage of the documents (see {@link
#setRAMBufferSizeMB}) or the number of added documents.
The default is to flush when RAM usage hits 16 MB. For
best indexing speed you should flush by RAM usage with a
large RAM buffer. Note that flushing just moves the
internal buffered state in IndexWriter into the index, but
these changes are not visible to IndexReader until either
{@link #Commit()} or {@link #Close()} is called. A flush may
also trigger one or more segment merges which by default
run with a background thread so as not to block the
addDocument calls (see below
for changing the {@link MergeScheduler}).

The optional autoCommit argument to the {@link
#IndexWriter(Directory, boolean, Analyzer) constructors}
controls visibility of the changes to {@link IndexReader}
instances reading the same index. When this is
false, changes are not visible until {@link
#Close()} or {@link #Commit()} is called. Note that changes will still be
flushed to the {@link Directory} as new files, but are
not committed (no new segments_N file is written
referencing the new files, nor are the files sync'd to stable storage)
until {@link #Close()} or {@link #Commit()} is called. If something
goes terribly wrong (for example the JVM crashes), then
the index will reflect none of the changes made since the
last commit, or the starting state if commit was not called.
You can also call {@link #Rollback()}, which closes the writer
without committing any changes, and removes any index
files that had been flushed but are now unreferenced.
This mode is useful for preventing readers from refreshing
at a bad time (for example after you've done all your
deletes but before you've done your adds). It can also be
used to implement simple single-writer transactional
semantics ("all or none"). You can do a two-phase commit
by calling {@link #PrepareCommit()}
followed by {@link #Commit()}. This is necessary when
Lucene is working with an external resource (for example,
a database) and both must either commit or roll back the
transaction.

When autoCommit is true, the writer will periodically commit
on its own. [Deprecated: Note that in 3.0, IndexWriter will
no longer accept autoCommit=true (it will be hardwired to
false). You can always call {@link #Commit()} yourself
when needed]. There is
no guarantee when exactly an auto commit will occur (it
used to be after every flush, but it is now after every
completed merge, as of 2.4). If you want to force a
commit, call {@link #Commit()} or close the writer. Once
a commit has finished, newly opened {@link IndexReader} instances will
see the changes to the index as of that commit. When
running in this mode, be careful not to refresh your
readers while optimize or segment merges are taking place
as this can tie up substantial disk space.
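The add, update, delete, commit and rollback calls described above can be sketched as follows. This is a minimal illustration, not the library's canonical usage: it assumes a Lucene.Net 2.9-era API, uses the four-argument constructor that takes an IndexWriter.MaxFieldLength (which is not the overload linked in the text) so that autoCommit is off, and works against an in-memory RAMDirectory. The MakeDoc helper and the "id" and "body" field names are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class WriterLifecycleSketch
{
    // Hypothetical helper (not part of Lucene.Net): an "id" key plus a "body" text field.
    private static Document MakeDoc(string id, string body)
    {
        Document doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }

    public static void Main()
    {
        Lucene.Net.Store.Directory dir = new RAMDirectory();
        Analyzer analyzer = new WhitespaceAnalyzer();

        // create = true: build a new index in the directory.
        IndexWriter writer = new IndexWriter(
            dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        writer.AddDocument(MakeDoc("1", "first document"));
        writer.AddDocument(MakeDoc("2", "second document"));
        writer.Commit(); // both documents become visible to newly opened readers

        // An update is a delete by term followed by an add.
        writer.UpdateDocument(new Term("id", "1"), MakeDoc("1", "first document revised"));
        writer.DeleteDocuments(new Term("id", "2"));
        writer.Close(); // commits the update and the delete

        // create = false: open the existing index, then abandon the pending change.
        writer = new IndexWriter(
            dir, analyzer, false, IndexWriter.MaxFieldLength.LIMITED);
        writer.AddDocument(MakeDoc("3", "never committed"));
        writer.Rollback(); // discards document 3 and closes the writer

        // A read-only reader sees the index as of the last commit.
        IndexReader reader = IndexReader.Open(dir, true);
        Console.WriteLine("Live documents: " + reader.NumDocs());
        reader.Close();
    }
}
```

Because Rollback() also closes the writer, the sketch does not call Close() afterwards.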
Regardless of autoCommit, an {@link
IndexReader} or {@link Lucene.Net.Search.IndexSearcher} will only see the
index as of the "point in time" that it was opened. Any
changes committed to the index after the reader was opened
are not visible until the reader is re-opened.

If an index will not have more documents added for a while and optimal search
performance is desired, then either the full {@link #Optimize() optimize}
method or partial {@link #Optimize(int)} method should be
called before the index is closed.

Opening an IndexWriter creates a lock file for the directory in use. Trying to open
another IndexWriter on the same directory will lead to a
{@link LockObtainFailedException}. The {@link LockObtainFailedException}
is also thrown if an IndexReader on the same directory is used to delete documents
from the index.
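The "point in time" behaviour of readers and the write lock can be illustrated with a short sketch. It makes the same assumptions as a typical Lucene.Net 2.9-era program: the four-argument IndexWriter constructor, IndexReader.Open(Directory, bool) and IndexReader.Reopen(), all against an in-memory RAMDirectory. The MakeDoc helper and the "id" field are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class SnapshotAndLockSketch
{
    // Hypothetical helper (not part of Lucene.Net): a document with a single "id" field.
    private static Document MakeDoc(string id)
    {
        Document doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    public static void Main()
    {
        Lucene.Net.Store.Directory dir = new RAMDirectory();
        Analyzer analyzer = new WhitespaceAnalyzer();

        IndexWriter writer = new IndexWriter(
            dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        writer.AddDocument(MakeDoc("1"));
        writer.Commit();

        // The reader is pinned to the commit that existed when it was opened.
        IndexReader reader = IndexReader.Open(dir, true);

        writer.AddDocument(MakeDoc("2"));
        writer.Commit();
        Console.WriteLine("Before reopen: " + reader.NumDocs());

        // Reopen returns a new reader only if the index has changed.
        IndexReader reopened = reader.Reopen();
        if (reopened != reader)
        {
            reader.Close();
            reader = reopened;
        }
        Console.WriteLine("After reopen: " + reader.NumDocs());

        // A second writer on the same directory cannot obtain the write lock.
        try
        {
            new IndexWriter(dir, analyzer, false, IndexWriter.MaxFieldLength.LIMITED);
        }
        catch (LockObtainFailedException e)
        {
            Console.WriteLine("Second writer rejected: " + e.Message);
        }

        reader.Close();
        writer.Close();
    }
}
```

The second constructor call waits for the write-lock timeout before it throws, so the rejection is not immediate.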
Expert: IndexWriter allows an optional
{@link IndexDeletionPolicy} implementation to be
specified. You can use this to control when prior commits
are deleted from the index. The default policy is {@link
KeepOnlyLastCommitDeletionPolicy} which removes all prior
commits as soon as a new commit is done (this matches
behavior before 2.2). Creating your own policy can allow
you to explicitly keep previous "point in time" commits
alive in the index for some time, to allow readers to
refresh to the new commit without having the old commit
deleted out from under them. This is necessary on
filesystems like NFS that do not support "delete on last
close" semantics, which Lucene's "point in time" search
normally relies on.

Expert: IndexWriter allows you to separately change
the {@link MergePolicy} and the {@link MergeScheduler}.
The {@link MergePolicy} is invoked whenever there are
changes to the segments in the index. Its role is to
select which merges to do, if any, and return a {@link
MergePolicy.MergeSpecification} describing the merges. It
also selects merges to do for optimize(). (The default is
{@link LogByteSizeMergePolicy}.) Then, the {@link
MergeScheduler} is invoked with the requested merges and
it decides when and how to run the merges. The default is
{@link ConcurrentMergeScheduler}.

NOTE: if you hit an
OutOfMemoryError then IndexWriter will quietly record this
fact and block all future segment commits. This is a
defensive measure in case any internal state (buffered
documents and deletions) was corrupted. Any subsequent
calls to {@link #Commit()} will throw an
IllegalStateException. The only course of action is to
call {@link #Close()}, which internally will call {@link
#Rollback()}, to undo any changes to the index since the
last commit. If you opened the writer with autoCommit
false you can also just call {@link #Rollback()}
directly.

NOTE: {@link IndexWriter} instances are completely thread
safe, meaning multiple threads can call any of its
methods, concurrently. If your application requires
external synchronization, you should not
synchronize on the IndexWriter instance as
this may cause deadlock; use your own (non-Lucene) objects
instead.
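The advice on external synchronization can be sketched as follows. The example assumes a Lucene.Net 2.9-era API and an invented application rule (each worker must add its two documents back to back), which is the only reason a lock appears at all. PairLock, MakeDoc and the "id" field are hypothetical names; the point is only that the lock object belongs to the application and is not the IndexWriter.

```csharp
using System;
using System.Threading;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class ExternalLockSketch
{
    // Application-owned lock object; never lock on the IndexWriter itself.
    private static readonly object PairLock = new object();

    // Hypothetical helper (not part of Lucene.Net): a document with a single "id" field.
    private static Document MakeDoc(string id)
    {
        Document doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    public static void Main()
    {
        Lucene.Net.Store.Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(
            dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);

        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.Length; i++)
        {
            int workerId = i; // copy the loop variable for the closure
            workers[i] = new Thread(() =>
            {
                // AddDocument needs no locking of its own. The lock only enforces
                // the assumed application rule that each pair is added together.
                lock (PairLock)
                {
                    writer.AddDocument(MakeDoc(workerId + "-a"));
                    writer.AddDocument(MakeDoc(workerId + "-b"));
                }
            });
            workers[i].Start();
        }

        foreach (Thread worker in workers)
        {
            worker.Join();
        }

        writer.Close(); // commits everything the workers added
    }
}
```

If no such application rule exists, the lock can be dropped entirely and the threads can call AddDocument directly.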
The IndexWriter.MaxFieldLength type exposes the following members.
Constructors
Name | Description |
---|---|
IndexWriter.MaxFieldLength | Public constructor to allow users to specify the maximum field size limit. |
Methods
Name | Description |
---|---|
Equals | (Inherited from Object.) |
Finalize | Allows an Object to attempt to free resources and perform other cleanup operations before the Object is reclaimed by garbage collection. (Inherited from Object.) |
GetHashCode | Serves as a hash function for a particular type. (Inherited from Object.) |
GetLimit | |
GetType | Gets the Type of the current instance. (Inherited from Object.) |
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.) |
ToString | (Overrides Object.ToString().) |
Fields
Name | Description |
---|---|
LIMITED | Sets the maximum field length to {@link #DEFAULT_MAX_FIELD_LENGTH}. |
UNLIMITED | Sets the maximum field length to {@link Integer#MAX_VALUE}. |
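The members listed above might be used as in the sketch below. It assumes a Lucene.Net 2.9-era API in which the IndexWriter constructor accepts a MaxFieldLength as its last argument. The limit of 1000 is an arbitrary illustrative value, not a recommendation.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class MaxFieldLengthSketch
{
    public static void Main()
    {
        // Custom limit through the public constructor (1000 is an arbitrary choice).
        IndexWriter.MaxFieldLength custom = new IndexWriter.MaxFieldLength(1000);
        Console.WriteLine("Custom limit: " + custom.GetLimit());

        // The two predefined instances from the Fields table.
        Console.WriteLine("LIMITED: " + IndexWriter.MaxFieldLength.LIMITED.GetLimit());
        Console.WriteLine("UNLIMITED: " + IndexWriter.MaxFieldLength.UNLIMITED.GetLimit());

        // Pass the chosen limit when opening the writer.
        Lucene.Net.Store.Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(
            dir, new WhitespaceAnalyzer(), true, custom);
        writer.Close();
    }
}
```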