Class | Description |
---|---|
ByteBlockPool | Class that Posting and PostingVector use to write byte streams into shared fixed-size byte[] arrays. The idea is to allocate slices of increasing lengths. For example, the first slice is 5 bytes, the next slice is 14, etc. We start by writing our bytes into the first 5 bytes. When we hit the end of the slice, we allocate the next slice and then write the address of the new slice into the last 4 bytes of the previous slice (the "forwarding address"). Each slice is filled with 0's initially, and we mark the end with a non-zero byte. This way the methods that are writing into the slice don't need to record its length and instead allocate a new slice once they hit a non-zero byte. |
ByteBlockPool.Allocator | |
ByteSliceReader | IndexInput that knows how to read the byte slices written by Posting and PostingVector. We read the bytes in each slice until we hit the end of that slice, at which point we read the forwarding address of the next slice and then jump to it. |
ByteSliceWriter | Class to write byte streams into slices of shared byte[]. This is used by DocumentsWriter to hold the posting list for many terms in RAM. |
CheckIndex | |
CheckIndex.Status | |
CheckIndex.Status.SegmentInfoStatus | |
CompoundFileReader | Class for accessing a compound stream. This class implements a directory, but is limited to only read operations. Directory methods that would normally modify data throw an exception. |
CompoundFileReader.CSIndexInput | Implementation of an IndexInput that reads from a portion of the compound file. The visibility is left as "package" *only* because this helps with testing since JUnit test cases in a different class can then access package fields of this class. |
CompoundFileWriter | |
ConcurrentMergeScheduler | A {@link MergeScheduler} that runs each merge using a separate thread, up until a maximum number of threads ({@link #setMaxThreadCount}); once that limit is reached and a merge is needed, the thread(s) that are updating the index will pause until one or more merges complete. This is a simple way to use concurrency in the indexing process without having to create and manage application-level threads. |
ConcurrentMergeScheduler.MergeThread | |
CorruptIndexException | This exception is thrown when Lucene detects an inconsistency in the index. |
DirectoryIndexReader | IndexReader implementation that has access to a Directory. Instances that have a SegmentInfos object (i.e. segmentInfos != null) "own" the directory, which means that they try to acquire a write lock whenever index modifications are performed. |
DocumentsWriter | |
FieldInfo | |
FieldInfos | Access to the Fieldable Info file that describes document fields and whether or not they are indexed. Each segment has a separate Fieldable Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object. |
FieldReaderException | |
FieldSortedTermVectorMapper | For each Field, store a sorted collection of {@link TermVectorEntry}s. This is not thread-safe. |
FieldsReader | Class responsible for access to stored document fields. It uses <segment>.fdt and <segment>.fdx files. |
FieldsReader.FieldForMerge | |
FilterIndexReader | A FilterIndexReader contains another IndexReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterIndexReader itself simply implements all abstract methods of IndexReader with versions that pass all requests to the contained index reader. Subclasses of FilterIndexReader may further override some of these methods and may also provide additional methods and fields. |
FilterIndexReader.FilterTermDocs | Base class for filtering {@link TermDocs} implementations. |
FilterIndexReader.FilterTermEnum | Base class for filtering {@link TermEnum} implementations. |
FilterIndexReader.FilterTermPositions | Base class for filtering {@link TermPositions} implementations. |
IndexCommit | Deprecated. Please subclass the IndexCommit class instead. |
IndexFileDeleter | |
IndexFileNameFilter | Filename filter that accepts only filenames and extensions created by Lucene. |
IndexFileNames | Useful constants representing filenames and extensions used by Lucene. |
IndexModifier | |
IndexReader | |
IndexReader.FieldOption | Constants describing field properties, used for example by {@link IndexReader#GetFieldNames(FieldOption)}. |
IndexWriter | An IndexWriter creates and maintains an index. The create argument to the constructor determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also constructors with no create argument which will create a new index if there is not already an index at the provided path, and otherwise open the existing index. In either case, documents are added with addDocument and removed with deleteDocuments(Term) or deleteDocuments(Query). A document can be updated with updateDocument (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, close should be called. These changes are buffered in memory and periodically flushed to the {@link Directory} (during the above method calls). A flush is triggered when there are enough buffered deletes (see {@link #setMaxBufferedDeleteTerms}) or enough added documents since the last flush, whichever is sooner. For the added documents, flushing is triggered either by RAM usage of the documents (see {@link #setRAMBufferSizeMB}) or the number of added documents. The default is to flush when RAM usage hits 16 MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either {@link #Commit()} or {@link #close} is called. A flush may also trigger one or more segment merges, which by default run with a background thread so as not to block the addDocument calls (see below for changing the {@link MergeScheduler}). The optional autoCommit argument to the constructors controls visibility of the changes to {@link IndexReader} instances reading the same index. When this is false, changes are not visible until {@link #Close()} or {@link #Commit()} is called. Note that changes will still be flushed to the {@link org.apache.lucene.store.Directory} as new files, but are not committed (no new segments_N file is written referencing the new files, nor are the files sync'd to stable storage) until {@link #Close()} or {@link #Commit()} is called. If something goes terribly wrong (for example the JVM crashes), then the index will reflect none of the changes made since the last commit, or the starting state if commit was not called. You can also call {@link #rollback}, which closes the writer without committing any changes, and removes any index files that had been flushed but are now unreferenced. This mode is useful for preventing readers from refreshing at a bad time (for example after you've done all your deletes but before you've done your adds). It can also be used to implement simple single-writer transactional semantics ("all or none"). You can do a two-phase commit by calling {@link #PrepareCommit()} followed by {@link #Commit()}. This is necessary when Lucene is working with an external resource (for example, a database) and both must either commit or rollback the transaction. When autoCommit is true, the writer will periodically commit on its own. [Deprecated: Note that in 3.0, IndexWriter will no longer accept autoCommit=true (it will be hardwired to false). You can always call {@link #Commit()} yourself when needed]. There is no guarantee when exactly an auto commit will occur (it used to be after every flush, but it is now after every completed merge, as of 2.4). If you want to force a commit, call {@link #Commit()}, or close the writer. Once a commit has finished, newly opened {@link IndexReader} instances will see the changes to the index as of that commit. When running in this mode, be careful not to refresh your readers while optimize or segment merges are taking place, as this can tie up substantial disk space. Regardless of autoCommit, an {@link IndexReader} or {@link org.apache.lucene.search.IndexSearcher} will only see the index as of the "point in time" that it was opened. Any changes committed to the index after the reader was opened are not visible until the reader is re-opened. If an index will not have more documents added for a while and optimal search performance is desired, then either the full optimize method or partial {@link #Optimize(int)} method should be called before the index is closed. Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index. Expert: IndexWriter allows an optional {@link IndexDeletionPolicy} implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is {@link KeepOnlyLastCommitDeletionPolicy}, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on. Expert: IndexWriter allows you to separately change the {@link MergePolicy} and the {@link MergeScheduler}. The {@link MergePolicy} is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a {@link MergePolicy.MergeSpecification} describing the merges. It also selects merges to do for Optimize(). (The default is {@link LogByteSizeMergePolicy}.) Then, the {@link MergeScheduler} is invoked with the requested merges and it decides when and how to run the merges. The default is {@link ConcurrentMergeScheduler}. |
IndexWriter.MaxFieldLength | |
KeepOnlyLastCommitDeletionPolicy | This {@link IndexDeletionPolicy} implementation keeps only the most recent commit and immediately removes all prior commits after a new commit is done. This is the default deletion policy. |
LogByteSizeMergePolicy | This is a {@link LogMergePolicy} that measures the size of a segment as the total byte size of the segment's files. |
LogDocMergePolicy | This is a {@link LogMergePolicy} that measures the size of a segment as the number of documents (not taking deletions into account). |
LogMergePolicy | |
MergePolicy | Expert: a MergePolicy determines the sequence of primitive merge operations to be used for overall merge and optimize operations. Whenever the segments in an index have been altered by {@link IndexWriter}, either the addition of a newly flushed segment, addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade, {@link IndexWriter} invokes {@link #findMerges} to give the MergePolicy a chance to pick merges that are now required. This method returns a {@link MergeSpecification} instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.optimize is called, it calls {@link #findMergesForOptimize} and the MergePolicy should then return the necessary merges. Note that the policy can return more than one merge at a time. In this case, if the writer is using {@link SerialMergeScheduler}, the merges will be run sequentially but if it is using {@link ConcurrentMergeScheduler} they will be run concurrently. The default MergePolicy is {@link LogByteSizeMergePolicy}. NOTE: This API is new and still experimental (subject to change suddenly in the next release) |
MergePolicy.MergeAbortedException | |
MergePolicy.MergeException | Exception thrown if there are any problems while executing a merge. |
MergePolicy.MergeSpecification | A MergeSpecification instance provides the information necessary to perform multiple merges. It simply contains a list of {@link OneMerge} instances. |
MergePolicy.OneMerge | OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment. The merge spec includes the subset of segments to be merged as well as whether the new segment should use the compound file format. |
MergeScheduler | Expert: {@link IndexWriter} uses an instance implementing this interface to execute the merges selected by a {@link MergePolicy}. The default MergeScheduler is {@link ConcurrentMergeScheduler}. NOTE: This API is new and still experimental (subject to change suddenly in the next release) |
MultipleTermPositions | Describe class MultipleTermPositions here. |
MultiReader | An IndexReader which reads multiple indexes, appending their content. |
MultiSegmentReader | An IndexReader which reads indexes with multiple segments. |
ParallelReader | |
Payload | |
PositionBasedTermVectorMapper | For each Field, store position-by-position information. It ignores frequency information. This is not thread-safe. |
PositionBasedTermVectorMapper.TVPositionInfo | Container for a term at a position. |
SegmentInfo | |
SegmentInfos | |
SegmentInfos.FindSegmentsFile | Utility class for executing code that needs to do something with the current segments file. This is necessary with lock-less commits because from the time you locate the current segments file name, until you actually open it, read its contents, or check modified time, etc., it could have been deleted due to a writer commit finishing. |
SegmentMerger | |
SegmentReader | |
SegmentTermDocs | |
SegmentTermEnum | |
SegmentTermPositions | |
SegmentTermVector | |
SerialMergeScheduler | A {@link MergeScheduler} that simply does each merge sequentially, using the current thread. |
SnapshotDeletionPolicy | A {@link IndexDeletionPolicy} that wraps any other {@link IndexDeletionPolicy} and adds the ability to hold and later release a single "snapshot" of an index. While the snapshot is held, the {@link IndexWriter} will not remove any files associated with it, even if the index is otherwise being actively, arbitrarily changed. Because we wrap another arbitrary {@link IndexDeletionPolicy}, this gives you the freedom to continue using whatever {@link IndexDeletionPolicy} you would normally want to use with your index. Note that you can re-use a single instance of SnapshotDeletionPolicy across multiple writers as long as they are against the same index Directory. Any snapshot held when a writer is closed will "survive" when the next writer is opened. WARNING: This API is new and experimental and may suddenly change. |
SortedTermVectorMapper | Store a sorted collection of {@link Lucene.Net.Index.TermVectorEntry}s. Collects all term information into a single SortedSet. NOTE: This Mapper ignores all Field information for the Document. This means that if you are using offset/positions you will not know what Fields they correlate with. This is not thread-safe. |
StaleReaderException | This exception is thrown when an {@link IndexReader} tries to make changes to the index (via {@link IndexReader#deleteDocument}, {@link IndexReader#undeleteAll} or {@link IndexReader#setNorm}) but changes have already been committed to the index since this reader was instantiated. When this happens you must open a new reader on the current index to make the changes. |
Term | A Term represents a word from text. This is the unit of search. It is composed of two elements: the text of the word, as a string, and the name of the field that the text occurred in, an interned string. Note that terms may represent more than just words from text fields; they may also be things like dates, email addresses, urls, etc. |
TermEnum | |
TermInfo | A TermInfo is the record of information stored for a term. |
TermInfosReader | |
TermInfosWriter | |
TermVectorEntry | Convenience class for holding TermVector information. |
TermVectorEntryFreqSortedComparator | Compares {@link Lucene.Net.Index.TermVectorEntry}s first by frequency and then by the term (case-sensitive). |
TermVectorMapper | The TermVectorMapper can be used to map Term Vectors into your own structure instead of the parallel array structure used by {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. It is up to the implementation to make sure it is thread-safe. |
TermVectorOffsetInfo | The TermVectorOffsetInfo class holds information pertaining to a Term in a {@link Lucene.Net.Index.TermPositionVector}'s offset information. This offset information is the character offset as set during the Analysis phase (and thus may not be the actual offset in the original content). |
TermVectorsReader | |
TermVectorsWriter | |
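The slice-and-forwarding-address scheme described above for ByteBlockPool, ByteSliceWriter, and ByteSliceReader can be sketched in miniature. The class below is a hypothetical, self-contained toy, not Lucene's actual implementation: the class name, slice sizes, sentinel value, and single shared buffer are all illustrative choices. A writer fills zero-initialized slices until it hits the non-zero sentinel byte, then allocates a larger slice, moves the last 3 data bytes into it, and overwrites the old slice's last 4 bytes with the forwarding address; a reader follows those addresses back through the chain.

```java
import java.util.Arrays;

// Toy sketch of slice allocation with forwarding addresses. Illustrative only;
// real Lucene uses different slice sizes, level encoding, and pooled buffers.
public class SliceDemo {
    static final int[] SIZES = {5, 14, 20, 30, 40}; // increasing slice lengths
    static final byte SENTINEL = 16;                // non-zero end-of-slice marker

    byte[] pool = new byte[4096]; // zero-filled shared buffer
    int used = 0;                 // next free offset in the pool
    int level, upto;              // writer state: slice level, write position

    // Allocate a zero-filled slice and mark its last byte with the sentinel.
    int newSlice(int size) {
        int start = used;
        used += size;
        pool[start + size - 1] = SENTINEL;
        return start;
    }

    // Start a new byte stream; returns its start offset for later reading.
    int begin() {
        level = 0;
        upto = newSlice(SIZES[0]);
        return upto;
    }

    void writeByte(byte b) {
        if (pool[upto] != 0) { // hit the sentinel: this slice is full
            int newLevel = Math.min(level + 1, SIZES.length - 1);
            int newStart = newSlice(SIZES[newLevel]);
            System.arraycopy(pool, upto - 3, pool, newStart, 3); // move last 3 bytes
            writeInt(upto - 3, newStart); // forwarding address over the last 4 bytes
            level = newLevel;
            upto = newStart + 3;
        }
        pool[upto++] = b;
    }

    // Read n bytes starting at 'start', following forwarding addresses.
    byte[] read(int start, int n) {
        byte[] out = new byte[n];
        int s = start, lvl = 0, i = 0;
        while (i < n) {
            int size = SIZES[lvl];
            if (n - i >= size) { // slice was forwarded: size-4 data bytes here
                for (int k = 0; k < size - 4; k++) out[i++] = pool[s + k];
                s = readInt(s + size - 4); // jump to the next slice
                lvl = Math.min(lvl + 1, SIZES.length - 1);
            } else {             // final slice: the rest of the stream is here
                for (int k = 0; i < n; k++) out[i++] = pool[s + k];
            }
        }
        return out;
    }

    void writeInt(int pos, int v) {
        pool[pos] = (byte) (v >>> 24); pool[pos + 1] = (byte) (v >>> 16);
        pool[pos + 2] = (byte) (v >>> 8); pool[pos + 3] = (byte) v;
    }

    int readInt(int pos) {
        return ((pool[pos] & 0xFF) << 24) | ((pool[pos + 1] & 0xFF) << 16)
             | ((pool[pos + 2] & 0xFF) << 8) | (pool[pos + 3] & 0xFF);
    }

    public static void main(String[] args) {
        SliceDemo d = new SliceDemo();
        int start = d.begin();
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        for (byte b : data) d.writeByte(b);
        System.out.println(Arrays.equals(data, d.read(start, data.length))); // prints "true"
    }
}
```

The key property this preserves from the description: neither the writer nor the stored data records slice lengths; the writer relies on the sentinel to detect the end of a slice, and the reader relies on the known progression of slice sizes plus the 4-byte forwarding addresses.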
Interface | Description |
---|---|
IndexCommitPoint | |
IndexDeletionPolicy | |
TermDocs | |
TermFreqVector | Provides access to a stored term vector of a document field. The vector consists of the name of the field, an array of the terms that occur in the field of the {@link Lucene.Net.Documents.Document}, and a parallel array of frequencies. Thus, getTermFrequencies()[5] corresponds with the frequency of getTerms()[5], assuming there are at least 6 terms in the Document. |
TermPositions | |
TermPositionVector | Extends TermFreqVector to provide additional information about the positions in which each of the terms is found. A TermPositionVector does not necessarily contain both positions and offsets, but at least one of these arrays exists. |
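The parallel-array contract described above for TermFreqVector, where getTermFrequencies()[i] pairs with getTerms()[i], can be illustrated with a toy stand-in. TinyTermFreqVector below is a hypothetical class written for this page, not Lucene's implementation; it builds the two parallel arrays from a token list, keeping the terms sorted.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy illustration of TermFreqVector's parallel-array contract:
// getTermFrequencies()[i] is the frequency of getTerms()[i].
// Hypothetical stand-in, not Lucene's implementation.
public class TinyTermFreqVector {
    private final String[] terms; // unique terms of the field, sorted
    private final int[] freqs;    // freqs[i] = frequency of terms[i]

    public TinyTermFreqVector(List<String> tokens) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        // keySet() and values() iterate in the same (sorted) order,
        // which is what keeps the two arrays parallel.
        for (int f : counts.values()) freqs[i++] = f;
    }

    public String[] getTerms() { return terms; }
    public int[] getTermFrequencies() { return freqs; }

    public static void main(String[] args) {
        TinyTermFreqVector v = new TinyTermFreqVector(
                List.of("the", "quick", "fox", "the"));
        for (int i = 0; i < v.getTerms().length; i++) {
            System.out.println(v.getTerms()[i] + " -> " + v.getTermFrequencies()[i]);
        }
        // prints: fox -> 1, quick -> 1, the -> 2 (one pair per line)
    }
}
```

A real TermFreqVector is read-only and produced by the index (see TermVectorsReader), but the indexing relationship between the two arrays is the same as in this sketch.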