Jakarta Lucene - Lucene TODO List

Purpose

This document lists the tasks on the plates of the Lucene development team. Each task is assigned to one of two categories: core or non-core.

About Core vs. Non-Core Development

Currently the Lucene development team is working on categorizing change requests into core and non-core changes.

Core changes would entail a change to the search engine core itself. From Doug Cutting:
"Examples include: file locking to make things multi-process safe; adding an API for boosting individual documents and fields values; making the scoring API extensible and public; etc."

Non-core changes would not affect the search engine itself, but would consist instead of projects or components that would make useful additions to the core framework. Again, from Doug Cutting:
"[Examples] include: support for more languages; query parsers; database storage; crawlers, etc. Whether these belong in the base distribution is a matter of debate (sometimes hot). My rule of thumb for including them is their generality: if they are likely to be useful to a large proportion of Lucene users then they should probably go in the base distribution. Language support in particular is tricky. Perhaps we should migrate to a model where the base distribution includes no analyzers, and supply separate language packages."

Change requests will be classified by the development team (committers) as core or non-core, and a committer will be assigned responsibility for coordinating development of each change request. All change requests should be submitted to one of the Lucene mailing lists or through the Apache Bugzilla database.

Core Development Changes

No change requests classified as core yet!

Non-Core Development Changes

No change requests classified as non-core yet!

Unclassified Changes

Name Description Links

Term Vector support

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgNo=273

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgNo=272

Support for Search Term Highlighting

http://www.geocrawler.org/archives/3/2624/2001/9/50/6553088/

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=115271

http://www.iq-computing.de/index.asp?menu=projekte-lucene-highlight

http://nagoya.apache.org/eyebrowse/BrowseList?listName=lucene-dev@jakarta.apache.org&by=thread&from=56403

Better support for hits sorted by things other than score. An easy, efficient case is to support results sorted by the order documents were added to the index. A little harder and less efficient is support for results sorted by an arbitrary field.

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=114756

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00228.html
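The two sorting cases described in this item can be sketched roughly as follows. This is only an illustration and does not use the Lucene API: the `Hit` and `HitSorting` classes and their fields are hypothetical stand-ins, and the sketch assumes that document numbers grow in the order documents were added.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class Hit {
    final int docNumber;  // assumed to grow in the order documents were added
    final float score;
    final String title;   // an example stored field

    Hit(int docNumber, float score, String title) {
        this.docNumber = docNumber;
        this.score = score;
        this.title = title;
    }
}

final class HitSorting {
    // Easy case: order hits by document number, i.e. by insertion order.
    static List<Hit> byInsertionOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docNumber));
        return sorted;
    }

    // Harder case: order hits by an arbitrary field. The field value of
    // every hit must be available at sort time, which is what makes this
    // less efficient than sorting by document number.
    static List<Hit> byTitle(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.title));
        return sorted;
    }
}
```

Both methods copy the input list, so the original score-ordered results stay untouched.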

Add some requested methods: String[] IndexReader.getIndexedFields();

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=330010

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=330009

Add a lastModified() method to Directory, FSDirectory and RAMDirectory, so that its value can be cached in an IndexWriter/Searcher manager.

Support for adding more than one term to the same position. N.B. I think the Finnish lady already implemented this. It required some pieces of Lucene to be modified. (OG)

The ability to retrieve the number of occurrences not only for a term but also for a phrase.

http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00101.html
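To make the request concrete, here is a minimal sketch of what "number of occurrences of a phrase" means for a single tokenized document. It scans a plain token list and does not reflect how Lucene stores or would expose this information; the `PhraseCounter` class is hypothetical, and an index-based implementation would presumably work from stored term positions instead.

```java
import java.util.List;

// Hypothetical helper; not part of Lucene.
final class PhraseCounter {
    // Counts how many times the phrase occurs as a run of consecutive
    // tokens. Overlapping occurrences are each counted.
    static int countPhrase(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty() || phrase.size() > tokens.size()) {
            return 0;
        }
        int count = 0;
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }
}
```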

Che Dong's CJKTokenizer for Chinese, Japanese, and Korean.

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=330905

Selecting a language-specific analyzer according to a locale. Currently, using another analyzer requires rewriting parts of the Lucene code. It would be useful to be able to select an analyzer without touching code.
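One possible shape for locale-based selection is a small registry keyed by language code, sketched below. All names here (`TextAnalyzer`, `AnalyzerRegistry`, `forLocale`) are hypothetical and chosen for illustration; the sketch deliberately avoids Lucene's own Analyzer class and says nothing about how the project would actually implement this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal analyzer interface, used instead of Lucene's Analyzer.
interface TextAnalyzer {
    String name();
}

// Hypothetical registry mapping language codes to analyzer factories.
final class AnalyzerRegistry {
    private final Map<String, Supplier<TextAnalyzer>> byLanguage = new HashMap<>();
    private final Supplier<TextAnalyzer> fallback;

    AnalyzerRegistry(Supplier<TextAnalyzer> fallback) {
        this.fallback = fallback;
    }

    // languageCode is a lowercase ISO 639 code such as "de" or "ja".
    void register(String languageCode, Supplier<TextAnalyzer> factory) {
        byLanguage.put(languageCode, factory);
    }

    // Picks the analyzer registered for the locale's language, or the
    // fallback when no language-specific analyzer is registered.
    TextAnalyzer forLocale(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), fallback).get();
    }
}
```

With a registry like this, adding support for a language becomes a registration call or a configuration entry rather than a code change in the caller.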

Adding an "-encoding" option and encoding-sensitive methods to tools. The current tools need minor changes to work in a Japanese (or other non-English) environment: adding an "-encoding" option and argument, using Reader/Writer classes instead of InputStream/OutputStream classes, etc.
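The change described in this item could look roughly like the sketch below, which uses only standard Java I/O classes. The `EncodingOption` class and its method names are hypothetical, and the command-line handling is an assumption about how a tool might accept the option: it reads the value following "-encoding" and falls back to the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper for a command-line tool; not part of Lucene.
final class EncodingOption {
    // Returns the charset named after "-encoding", or the platform
    // default charset when the option is absent.
    static Charset parseEncoding(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-encoding".equals(args[i])) {
                return Charset.forName(args[i + 1]);
            }
        }
        return Charset.defaultCharset();
    }

    // Decodes the whole stream as text using an explicit charset, via a
    // Reader rather than raw InputStream byte handling.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `Charset.forName` throws an unchecked exception for an unknown or unsupported charset name, so a real tool would likely want to catch that and print a usage message.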