=== Lucene Status Report: December, 2009 ===

TLP

-The PMC added George Aroush and Chris Mattmann to the PMC
-The PMC added Open Relevance committer Robert Muir
-The PMC added Mahout committer Jake Mannix
-The PMC added Tika committer Ken Krugler

LUCENE JAVA

Lucene Java is a search-engine toolkit. Development has been active, and we released both 2.9 and 3.0 this quarter.

SOLR

Solr is a full-text search server using Lucene Java. Development and the community are active. Solr released version 1.4 this quarter.

NUTCH

Nutch is a web-search engine: crawler, indexer and search runtime. Since ApacheCon there has been a flurry of discussion about Nutch's future, spearheaded by Andrzej Bialecki and others. In addition, there is ongoing work on reducing code duplication (tighter integration of the Tika parsing framework and MIME type detection, better Solr integration) and on using a more flexible storage system (e.g. HBase). Many issues are being fixed in preparation for a 1.1 release early next quarter.

LUCY

Lucy is a loose C port of Lucene targeted at dynamic language bindings. Development this quarter has focused on abstraction of the IO subsystem and portability to various compiler platforms.

LUCENE.NET

Lucene.NET is a .NET-based port of Lucene Java. Development and the community are active. Lucene.NET graduated from the incubator and is now a full-fledged Lucene sub-project.

Mahout

Apache Mahout is working towards building a suite of scalable machine learning libraries for text and data mining. Development is active and version 0.2 was released this quarter.

Open Relevance Project

The Open Relevance Project is a new project aimed at providing Lucene and others with tools for judging the quality of search and machine learning approaches. The project added Robert Muir as a committer this quarter and development is getting under way. Recent work has added support for the Indonesian "Tempo" and Persian "Hamshahri" collections, so that relevance judgements can be executed against them with lucene-benchmark.

PyLucene

PyLucene is a Python integration of Lucene Java. Development is active. Closely tracking the Lucene Java releases, we released PyLucene 2.9.0, PyLucene 2.9.1 and PyLucene 3.0.0 this quarter. A major addition was made to JCC, the code generator that makes PyLucene possible: support for Java generics, which Lucene Java 3.0 now uses.

TIKA

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika released version 0.5 this quarter. Recent development efforts have focused on speeding up Tika's MIME detector and on providing a self-contained OSGi-based Tika bundle. There is a strong desire to release these post-0.5 improvements, so we are planning to release Tika 0.6 in the next few weeks.