=== Lucene Status Report: December, 2008 ===

TLP

The TLP has added Lucene Java committer Michael Busch to the PMC.

CRYPTOGRAPHY

Nutch uses PDFBox and thus has a dependency on BouncyCastle.
https://issues.apache.org/jira/browse/NUTCH-621 is now closed.

LUCENE JAVA

Lucene Java is a search-engine toolkit. Development has been active and we are working towards the release of 2.9. Uwe Schindler has been added as a contrib committer.

SOLR

Solr is a full-text search server. Development and the community are active. Solr 1.3 was released on September 15, 2008.

NUTCH

Nutch is a web-search engine: crawler, indexer and search runtime. Development activity (measured by number of commits) has been low, consisting mainly of bug fixes and minor enhancements. There are, however, some exciting new features currently under discussion in Jira.

LUCY

Lucy will develop a shared C-based core for ports of Lucene to other languages, such as Perl, Python and Ruby. Some small, incremental progress has been made this quarter.

LUCENE.NET (incubating)

Lucene.Net continues to thrive with the addition of two new committers, Doug Sale and Digy. The project is still working towards making official releases and getting organized, but the community is vibrant.

TIKA

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika has graduated from the Incubator and is now a Lucene subproject. On December 9th, 2008, Tika 0.2 was released under the Lucene PMC.

MAHOUT

Apache Mahout is a new subproject of the Lucene PMC with the goal of building a suite of scalable machine learning libraries for text and data mining. We now have Map-Reduce implementations of several clustering algorithms, two classification algorithms based on Bayesian statistics, and support for scaling fitness functions in genetic algorithms. We are working on bug fixes and documentation to get ready for a 0.1 release.