=== Lucene Status Report: 17th of September, 2008 ===

TLP

The TLP has accepted a software grant to bring geographic search capabilities to Lucene and Solr.

CRYPTOGRAPHY

Nutch uses PDFBox and thus has a dependency on BouncyCastle. https://issues.apache.org/jira/browse/NUTCH-621 has been opened and is in progress. Steps 1 through 3 have been completed and the Nutch team is completing step 4.

LUCENE JAVA

Lucene Java is a search-engine toolkit. Development has been active and we are nearing the release of 2.4.

SOLR

Solr is a full-text search server. Development and the community are both active. Shalin Shekhar Mangar was added as a committer. Solr 1.3 will be released in the next few days.

NUTCH

Nutch is a web-search engine: crawler, indexer and search runtime. Development activity (measured by number of commits) has been low, consisting mainly of bug fixes and minor enhancements. There are, however, some exciting new features currently under discussion in Jira.

LUCY

Lucy will develop a shared C-based core for ports of Lucene to other languages, such as Perl, Python and Ruby. No progress has been made this quarter, but we have been in contact with the committers; they are still interested in the project and plan to be more active in the near future.

LUCENE.NET (incubating)

No change since the last report. There is some discussion brewing about bringing in a couple of new committers, but no official action has been taken yet. This project does seem to have a small community of users, with the occasional tricky question posted to the e-mail list. It's a fairly straightforward port, so several users who needed help with it have asked general questions in the java-user@lucene community.

TIKA (incubating)

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika is discussing graduating from the incubator.

MAHOUT

Apache Mahout is a new subproject of the Lucene PMC with the goal of building a suite of scalable machine learning libraries for text and data mining. We now have Map-Reduce implementations of several clustering algorithms, two classification algorithms based on Bayesian statistics, and support for scaling fitness functions in genetic algorithms. Two GSoC students participated successfully over the summer. We are nearing our first release, 0.1.