Special Report to Board on Lucene's "Umbrella" Status

The Lucene Project has, for a long time, organized itself into several related sub-projects focused on common tasks related to searching. As of our last board report (March 2010), the sub-projects were:

* Lucene-Java: Our "flagship" Java search library
* Nutch: A server application for crawling, indexing, and searching the web
* Solr: A server application for indexing/searching structured data
* Tika: A content extraction framework/library
* Mahout: A machine learning framework/library
* Open Relevance Project (ORP): A new initiative aimed at producing tools and tests for improving relevance for search engines and machine learning
* Several "Ports" of Lucene-Java: Lucene.Net, PyLucene, Lucy (These are all translations, mostly automated, of the Lucene Java API)

The board's chief concerns regarding umbrella projects, as communicated to the Lucene PMC by Greg, seem to focus on three main issues:

1) Can every PMC member "commit" to all of these sub-projects?
2) Is the PMC generally aware of everything going on in these sub-projects?
3) Is the PMC representing these sub-projects adequately to the board?

The answers to these questions, in short, are:

1) Yes and no. From the Lucene website:

   "PMC members have karma for all Lucene code, but as a general rule avoid making changes to sub-projects unless they have explicitly been made a committer of that sub-project by a vote of the PMC."

2) As a whole, yes. Some PMC members are not directly involved in some sub-projects, but every sub-project has multiple voices on the PMC.
3) We believe so. However, if the board feels that more detail about the state of each sub-project needs to be included in each report, we will be happy to elaborate in future reports.

In this regard, we don't feel that there is any significant cause for concern regarding the umbrella nature of the project. To quote Greg: "we're all good. no changes are necessary."
However, based on various discussions in the community (and in some cases spurred on by the Board's concerns), the Lucene PMC is pursuing some changes going forward.

The PMC has put up two resolutions for this Board meeting to spin out Mahout and Tika, both of which have solid communities that are independent of Lucene and search.

The PMC has also consolidated development of Solr and Lucene, so there is now a single committer base (there was already a very high overlap of code and committers) and a single dev mailing list across the two projects. We intend to continue releasing both Lucene and Solr artifacts and to keep separate user mailing lists for the foreseeable future.

Finally, we have put up a resolution for Nutch to become a TLP. While it is search-related, it also has a significant crawling component. Nutch also has a solid community that is independent of Lucene, including a diverse set of committers.

This leaves the following sub-projects: the Ports and Open Relevance.

Regarding the Ports: these fill a niche within the Lucene community and are generally small. They have almost complete technical overlap with Lucene Java (their releases usually follow shortly after Lucene Java releases), but not necessarily much committer overlap, since they are ports to other programming languages. Because they are nearly automated ports, they don't receive many contributions, but each port has a decent-sized user community. Branding-wise, it makes sense for them to be part of the Lucene TLP. Thus, the PMC doesn't feel a need to spin them out, even though they don't share SVN, etc. with Lucene core. That being said, we are still evaluating the situation.

The Open Relevance sub-project is a small community effort to facilitate discussions on relevance in Lucene. It is nice for it to have its own branding, but it is made up of existing Lucene committers, so it fits best where it is. If, at some point, we start to see traction and interest from other search engines and other projects (Mahout, UIMA), it may make sense to spin it out.

Looking ahead, the Lucene PMC will carefully consider any future requests to become a sub-project of Lucene. The PMC is currently sponsoring two Incubator projects: Droids and the Lucene Connectors Framework. Both of these will probably become TLPs when they graduate. In all likelihood, the only future sub-projects we would take on are ports of Lucene like those mentioned above, for the reasons given above.