Apache Tika Incubation Status
This page tracks the project's status within the Apache Incubator. For more general project status, see the project website.
Description
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
News
Project info
Incubation status reports
October 2008
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
July 2008
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
April 2008
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
January 2008
Tika (http://incubator.apache.org/tika) is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
October 2007
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
There have been a number of positive developments within Tika during the last few months. Traffic on the Tika mailing list has increased significantly (typically 2 or 3 questions and 1 or 2 commits every day or every other day), and there have been many recent inquiries from external projects wanting to collaborate with Tika (including Aperture, PDFBox, and a developer working on a JSON library currently hosted at Google Code). In addition, Tika's architecture has become a recent topic of discussion (as described below).
We recently elected Keith Bennett as a new committer to Tika. Keith has been spearheading many of the new patches committed to Tika, as well as participating in discussions about the architecture and future direction of the project.
Tika will be represented at the "Fast Feather" track at ApacheCon US by Jukka Zitting. The rest of the community is helping to create the content for the presentation. The abstract is listed below:
Tika is a new content analysis framework born from the desire to factor out commonality from the Apache Nutch search engine framework. Tika provides a mime detection framework, an extensible parsing framework and metadata environment for content analysis. Though in its nascent stages, progress on Tika has recently taken shape and the project is nearing a stable 0.1 release. In this talk, we'll describe the core APIs of Tika and discuss its use in several distinct domains including search engines, scientific data dissemination and an industrial setting.
Development
There has been a flurry of JIRA issues and code activity (http://issues.apache.org/jira/browse/TIKA), including 47 issues currently in JIRA, with 32 resolved issues, 14 closed issues, and 2 open major/minor issues in progress.
Tika's Parser interface (one of its key components) has just undergone a major overhaul led by Jukka Zitting, and Chris Mattmann has recently contributed a MimeType system (with help from fellow Apache Nutch committer Jerome Charron) to Tika. We also cleaned up and refactored large parts of the rest of the code (removing references to LiusLite and branding the project wherever possible with the Tika name), in preparation for an upcoming 0.1 release.
Chris Mattmann has led an effort to carve out the existing MimeType detection system in Apache Nutch (http://lucene.apache.org/nutch/) and replace it with Tika's improved MimeType detection system. There is a patch sitting in JIRA right now (http://issues.apache.org/jira/browse/NUTCH-562), and barring objections, Nutch will rely on Tika for its MimeType detection abilities.
Also active recently were committers Bertrand Delacretaz, Sami Siren and Rida Benjelloun, committing patches and improvements wherever needed.
Issues before graduation
No changes since our last report: the Tika project is still at an early stage of incubation. We need to continue bringing in the initial codebases and are targeting an initial incubating release (0.1) probably within the next month. We also need to work on growing the community and figuring out how to best interact with external parser projects.
July 2007
Tika is a toolkit for detecting and extracting metadata and structured text content from various document formats using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
June 2007
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
The Tika mailing lists have been relatively quiet lately, probably because with little code we don't yet have many concrete issues to talk about.
Development
We saw the first piece of Tika code when Chris A. Mattmann ported the Nutch metadata framework to Tika. Rida Benjelloun has created a version of the Lius codebase to be included in Tika, and the code is currently in the issue tracker.
Issues before graduation
The Tika project is still at an early stage of incubation. We need to continue bringing in the initial codebases and probably target an initial incubating release later this year. We also need to work on growing the community and figuring out how to best interact with external parser projects.
May 2007
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Incubating since: March 22nd, 2007.
Community
We had a good project bootstrap meeting as part of the text analysis BOF at ApacheCon EU in Amsterdam. The resulting ideas were summarized on the project mailing list, and the first design threads have started.
Development
We've started discussing the design of the Tika toolkit. It seems like we will select one of the existing codebases listed in the project proposal as the basis of an early 0.1 release, and start refactoring the code into a more generic toolkit. The Tika svn tree is still empty, but I expect us to see the first code commits before the next report.
Infrastructure
All the initial infrastructure is now in place. There is still some activity on the temporary Tika wiki on the Google Project hosting service, so we may end up requesting a Tika wiki to be set up on the ASF infrastructure.
Issues before graduation
The Tika project is still at an early stage of incubation. The most important tasks before graduation are to develop and release the Tika codebase and to grow a diverse and sustainable project community.
April 2007
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
The Tika project has just started. The basic infrastructure (mailing lists, subversion, issue tracker, web site) is mostly in place; the only thing still missing is one committer account. We expect to get started with the actual design and code work during the next few weeks.
Incubation work items
Project Setup
This is the first phase of incubation, needed to start the project at Apache. Item assignment is shown by the Apache id. Completed tasks are shown by the completion date (YYYY-MM-dd).
Identify the project to be incubated
Interim responsibility
Copyright
Verify distribution rights
Establish a list of active committers
Infrastructure
Project specific
See the issue tracker.
Incubation
These action items have to be checked for during the whole incubation process. These items are not to be signed as done during incubation, as they may change during incubation. They are to be looked into and described in the status reports and completed in the request for incubation signoff.
Collaborative Development
Licensing awareness
Project Specific
Add project specific tasks here.
Exit
Things to check for before voting the project out.
Organizational acceptance of responsibility for the project
Incubator sign-off
Copyright © 2009 The Apache Software Foundation. Licensed under the Apache License, Version 2.0.