2011-07-21
Apache Nutch
Apache Nutch is an open source web crawler software project.
Apache Nutch is a highly extensible and scalable open source web crawler software project. Stemming from Apache Lucene, Nutch is a mature, production-ready batch crawler that relies on Apache Hadoop data structures, which are well suited to batch processing. Nutch has a modular architecture and provides pluggable, extensible interfaces such as Parse, Index, and ScoringFilter for custom implementations, e.g. Apache Tika for parsing. Pluggable indexers also exist for Apache Solr, Elasticsearch, and others. Nutch can run on a single machine, but it gains much of its strength from running in a Hadoop cluster. The now-retired Nutch 2.x branch differed from 1.x in one key area: storage was abstracted away from any specific underlying data store by using Apache Gora to handle object-to-persistent mappings and to store fetch time, status, content, parsed text, outlinks, inlinks, etc. in a number of NoSQL storage solutions.
Programming language: Java

Name                Date        Revision
Apache Nutch 1.18   2021-01-14  1.18
Apache Nutch 1.17   2020-06-18  1.17
Apache Nutch 1.16   2019-10-11  1.16
Apache Nutch 2.4    2019-10-11  2.4
Apache Nutch 1.15   2018-08-09  1.15
Apache Nutch 1.14   2017-12-22  1.14
Apache Nutch 1.13   2017-04-02  1.13
Apache Nutch 1.12   2016-06-18  1.12
Apache Nutch 2.3.1  2016-01-21  2.3.1
Apache Nutch 1.11   2015-12-07  1.11
Apache Nutch 1.10   2015-05-06  1.10
Apache Nutch 2.3    2015-01-22  2.3
Apache Nutch 1.9    2014-08-16  1.9
Apache Nutch 1.8    2014-03-17  1.8
Apache Nutch 2.2.1  2013-07-02  2.2.1
Apache Nutch 1.7    2013-06-24  1.7
Apache Nutch 2.2    2013-06-05  2.2
Apache Nutch 1.6    2012-12-06  1.6
Apache Nutch 2.1    2012-10-05  2.1
Apache Nutch 1.5.1  2012-07-10  1.5.1
Apache Nutch 2.0    2012-07-07  2.0
Apache Nutch 1.5    2012-06-07  1.5
Apache Nutch 1.4    2011-04-11  1.4
Apache Nutch 1.3    2011-06-07  1.3

Branch      Name         Date        Revision
branch-1.0  nutch-1.0    2009-03-23  1.0
branch-0.9  nutch-0.9    2007-04-01  0.9
branch-0.8  nutch-0.8.1  2006-09-24  0.8.1
branch-0.8  nutch-0.8    2006-06-25  0.8
branch-0.7  nutch-0.7.2  2006-03-31  0.7.2

Nutch PMC