Apache Nutch 2.X is a branch of the Apache Nutch open-source web-search software project. It builds on Apache Gora for data persistence and Apache Solr for indexing, adding web-specific components such as a crawler, a link-graph database, and parsing support, handled by Apache Tika, for HTML and an array of other document formats.