Log of /lucene/nutch/trunk/CHANGES.txt

Revision 884269
Modified Wed Nov 25 20:58:10 2009 UTC (43 hours, 32 minutes ago) by ab
File length: 46308 byte(s)
NUTCH-760 Allow field mapping from nutch to solr index.

Revision 884224
Modified Wed Nov 25 18:08:24 2009 UTC (46 hours, 22 minutes ago) by ab
File length: 46231 byte(s)
NUTCH-761 Avoid cloning CrawlDatum in CrawlDbReducer.

Revision 884203
Modified Wed Nov 25 17:20:33 2009 UTC (47 hours, 10 minutes ago) by ab
File length: 46155 byte(s)
NUTCH-753 Prevent new Fetcher from retrieving the robots twice.

Revision 884198
Modified Wed Nov 25 17:10:25 2009 UTC (47 hours, 20 minutes ago) by ab
File length: 46059 byte(s)
NUTCH-773 Some minor bugs in AbstractFetchSchedule.

Revision 883014
Modified Sat Nov 21 23:35:07 2009 UTC (5 days, 16 hours ago) by kubes
File length: 45985 byte(s)
NUTCH-765 - Allow Crawl class to call either Solr or Lucene Indexer.

Revision 823614
Modified Fri Oct 9 17:02:32 2009 UTC (6 weeks, 6 days ago) by ab
File length: 45906 byte(s)
NUTCH-758 Set subversion eol-style to "native".

Revision 823600
Modified Fri Oct 9 15:56:02 2009 UTC (7 weeks ago) by ab
File length: 45829 byte(s)
NUTCH-679 Fetcher2 implementing Tool.

Revision 823557
Modified Fri Oct 9 14:05:05 2009 UTC (7 weeks ago) by ab
File length: 45764 byte(s)
NUTCH-756 CrawlDatum.set() does not reset Metadata if it is null.

Revision 823553
Modified Fri Oct 9 13:54:27 2009 UTC (7 weeks ago) by ab
File length: 45669 byte(s)
NUTCH-754 Use GenericOptionsParser instead of FileSystem.parseArgs().

Revision 823547
Modified Fri Oct 9 13:29:01 2009 UTC (7 weeks ago) by ab
File length: 45570 byte(s)
NUTCH-757 RequestUtils getBooleanParameter() always returns false.

Revision 823540
Modified Fri Oct 9 13:11:15 2009 UTC (7 weeks ago) by ab
File length: 45472 byte(s)
NUTCH-731 Redirection of robots.txt in RobotRulesParser.

Revision 823532
Modified Fri Oct 9 12:53:27 2009 UTC (7 weeks ago) by ab
File length: 45388 byte(s)
NUTCH-730 NPE in LinkRank if no nodes with which to create the WebGraph.

Revision 823531
Modified Fri Oct 9 12:43:44 2009 UTC (7 weeks ago) by ab
File length: 45287 byte(s)
NUTCH-707 Generation of multiple segments in multiple runs returns only 1 segment.

Revision 812497
Modified Tue Sep 8 13:15:03 2009 UTC (2 months, 2 weeks ago) by dogacan
File length: 45187 byte(s)
NUTCH-702 - Lazy Instantiation of Metadata in CrawlDatum. Contributed by Julien Nioche.


Revision 807485
Modified Tue Aug 25 05:45:53 2009 UTC (3 months ago) by dogacan
File length: 45097 byte(s)
Fetcher2 slow. Patch contributed by Julien Nioche.

Revision 782412
Modified Sun Jun 7 17:12:18 2009 UTC (5 months, 2 weeks ago) by dogacan
File length: 45038 byte(s)
NUTCH-735 - crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command. Patch by Susam Pal.

Revision 757500
Modified Mon Mar 23 18:59:26 2009 UTC (8 months ago) by siren
File length: 44890 byte(s)
update release date

Revision 757327
Modified Mon Mar 23 06:41:13 2009 UTC (8 months ago) by siren
File length: 44890 byte(s)
NUTCH-722 remove JAI libs

Revision 752001
Modified Tue Mar 10 07:08:29 2009 UTC (8 months, 2 weeks ago) by siren
File length: 44815 byte(s)
prepare for release

Revision 752000
Modified Tue Mar 10 07:07:22 2009 UTC (8 months, 2 weeks ago) by siren
File length: 44815 byte(s)
NUTCH-715 - Subcollection plugin doesn't work with default subcollections.xml file. Contributed by Dmitry Lihachev

Revision 751774
Modified Mon Mar 9 17:34:51 2009 UTC (8 months, 2 weeks ago) by dogacan
File length: 44693 byte(s)
NUTCH-684 - Dedup support for Solr

Revision 751475
Modified Sun Mar 8 17:30:52 2009 UTC (8 months, 2 weeks ago) by siren
File length: 44642 byte(s)
the version is indeed 1.0

Revision 751471
Modified Sun Mar 8 17:20:59 2009 UTC (8 months, 2 weeks ago) by siren
File length: 44642 byte(s)
preparing for release

Revision 750037
Modified Wed Mar 4 15:02:29 2009 UTC (8 months, 3 weeks ago) by ab
File length: 44646 byte(s)
NUTCH-711 - Indexer failing after upgrade to Hadoop 0.19.1. This is a temporary
fix, to be revisited later.

Revision 749289
Modified Mon Mar 2 12:28:22 2009 UTC (8 months, 3 weeks ago) by siren
File length: 44576 byte(s)
NUTCH-669 - Consolidate code for Fetcher and Fetcher2

Revision 749256
Modified Mon Mar 2 10:16:51 2009 UTC (8 months, 3 weeks ago) by siren
File length: 44508 byte(s)
NUTCH-700 - revert to nekohtml-0.9.4

Revision 749249
Modified Mon Mar 2 09:12:51 2009 UTC (8 months, 3 weeks ago) by ab
File length: 44434 byte(s)
Commit changes to CHANGES.

Revision 748637
Modified Fri Feb 27 18:54:24 2009 UTC (8 months, 4 weeks ago) by ab
File length: 44340 byte(s)
NUTCH-703 Upgrade to Hadoop 0.19.1.

Revision 748408
Modified Fri Feb 27 06:21:37 2009 UTC (8 months, 4 weeks ago) by siren
File length: 44291 byte(s)
NUTCH-699 - Add an "official" solr schema for solr integration. Contributed by dogacan, Dmitry Lihachev

Revision 747324
Modified Tue Feb 24 10:09:36 2009 UTC (9 months ago) by siren
File length: 44175 byte(s)
NUTCH-698 - CrawlDb is corrupted after a few crawl cycles, contributed by dogacan

Revision 747319
Modified Tue Feb 24 09:54:30 2009 UTC (9 months ago) by siren
File length: 44086 byte(s)
NUTCH-247 - Robot parser to restrict, contributed by kubes

Revision 747312
Modified Tue Feb 24 09:18:03 2009 UTC (9 months ago) by siren
File length: 44028 byte(s)
NUTCH-626 - Fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects, contributed by Remco Verhoef, dogacan

Revision 746900
Modified Mon Feb 23 07:02:30 2009 UTC (9 months ago) by siren
File length: 43879 byte(s)
NUTCH-694 - Distributed Search Server fails

Revision 745808
Modified Thu Feb 19 10:25:47 2009 UTC (9 months, 1 week ago) by siren
File length: 43816 byte(s)
NUTCH-695 - incorrect mime type detection by MoreIndexingFilter plugin, contributed by Dmitry Lihachev

Revision 745503
Modified Wed Feb 18 12:53:12 2009 UTC (9 months, 1 week ago) by siren
File length: 43701 byte(s)
NUTCH-563 Include custom fields in BasicQueryFilter, contributed by Julien Nioche

Revision 745499
Modified Wed Feb 18 12:43:04 2009 UTC (9 months, 1 week ago) by siren
File length: 43610 byte(s)
NUTCH-691 - Update jakarta poi jars to the most relevant version, contributed by Dmitry Lihachev

Revision 745096
Modified Tue Feb 17 14:28:14 2009 UTC (9 months, 1 week ago) by siren
File length: 43506 byte(s)
fix NUTCH-631 - thanks to Stefan Will

Revision 743277
Modified Wed Feb 11 09:12:15 2009 UTC (9 months, 2 weeks ago) by dogacan
File length: 43409 byte(s)
NUTCH-683 - NUTCH-676 broke CrawlDbMerger

Revision 741559
Modified Fri Feb 6 13:17:08 2009 UTC (9 months, 2 weeks ago) by ab
File length: 43351 byte(s)
NUTCH-636 Httpclient plugin https doesn't work on IBM JRE.

Revision 741558
Modified Fri Feb 6 13:09:07 2009 UTC (9 months, 2 weeks ago) by ab
File length: 43255 byte(s)
NUTCH-643 ClassCastException in PDF parser, upgrade to unofficial PDFBox 0.7.4

Revision 740324
Modified Tue Feb 3 15:43:57 2009 UTC (9 months, 3 weeks ago) by ab
File length: 43183 byte(s)
NUTCH-671 - JSP errors in Nutch searcher webapp.

Revision 740318
Modified Tue Feb 3 15:12:48 2009 UTC (9 months, 3 weeks ago) by ab
File length: 43110 byte(s)
NUTCH-279 Additions to urlnormalizer-regex (modified).

Revision 738970
Modified Thu Jan 29 19:12:08 2009 UTC (9 months, 4 weeks ago) by dogacan
File length: 43036 byte(s)
NUTCH-682 - SOLR indexer does not set boost on the document. Patch by Julien Nioche

Revision 738455
Modified Wed Jan 28 11:33:20 2009 UTC (9 months, 4 weeks ago) by dogacan
File length: 42924 byte(s)
NUTCH-571 - parse-mp3 plugin doesn't always index album of mp3. Patch
by Joseph Chen.

Revision 736388
Modified Wed Jan 21 19:41:55 2009 UTC (10 months ago) by dogacan
File length: 42814 byte(s)
NUTCH-579 - Feed plugin only indexes one post per feed due to identical digest

Revision 736385
Modified Wed Jan 21 19:26:27 2009 UTC (10 months ago) by dogacan
File length: 42701 byte(s)
NUTCH-676 - MapWritable is written inefficiently and confusingly.

Revision 736307
Modified Wed Jan 21 13:09:48 2009 UTC (10 months ago) by dogacan
File length: 42602 byte(s)
NUTCH-681 - parse-mp3 compilation problem. Patch by Wildan Maulana.

Revision 735748
Modified Mon Jan 19 17:09:47 2009 UTC (10 months, 1 week ago) by dogacan
File length: 42506 byte(s)
NUTCH-678 - Hadoop 0.19 requires an update of jets3t (Julien Nioche)

Revision 734257
Modified Tue Jan 13 22:15:58 2009 UTC (10 months, 1 week ago) by otis
File length: 42401 byte(s)
NUTCH-627 - Minimize host address lookup while running generate

Revision 733747
Modified Mon Jan 12 13:37:23 2009 UTC (10 months, 2 weeks ago) by dogacan
File length: 42335 byte(s)
NUTCH-652 - AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly

Revision 733738
Modified Mon Jan 12 13:26:16 2009 UTC (10 months, 2 weeks ago) by dogacan
File length: 42207 byte(s)
NUTCH-442 - Integrate Solr/Nutch

Revision 730845
Modified Fri Jan 2 21:38:58 2009 UTC (10 months, 3 weeks ago) by kubes
File length: 42129 byte(s)
NUTCH-594: Serve Nutch search results in multiple formats including XML and JSON.

Revision 729958
Modified Mon Dec 29 17:58:12 2008 UTC (10 months, 4 weeks ago) by kubes
File length: 42012 byte(s)
NUTCH-668: Domain URL Filter plugin

Revision 723449
Modified Thu Dec 4 21:31:58 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41966 byte(s)
NUTCH-646: New Indexing Framework for Nutch.

Revision 723441
Modified Thu Dec 4 21:16:42 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41905 byte(s)
NUTCH-635: LinkAnalysis Tool for Nutch.

Revision 722483
Modified Tue Dec 2 14:59:21 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41824 byte(s)
NUTCH-667: Input Format for working with Content in Hadoop Streaming

Revision 722481
Modified Tue Dec 2 14:55:38 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41748 byte(s)
NUTCH-665: Search Load Testing Tool

Revision 722480
Modified Tue Dec 2 14:52:12 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41697 byte(s)
NUTCH-647: Resolve URLs tool

Revision 722477
Modified Tue Dec 2 14:47:35 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41653 byte(s)
NUTCH-663: Upgrade Nutch to use Hadoop 0.19

Revision 722475
Modified Tue Dec 2 14:41:09 2008 UTC (11 months, 3 weeks ago) by kubes
File length: 41594 byte(s)
NUTCH-662: Upgrade Nutch to use Lucene 2.4

Revision 701052
Modified Thu Oct 2 09:17:23 2008 UTC (13 months, 3 weeks ago) by dogacan
File length: 41525 byte(s)
NUTCH-640 - confusing description "set it to Integer.MAX_VALUE"

Revision 701045
Modified Thu Oct 2 09:05:22 2008 UTC (13 months, 3 weeks ago) by dogacan
File length: 41439 byte(s)
NUTCH-654 - urlfilter-regex's main does not work

Revision 699866
Modified Sun Sep 28 17:24:23 2008 UTC (13 months, 4 weeks ago) by mattmann
File length: 41368 byte(s)
- NUTCH-621

Revision 698471
Modified Wed Sep 24 08:52:19 2008 UTC (14 months ago) by dogacan
File length: 41298 byte(s)
NUTCH-653 - Upgrade to Hadoop 0.18

Revision 697896
Modified Mon Sep 22 16:43:33 2008 UTC (14 months ago) by dogacan
File length: 41246 byte(s)
NUTCH-633 - ParseSegment no longer allows reparsing.

Revision 697878
Modified Mon Sep 22 16:02:40 2008 UTC (14 months ago) by ab
File length: 41173 byte(s)
NUTCH-375 - Add support for Content-Encoding: deflate.

Revision 697781
Modified Mon Sep 22 11:08:09 2008 UTC (14 months ago) by dogacan
File length: 41091 byte(s)
NUTCH-651 - Remove bin/{start|stop}-balancer.sh from svn tracking

Revision 697395
Modified Sat Sep 20 17:05:03 2008 UTC (14 months, 1 week ago) by dogacan
File length: 41005 byte(s)
NUTCH-639 - Change LuceneDocumentWrapper visibility from private to protected

Revision 686912
Modified Tue Aug 19 00:49:45 2008 UTC (15 months, 1 week ago) by ab
File length: 40891 byte(s)
NUTCH-642 - Unit tests fail when run in non-local mode.

Revision 686910
Modified Tue Aug 19 00:42:07 2008 UTC (15 months, 1 week ago) by ab
File length: 40826 byte(s)
NUTCH-645 Parse-swf unit test failing - fix.

Revision 686900
Modified Mon Aug 18 23:56:20 2008 UTC (15 months, 1 week ago) by ab
File length: 40776 byte(s)
NUTCH-641 IndexSorter incorrectly copies stored fields.

Revision 678533
Modified Mon Jul 21 19:20:21 2008 UTC (16 months, 1 week ago) by ab
File length: 40709 byte(s)
NUTCH-634 Upgrade Nutch to Hadoop 0.17.1.

Revision 663092
Modified Wed Jun 4 13:40:19 2008 UTC (17 months, 3 weeks ago) by mattmann
File length: 40617 byte(s)
- fix for NUTCH-618

Revision 649652
Modified Fri Apr 18 18:52:38 2008 UTC (19 months, 1 week ago) by dogacan
File length: 40534 byte(s)
NUTCH-596 - ParseSegments parse content even if it's not CrawlDatum.STATUS_FETCH_SUCCESS.

Revision 646436
Modified Wed Apr 9 16:57:41 2008 UTC (19 months, 2 weeks ago) by kubes
File length: 40428 byte(s)
NUTCH-500 - Add hadoop masters configuration file into conf folder.  Thanks Emmanuel.

Revision 638782
Modified Wed Mar 19 10:45:55 2008 UTC (20 months, 1 week ago) by ab
File length: 40324 byte(s)
NUTCH-620 BasicURLNormalizer should collapse runs of slashes with a single slash.

Revision 638779
Modified Wed Mar 19 10:34:14 2008 UTC (20 months, 1 week ago) by ab
File length: 40209 byte(s)
NUTCH-598 - Remove deprecated use of ToolBase. Use generics in Hadoop API.

Revision 637967
Modified Mon Mar 17 16:44:29 2008 UTC (20 months, 1 week ago) by ab
File length: 40096 byte(s)
NUTCH-223 Crawl.java uses Integer.MAX_VALUE instead of Long.MAX_VALUE.

Revision 637960
Modified Mon Mar 17 16:23:56 2008 UTC (20 months, 1 week ago) by ab
File length: 40022 byte(s)
NUTCH-220 Upgrade to PDFBox 0.7.3.

Revision 637861
Modified Mon Mar 17 12:42:54 2008 UTC (20 months, 1 week ago) by ab
File length: 39976 byte(s)
NUTCH-616 Reset Fetch Retry counter when fetch is successful.

Revision 637858
Modified Mon Mar 17 12:33:56 2008 UTC (20 months, 1 week ago) by ab
File length: 39884 byte(s)
NUTCH-615 Redirected URL-s fetched without setting fetchInterval. Guard against
reprUrl being null.

Revision 637308
Modified Sat Mar 15 00:17:07 2008 UTC (20 months, 2 weeks ago) by ab
File length: 39753 byte(s)
NUTCH-126 Fetching via https doesn't work with a proxy.

Revision 637127
Modified Fri Mar 14 15:10:55 2008 UTC (20 months, 2 weeks ago) by ab
File length: 39672 byte(s)
NUTCH-575 NPE in OpenSearchServlet.

Revision 637122
Modified Fri Mar 14 14:54:31 2008 UTC (20 months, 2 weeks ago) by ab
File length: 39610 byte(s)
NUTCH-601 Recrawling in existing crawl directory.

Revision 637114
Modified Fri Mar 14 14:33:53 2008 UTC (20 months, 2 weeks ago) by ab
File length: 39535 byte(s)
NUTCH-612 URL filtering was disabled when invoking Generator from Crawl.

Revision 637105
Modified Fri Mar 14 14:12:31 2008 UTC (20 months, 2 weeks ago) by ab
File length: 39431 byte(s)
NUTCH-613 Empty summaries and cached pages.

Revision 630779
Modified Mon Feb 25 09:38:12 2008 UTC (21 months ago) by dogacan
File length: 39365 byte(s)
NUTCH-567 - Proper (?) handling of URIs in TagSoup.

Revision 628631
Modified Mon Feb 18 06:38:46 2008 UTC (21 months, 1 week ago) by kubes
File length: 39251 byte(s)
NUTCH-44 - Too many search results.  Configurable limit on max number of search results returned.  Thanks Emilijan Mirceski and Susam Pal.

Revision 627893
Modified Thu Feb 14 22:21:50 2008 UTC (21 months, 1 week ago) by kubes
File length: 39111 byte(s)
NUTCH-611 - Upgrade Nutch to use Hadoop 0.16.  This upgrade removes the deprecated addDefaultResource and addFinalResource methods.  Should now use addResource.  Two scripts, start-balancer.sh and stop-balancer.sh, are added to the bin directory.

Revision 627890
Modified Thu Feb 14 22:17:28 2008 UTC (21 months, 1 week ago) by kubes
File length: 39053 byte(s)
NUTCH-603 - Add more default url normalizations.

Revision 620818
Modified Tue Feb 12 14:54:42 2008 UTC (21 months, 2 weeks ago) by kubes
File length: 38992 byte(s)
NUTCH-605 - Change deprecated configuration methods for Hadoop.

Revision 620817
Modified Tue Feb 12 14:51:33 2008 UTC (21 months, 2 weeks ago) by kubes
File length: 38916 byte(s)
NUTCH-606 - Refactoring of Generator, run all urls through checks.

Revision 620811
Modified Tue Feb 12 14:08:50 2008 UTC (21 months, 2 weeks ago) by mattmann
File length: 38837 byte(s)
- fix for NUTCH-608

Revision 620172
Modified Sat Feb 9 18:41:19 2008 UTC (21 months, 2 weeks ago) by kubes
File length: 38752 byte(s)
NUTCH-607 - Update build.xml to include tika jar in war when building the war file.

Revision 619648
Modified Thu Feb 7 21:32:06 2008 UTC (21 months, 2 weeks ago) by kubes
File length: 38676 byte(s)
NUTCH-602 - Allow configurable number of handlers for search servers.  Thanks to Seth Hartbecke from Search Wikia for spotting this.

Revision 618975
Modified Wed Feb 6 12:06:34 2008 UTC (21 months, 3 weeks ago) by ab
File length: 38576 byte(s)
NUTCH-604 Upgrade to Lucene 2.3.0.

Revision 616095
Modified Mon Jan 28 22:40:29 2008 UTC (21 months, 4 weeks ago) by kubes
File length: 38530 byte(s)
NUTCH-587 - Upgrade Nutch to use Hadoop 0.15.3 release.  Goof on changes.txt, didn't change the number.  Changed it to 68.

Revision 616093
Modified Mon Jan 28 22:36:04 2008 UTC (21 months, 4 weeks ago) by kubes
File length: 38530 byte(s)
NUTCH-587 - Upgrade Nutch to use Hadoop 0.15.3 release.  

Revision 613378
Modified Sat Jan 19 08:59:29 2008 UTC (22 months, 1 week ago) by siren
File length: 38481 byte(s)
NUTCH-580 Remove deprecated hadoop api calls (FS)

Revision 612505
Modified Wed Jan 16 16:51:19 2008 UTC (22 months, 1 week ago) by ab
File length: 38416 byte(s)
NUTCH-584 - urls missing from fetchlists.

Revision 612264
Modified Tue Jan 15 22:38:47 2008 UTC (22 months, 1 week ago) by ab
File length: 38350 byte(s)
NUTCH-597 - NPE in Fetcher2 when redirecting.

Revision 612245
Modified Tue Jan 15 22:02:52 2008 UTC (22 months, 1 week ago) by ab
File length: 38294 byte(s)
CrawlDbReader: add some new stats + dump into a csv format

Revision 612174
Modified Tue Jan 15 17:54:10 2008 UTC (22 months, 1 week ago) by ab
File length: 38191 byte(s)
NUTCH-534 SegmentMerger: add -normalize option.

Revision 608972
Modified Fri Jan 4 19:48:32 2008 UTC (22 months, 3 weeks ago) by dogacan
File length: 38113 byte(s)
NUTCH-559 - NTLM, Basic and Digest Authentication schemes for web/proxy. Contributed by Susam Pal.

Revision 604956
Modified Mon Dec 17 18:22:17 2007 UTC (23 months, 1 week ago) by ab
File length: 38000 byte(s)
NUTCH-586 - Add option to run compiled classes without job file.

Revision 601043
Modified Tue Dec 4 19:13:28 2007 UTC (23 months, 3 weeks ago) by kubes
File length: 37913 byte(s)
NUTCH-581 - DistributedSearch does not update search servers added to search-servers.txt on the fly.  This allows search servers to be added and removed on the fly.  Thanks Rohan.

Revision 594591
Modified Tue Nov 13 17:35:08 2007 UTC (2 years ago) by kubes
File length: 37777 byte(s)
NUTCH-574 - Including inlink anchor text in index can create irrelevant search results.  Moved inbound anchor text indexing from index-basic to new index-anchor plugin.  For backwards compatibility index-anchor will need to be added to the nutch-site.xml plugin.includes configuration variable. 

Revision 593263
Modified Thu Nov 8 19:13:37 2007 UTC (2 years ago) by dogacan
File length: 37496 byte(s)
NUTCH-494 - FindBugs: CrawlDbReader and DeleteDuplicates.

Revision 593261
Modified Thu Nov 8 19:09:06 2007 UTC (2 years ago) by dogacan
File length: 37423 byte(s)
NUTCH-538 - Delete unused classes under o.a.n.util.

Revision 593186
Modified Thu Nov 8 15:08:47 2007 UTC (2 years ago) by dogacan
File length: 37356 byte(s)
NUTCH-548 - Move URLNormalizer from Outlink to ParseOutputFormat. Contributed by Emmanuel Joke.

Revision 593151
Modified Thu Nov 8 13:18:05 2007 UTC (2 years ago) by dogacan
File length: 37253 byte(s)
NUTCH-547 - Redirection handling: YahooSlurp's algorithm.

Revision 591793
Modified Sun Nov 4 16:01:53 2007 UTC (2 years ago) by kubes
File length: 37157 byte(s)
NUTCH-565 - Arc File to Nutch Segments Converter.  This tool allows the conversion of multiple .arc files, a format used by the Internet Archive and Grub distributed crawler projects, into Nutch segments.

Revision 591791
Modified Sun Nov 4 15:38:35 2007 UTC (2 years ago) by kubes
File length: 37094 byte(s)
NUTCH-552 - Upgrade Nutch to Hadoop 0.15.x.

Revision 589654
Modified Mon Oct 29 14:57:19 2007 UTC (2 years ago) by dogacan
File length: 37037 byte(s)
NUTCH-501 - Implement a different caching mechanism for objects cached in configuration.

Revision 586032
Modified Thu Oct 18 16:53:48 2007 UTC (2 years, 1 month ago) by kubes
File length: 36928 byte(s)
NUTCH-488 - Avoid parsing unnecessary links and get a more relevant outlink list.  Thanks to Marcin Okraszewski and Emmanuel Joke.

Revision 583016
Modified Tue Oct 9 00:23:38 2007 UTC (2 years, 1 month ago) by mattmann
File length: 36786 byte(s)
- fix for NUTCH-562

Revision 582775
Modified Mon Oct 8 10:58:11 2007 UTC (2 years, 1 month ago) by dogacan
File length: 36687 byte(s)
NUTCH-508 - ${hadoop.log.dir} and ${hadoop.log.file} are not propagated to the tasktracker. Contributed by Mathijs Homminga and Emmanuel Joke.

Revision 579656
Modified Wed Sep 26 14:02:48 2007 UTC (2 years, 2 months ago) by dogacan
File length: 36540 byte(s)
NUTCH-25 - needs 'character encoding' detector. Mostly contributed by Doug Cook. Some parts are contributed by Marcin Okraszewski and Renaud Richardet. Also fixes NUTCH-369 and NUTCH-487.

Revision 578703
Modified Mon Sep 24 08:27:34 2007 UTC (2 years, 2 months ago) by dogacan
File length: 36412 byte(s)
NUTCH-529 - NodeWalker.skipChildren doesn't work for more than 1 child. Contributed by Emmanuel Joke.

Revision 577018
Modified Tue Sep 18 19:07:39 2007 UTC (2 years, 2 months ago) by ab
File length: 36303 byte(s)
NUTCH-554 - Generator throws IOException on invalid urls.

Revision 574346
Modified Mon Sep 10 19:45:22 2007 UTC (2 years, 2 months ago) by dogacan
File length: 36213 byte(s)
NUTCH-546 - file URLs are filtered out by the crawler.

Revision 574344
Modified Mon Sep 10 19:40:20 2007 UTC (2 years, 2 months ago) by dogacan
File length: 36144 byte(s)
NUTCH-550 - Parse fails if db.max.outlinks.per.page is -1.

Revision 572335
Modified Mon Sep 3 13:37:24 2007 UTC (2 years, 2 months ago) by dogacan
File length: 36070 byte(s)
NUTCH-532 - CrawlDbMerger: wrong computation of last fetch time. Contributed by Emmanuel Joke.

Revision 570331
Modified Tue Aug 28 06:34:36 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35967 byte(s)
NUTCH-545 - Configuration and OnlineClusterer get initialized in every request. Contributed by Dawid Weiss.

Revision 570327 - (view) (annotate) - [select for diffs]
Modified Tue Aug 28 06:26:51 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35852 byte(s)
Diff to previous 568053 (colored)
NUTCH-544 - Upgrade Carrot2 clustering plugin to the newest stable release (2.1). Contributed by Dawid Weiss.

Revision 568053 - (view) (annotate) - [select for diffs]
Modified Tue Aug 21 10:50:07 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35734 byte(s)
Diff to previous 563894 (colored)
NUTCH-439 - Top Level Domains Indexing / Scoring. Contributed by Enis.

Revision 563894 - (view) (annotate) - [select for diffs]
Modified Wed Aug 8 14:23:25 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35610 byte(s)
Diff to previous 563807 (colored)
NUTCH-536 - Reduce number of warnings in nutch core.

Revision 563807 - (view) (annotate) - [select for diffs]
Modified Wed Aug 8 10:57:11 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35542 byte(s)
Diff to previous 563777 (colored)
NUTCH-522 - Use URLValidator in the Injector.

Revision 563777 - (view) (annotate) - [select for diffs]
Modified Wed Aug 8 07:33:23 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35466 byte(s)
Diff to previous 561306 (colored)
NUTCH-535 - ParseData's contentMeta accumulates unnecessary values during parse.

Revision 561306 - (view) (annotate) - [select for diffs]
Modified Tue Jul 31 12:07:30 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35366 byte(s)
Diff to previous 561092 (colored)
NUTCH-533 - LinkDbMerger: url normalized is not updated in the key and inlinks list. Contributed by Emmanuel Joke.

Revision 561092 - (view) (annotate) - [select for diffs]
Modified Mon Jul 30 19:02:27 2007 UTC (2 years, 3 months ago) by dogacan
File length: 35243 byte(s)
Diff to previous 559754 (colored)
NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.

Revision 559754 - (view) (annotate) - [select for diffs]
Modified Thu Jul 26 08:44:33 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34958 byte(s)
Diff to previous 559742 (colored)
NUTCH-525 - DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment. Contributed by Vishal Shah.

Revision 559742 - (view) (annotate) - [select for diffs]
Modified Thu Jul 26 08:10:38 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34811 byte(s)
Diff to previous 557344 (colored)
NUTCH-516 - Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE. Contributed by Emmanuel Joke. 

Revision 557344 - (view) (annotate) - [select for diffs]
Modified Wed Jul 18 18:04:26 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34691 byte(s)
Diff to previous 557342 (colored)
NUTCH-518 - Fix OpicScoringFilter to respect scoring filter chaining. Contributed by Enis.

Revision 557342 - (view) (annotate) - [select for diffs]
Modified Wed Jul 18 17:59:59 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34584 byte(s)
Diff to previous 556946 (colored)
NUTCH-517 - build encoding should be UTF-8. Contributed by Enis.

Revision 556946 - (view) (annotate) - [select for diffs]
Modified Tue Jul 17 15:16:40 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34506 byte(s)
Diff to previous 556824 (colored)
NUTCH-506 - Delegate compression to Hadoop.

Revision 556824 - (view) (annotate) - [select for diffs]
Modified Tue Jul 17 06:19:06 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34434 byte(s)
Diff to previous 556072 (colored)
NUTCH-515 - Next fetch time is set incorrectly.

Revision 556072 - (view) (annotate) - [select for diffs]
Modified Fri Jul 13 17:20:44 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34371 byte(s)
Diff to previous 555307 (colored)
NUTCH-513 - suffix-urlfilter.txt does not have a template.

Revision 555307 - (view) (annotate) - [select for diffs]
Modified Wed Jul 11 15:30:29 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34297 byte(s)
Diff to previous 555237 (colored)
NUTCH-510 - IndexMerger delete working dir. Contributed by Enis.

Revision 555237 - (view) (annotate) - [select for diffs]
Modified Wed Jul 11 10:54:37 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34220 byte(s)
Diff to previous 554539 (colored)
NUTCH-505 - Outlink urls should be validated.

Revision 554539 - (view) (annotate) - [select for diffs]
Modified Mon Jul 9 06:44:18 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34159 byte(s)
Diff to previous 554530 (colored)
NUTCH-503 - Generator exits incorrectly for small fetchlists.

Revision 554530 - (view) (annotate) - [select for diffs]
Modified Mon Jul 9 06:15:53 2007 UTC (2 years, 4 months ago) by dogacan
File length: 34061 byte(s)
Diff to previous 551147 (colored)
NUTCH-507 - lib-lucene-analyzers jar definition is wrong in plugin.xml. Contributed by Emmanuel Joke.

Revision 551147 - (view) (annotate) - [select for diffs]
Modified Wed Jun 27 12:46:05 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33953 byte(s)
Diff to previous 551098 (colored)
NUTCH-498 - Use Combiner in LinkDb to increase speed of linkdb generation. Contributed by Espen Amble Kolstad.

Revision 551098 - (view) (annotate) - [select for diffs]
Modified Wed Jun 27 08:39:22 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33835 byte(s)
Diff to previous 551081 (colored)
NUTCH-499 - Refactor LinkDb and LinkDbMerger to reuse code.

Revision 551081 - (view) (annotate) - [select for diffs]
Modified Wed Jun 27 07:05:52 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33760 byte(s)
Diff to previous 550683 (colored)
NUTCH-474 - Replace usage of ObjectWritable with something based on GenericWritable.

Revision 550683 - (view) (annotate) - [select for diffs]
Modified Tue Jun 26 04:45:35 2007 UTC (2 years, 5 months ago) by kubes
File length: 33655 byte(s)
Diff to previous 550196 (colored)
NUTCH-497: Fixes problems relating to StackOverflow errors
and extreme nested tags.  Adds general framework for stack
based Node walking.

Revision 550196 - (view) (annotate) - [select for diffs]
Modified Sun Jun 24 10:04:30 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33543 byte(s)
Diff to previous 550188 (colored)
NUTCH-504 - Parsing during fetching is broken.

Revision 550188 - (view) (annotate) - [select for diffs]
Modified Sun Jun 24 09:28:41 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33475 byte(s)
Diff to previous 549638 (colored)
NUTCH-468 - Scoring filter should distribute score to all outlinks at once.

Revision 549638 - (view) (annotate) - [select for diffs]
Modified Thu Jun 21 22:52:02 2007 UTC (2 years, 5 months ago) by ab
File length: 33379 byte(s)
Diff to previous 549507 (colored)
Upgrade to Lucene 2.2.0 and Hadoop 0.12.3.

Revision 549507 - (view) (annotate) - [select for diffs]
Modified Thu Jun 21 15:15:32 2007 UTC (2 years, 5 months ago) by dogacan
File length: 33326 byte(s)
Diff to previous 548730 (colored)
NUTCH-471 - Fix synchronization in NutchBean creation.

Revision 548730 - (view) (annotate) - [select for diffs]
Modified Tue Jun 19 14:01:02 2007 UTC (2 years, 5 months ago) by mattmann
File length: 33233 byte(s)
Diff to previous 548666 (colored)
fix for NUTCH-444

Revision 548666 - (view) (annotate) - [select for diffs]
Modified Tue Jun 19 09:21:21 2007 UTC (2 years, 5 months ago) by dogacan
File length: 32835 byte(s)
Diff to previous 548429 (colored)
NUTCH-502 - Bug in SegmentReader causes infinite loop.

Revision 548429 - (view) (annotate) - [select for diffs]
Modified Mon Jun 18 18:13:15 2007 UTC (2 years, 5 months ago) by dogacan
File length: 32730 byte(s)
Diff to previous 548103 (colored)
NUTCH-489 - URLFilter-suffix management of the url path when the url contains some query parameters.

Revision 548103 - (view) (annotate) - [select for diffs]
Modified Sun Jun 17 20:27:17 2007 UTC (2 years, 5 months ago) by dogacan
File length: 32596 byte(s)
Diff to previous 548076 (colored)
NUTCH-485 - Change HtmlParseFilters to return a ParseResult object instead of a Parse object.

Revision 548076 - (view) (annotate) - [select for diffs]
Modified Sun Jun 17 17:19:14 2007 UTC (2 years, 5 months ago) by mattmann
File length: 32474 byte(s)
Diff to previous 547901 (colored)
- fix for NUTCH-443 (contributed by Dogacan)

Revision 547901 - (view) (annotate) - [select for diffs]
Modified Sat Jun 16 10:33:24 2007 UTC (2 years, 5 months ago) by dogacan
File length: 31993 byte(s)
Diff to previous 543264 (colored)
NUTCH-495 - Unnecessary delays in Fetcher2.

Revision 543264 - (view) (annotate) - [select for diffs]
Modified Thu May 31 21:23:45 2007 UTC (2 years, 5 months ago) by ab
File length: 31939 byte(s)
Diff to previous 542903 (colored)
NUTCH-392 - OutputFormat implementations should pass on Progressable.

Revision 542903 - (view) (annotate) - [select for diffs]
Modified Wed May 30 18:35:24 2007 UTC (2 years, 5 months ago) by ab
File length: 31842 byte(s)
Diff to previous 538273 (colored)
NUTCH-61 - adaptive fetch interval patch.

Revision 538273 - (view) (annotate) - [select for diffs]
Modified Tue May 15 18:29:49 2007 UTC (2 years, 6 months ago) by siren
File length: 31741 byte(s)
Diff to previous 537860 (colored)
NUTCH-161 Change Plain text parser to use parser.character.encoding.default property for fall back encoding
spotted by KuroSaka TeruHiko

Revision 537860 - (view) (annotate) - [select for diffs]
Modified Mon May 14 14:51:59 2007 UTC (2 years, 6 months ago) by siren
File length: 31587 byte(s)
Diff to previous 537857 (colored)
NUTCH-483 Remove redundant commons-logging jar from ontology plugin

Revision 537857 - (view) (annotate) - [select for diffs]
Modified Mon May 14 14:37:27 2007 UTC (2 years, 6 months ago) by siren
File length: 31499 byte(s)
Diff to previous 536925 (colored)
NUTCH-482 Remove redundant plugin lib-log4j

Revision 536925 - (view) (annotate) - [select for diffs]
Modified Thu May 10 16:29:51 2007 UTC (2 years, 6 months ago) by siren
File length: 31439 byte(s)
Diff to previous 536909 (colored)
NUTCH-446 RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt, contributed by Doğacan Güney

Revision 536909 - (view) (annotate) - [select for diffs]
Modified Thu May 10 16:13:15 2007 UTC (2 years, 6 months ago) by siren
File length: 31314 byte(s)
Diff to previous 536629 (colored)
NUTCH-456 Parse msexcel plugin speedup contributed by Heiko Dietze

Revision 536629 - (view) (annotate) - [select for diffs]
Modified Wed May 9 19:36:54 2007 UTC (2 years, 6 months ago) by ab
File length: 31243 byte(s)
Diff to previous 536606 (colored)
NUTCH-393 - Indexer should handle null documents returned by filters.

Revision 536606 - (view) (annotate) - [select for diffs]
Modified Wed May 9 18:00:56 2007 UTC (2 years, 6 months ago) by ab
File length: 31140 byte(s)
Diff to previous 532088 (colored)
NUTCH-443 - Allow parsers to return multiple Parse objects.

Revision 532088 - (view) (annotate) - [select for diffs]
Modified Tue Apr 24 21:32:51 2007 UTC (2 years, 7 months ago) by ab
File length: 31041 byte(s)
Diff to previous 526036 (colored)
NUTCH-474 - Fix crawlDelay and blocking checks.

Revision 526036 - (view) (annotate) - [select for diffs]
Modified Fri Apr 6 02:38:15 2007 UTC (2 years, 7 months ago) by mattmann
File length: 30963 byte(s)
Diff to previous 525015 (colored)
- update for new development, Nutch 1.0-dev

Revision 525015 - (view) (annotate) - [select for diffs]
Modified Tue Apr 3 03:41:02 2007 UTC (2 years, 7 months ago) by mattmann
File length: 30932 byte(s)
Diff to previous 524989 (colored)
- remove 0.10-dev as unreleased changes
- prep for 0.9 rc

Revision 524989 - (view) (annotate) - [select for diffs]
Modified Tue Apr 3 01:14:30 2007 UTC (2 years, 7 months ago) by kubes
File length: 30964 byte(s)
Diff to previous 522679 (colored)
Updated CHANGES.txt to reflect NUTCH-333.

Revision 522679 - (view) (annotate) - [select for diffs]
Modified Tue Mar 27 00:36:15 2007 UTC (2 years, 8 months ago) by mattmann
File length: 30848 byte(s)
Diff to previous 521933 (colored)
Release 0.9 steps 1-5

Revision 521933 - (view) (annotate) - [select for diffs]
Modified Fri Mar 23 22:59:01 2007 UTC (2 years, 8 months ago) by ab
File length: 30820 byte(s)
Diff to previous 521182 (colored)
Upgrade to Hadoop 0.12.2 release.

Fix whitespace issues in platform name in bin/hadoop under Cygwin.

Replace deprecated method call.

Revision 521182 - (view) (annotate) - [select for diffs]
Modified Thu Mar 22 10:08:00 2007 UTC (2 years, 8 months ago) by ab
File length: 30776 byte(s)
Diff to previous 520154 (colored)
NUTCH-246 - incorrect segment size being generated due to time
synchronization issue.

Revision 520154 - (view) (annotate) - [select for diffs]
Modified Mon Mar 19 23:02:56 2007 UTC (2 years, 8 months ago) by ab
File length: 30656 byte(s)
Diff to previous 517015 (colored)
Update to Hadoop 0.12.1.

Revision 517015 - (view) (annotate) - [select for diffs]
Modified Sun Mar 11 21:18:23 2007 UTC (2 years, 8 months ago) by siren
File length: 30611 byte(s)
Diff to previous 516870 (colored)
merging 517012:516728 excluding changes made by dennis



Revision 516870 - (view) (annotate) - [select for diffs]
Modified Sun Mar 11 08:25:25 2007 UTC (2 years, 8 months ago) by siren
File length: 30832 byte(s)
Diff to previous 516866 (colored)
remove redundant commons-logging jars

Revision 516866 - (view) (annotate) - [select for diffs]
Modified Sun Mar 11 08:01:22 2007 UTC (2 years, 8 months ago) by siren
File length: 30781 byte(s)
Diff to previous 516835 (colored)
Remove oro as dependency

Revision 516835 - (view) (annotate) - [select for diffs]
Modified Sun Mar 11 01:34:42 2007 UTC (2 years, 8 months ago) by kubes
File length: 30681 byte(s)
Diff to previous 516759 (colored)
Placed NUTCH-233 and NUTCH-436 into the correct order in the file. :(

Revision 516759 - (view) (annotate) - [select for diffs]
Modified Sat Mar 10 18:03:07 2007 UTC (2 years, 8 months ago) by kubes
File length: 30682 byte(s)
Diff to previous 516758 (colored)
Updated to reflect commits of NUTCH-233 and NUTCH-436.

Revision 516758 - (view) (annotate) - [select for diffs]
Modified Sat Mar 10 17:41:17 2007 UTC (2 years, 8 months ago) by siren
File length: 30473 byte(s)
Diff to previous 516754 (colored)
doh! putting oro back since it is still used outside core

Revision 516754 - (view) (annotate) - [select for diffs]
Modified Sat Mar 10 17:30:04 2007 UTC (2 years, 8 months ago) by siren
File length: 30504 byte(s)
Diff to previous 516660 (colored)
Change OutlinkExtractor to use Regular Expressions from JRE, get rid of ORO dependency

Revision 516660 - (view) (annotate) - [select for diffs]
Modified Sat Mar 10 06:52:31 2007 UTC (2 years, 8 months ago) by mattmann
File length: 30400 byte(s)
Diff to previous 515844 (colored)
fix for NUTCH-384 (contributed by Heiko Dietze)

Revision 515844 - (view) (annotate) - [select for diffs]
Modified Wed Mar 7 23:37:21 2007 UTC (2 years, 8 months ago) by ab
File length: 30267 byte(s)
Diff to previous 515791 (colored)
NUTCH-167 - Observation of robots "noarchive" directive.

Revision 515791 - (view) (annotate) - [select for diffs]
Modified Wed Mar 7 21:59:07 2007 UTC (2 years, 8 months ago) by ab
File length: 30195 byte(s)
Diff to previous 515698 (colored)
Upgrade to Hadoop 0.11.2 and Lucene 2.1.0 releases.

Revision 515698 - (view) (annotate) - [select for diffs]
Modified Wed Mar 7 19:02:56 2007 UTC (2 years, 8 months ago) by ab
File length: 30139 byte(s)
Diff to previous 511159 (colored)
NUTCH-432 - JAVA_PLATFORM with spaces breaks bin/nutch.

Also, apply the patch proposed in HADOOP-1080 to fix CLASSPATH problems
under Cygwin.

Revision 511159 - (view) (annotate) - [select for diffs]
Modified Fri Feb 23 22:57:06 2007 UTC (2 years, 9 months ago) by cutting
File length: 30019 byte(s)
Diff to previous 501315 (colored)
NUTCH-449.  Make junit output format configurable.  Contributed by Nigel.

Revision 501315 - (view) (annotate) - [select for diffs]
Modified Tue Jan 30 05:55:03 2007 UTC (2 years, 9 months ago) by mattmann
File length: 29943 byte(s)
Diff to previous 499944 (colored)
Fix for NUTCH-390 Javadoc warnings

Revision 499944 - (view) (annotate) - [select for diffs]
Modified Thu Jan 25 20:15:34 2007 UTC (2 years, 10 months ago) by ab
File length: 29898 byte(s)
Diff to previous 499878 (colored)
Mention the addition of Fetcher2.

Revision 499878 - (view) (annotate) - [select for diffs]
Modified Thu Jan 25 18:11:59 2007 UTC (2 years, 10 months ago) by siren
File length: 29828 byte(s)
Diff to previous 497141 (colored)
NUTCH-433

Revision 497141 - (view) (annotate) - [select for diffs]
Modified Wed Jan 17 19:55:07 2007 UTC (2 years, 10 months ago) by ab
File length: 29702 byte(s)
Diff to previous 496358 (colored)
NUTCH-68 - ported to use map-reduce.

Revision 496358 - (view) (annotate) - [select for diffs]
Modified Mon Jan 15 15:02:37 2007 UTC (2 years, 10 months ago) by siren
File length: 29636 byte(s)
Diff to previous 495762 (colored)
fix NUTCH-430

Revision 495762 - (view) (annotate) - [select for diffs]
Modified Fri Jan 12 22:12:15 2007 UTC (2 years, 10 months ago) by siren
File length: 29556 byte(s)
Diff to previous 495397 (colored)
NUTCH-428

Revision 495397 - (view) (annotate) - [select for diffs]
Modified Thu Jan 11 22:00:51 2007 UTC (2 years, 10 months ago) by ab
File length: 29420 byte(s)
Diff to previous 495392 (colored)
Fix NUTCH-420 - DeleteDuplicates depended on the order of IndexDoc
processing.

Revision 495392 - (view) (annotate) - [select for diffs]
Modified Thu Jan 11 21:51:20 2007 UTC (2 years, 10 months ago) by ab
File length: 29275 byte(s)
Diff to previous 495214 (colored)
Upgrade to Hadoop 0.10.1. HTTPClient is now a dependency - move it
to lib/ and remove it as a plugin.

Add also native Linux libraries for Hadoop compression, plus corresponding
logic in bin/nutch.

Hadoop uses larger buffers now - explicitly set large heap size for
JUnit tests. All tests should pass now.

Revision 495214 - (view) (annotate) - [select for diffs]
Modified Thu Jan 11 13:25:43 2007 UTC (2 years, 10 months ago) by ab
File length: 29239 byte(s)
Diff to previous 493548 (colored)
When indexing redirected pages, drop intermediate pages and only index the
final page.

Avoid NPEs in Crawl tool, when no URLs are generated or fetched.

Revision 493548 - (view) (annotate) - [select for diffs]
Modified Sat Jan 6 19:49:49 2007 UTC (2 years, 10 months ago) by siren
File length: 29128 byte(s)
Diff to previous 493438 (colored)
fix NUTCH-421

Revision 493438 - (view) (annotate) - [select for diffs]
Modified Sat Jan 6 09:39:20 2007 UTC (2 years, 10 months ago) by siren
File length: 29026 byte(s)
Diff to previous 493085 (colored)
Fix NUTCH-325

Revision 493085 - (view) (annotate) - [select for diffs]
Modified Fri Jan 5 16:58:29 2007 UTC (2 years, 10 months ago) by ab
File length: 28879 byte(s)
Diff to previous 490607 (colored)
Fix NUTCH-425 and NUTCH-426.

Revision 490607 - (view) (annotate) - [select for diffs]
Modified Thu Dec 28 00:03:04 2006 UTC (2 years, 11 months ago) by ab
File length: 28767 byte(s)
Diff to previous 478619 (colored)
This patch addresses several issues:

* NUTCH-415 - Generator should mark selected records in CrawlDb.
  Due to increased resource consumption this step is optional.
  Application-level locking has been added to prevent concurrent
  modification of databases.

* NUTCH-416 - CrawlDatum status and CrawlDbReducer refactoring. It is
  now possible to correctly update CrawlDb from multiple segments.
  Introduce new status codes for temporary and permanent
  redirection.

* NUTCH-322 - Fix Fetcher to store redirected pages and to store
  protocol-level status. This also should fix NUTCH-273.

* Change default Fetcher behavior not to follow redirects immediately.
  Instead Fetcher will record redirects as new pages to be added to CrawlDb.
  This also partially addresses NUTCH-273.

* Detect and report when Generator creates 0-sized segments.

* Fix Injector to preserve already existing CrawlDatum if the seed list
  being injected also contains such URL.

This development was partially supported by SiteSell Inc.


Revision 478619 - (view) (annotate) - [select for diffs]
Modified Thu Nov 23 17:15:55 2006 UTC (3 years ago) by mattmann
File length: 27778 byte(s)
Diff to previous 477806 (colored)
- fix for NUTCH-406 Metadata tries to write null values

Revision 477806 - (view) (annotate) - [select for diffs]
Modified Tue Nov 21 18:38:10 2006 UTC (3 years ago) by siren
File length: 27710 byte(s)
Diff to previous 477786 (colored)
NUTCH-305

Revision 477786 - (view) (annotate) - [select for diffs]
Modified Tue Nov 21 17:51:57 2006 UTC (3 years ago) by siren
File length: 27539 byte(s)
Diff to previous 477757 (colored)
NUTCH-362

Revision 477757 - (view) (annotate) - [select for diffs]
Modified Tue Nov 21 17:19:51 2006 UTC (3 years ago) by siren
File length: 27444 byte(s)
Diff to previous 476879 (colored)
NUTCH-405

Revision 476879 - (view) (annotate) - [select for diffs]
Modified Sun Nov 19 18:48:39 2006 UTC (3 years ago) by siren
File length: 27342 byte(s)
Diff to previous 476814 (colored)
NUTCH-403 Make URL filtering optional in Generator

Revision 476814 - (view) (annotate) - [select for diffs]
Modified Sun Nov 19 13:13:54 2006 UTC (3 years ago) by siren
File length: 27276 byte(s)
Diff to previous 476617 (colored)
NUTCH-404 Fix LinkDB Usage - implementation mismatch

Revision 476617 - (view) (annotate) - [select for diffs]
Modified Sat Nov 18 21:55:44 2006 UTC (3 years ago) by siren
File length: 27208 byte(s)
Diff to previous 474464 (colored)
NUTCH-388 Fix description of urlfilter.order

Revision 474464 - (view) (annotate) - [select for diffs]
Modified Mon Nov 13 19:46:56 2006 UTC (3 years ago) by siren
File length: 27102 byte(s)
Diff to previous 473727 (colored)
NUTCH-395 Increase fetching speed

Revision 473727 - (view) (annotate) - [select for diffs]
Modified Sat Nov 11 15:27:40 2006 UTC (3 years ago) by siren
File length: 27054 byte(s)
Diff to previous 469662 (colored)
NUTCH-399 Change CommandRunner to use concurrent api from jdk

Revision 469662 - (view) (annotate) - [select for diffs]
Modified Tue Oct 31 21:36:01 2006 UTC (3 years ago) by ab
File length: 26977 byte(s)
Diff to previous 467356 (colored)
Update.

Revision 467356 - (view) (annotate) - [select for diffs]
Modified Tue Oct 24 15:22:48 2006 UTC (3 years, 1 month ago) by siren
File length: 26884 byte(s)
Diff to previous 467355 (colored)
fix for NUTCH-379

Revision 467355 - (view) (annotate) - [select for diffs]
Modified Tue Oct 24 15:21:43 2006 UTC (3 years, 1 month ago) by siren
File length: 26884 byte(s)
Diff to previous 467345 (colored)
fix for NUTCH-379

Revision 467345 - (view) (annotate) - [select for diffs]
Modified Tue Oct 24 14:28:46 2006 UTC (3 years, 1 month ago) by siren
File length: 26763 byte(s)
Diff to previous 464654 (colored)
fix for NUTCH-391

Revision 464654 - (view) (annotate) - [select for diffs]
Modified Mon Oct 16 20:38:57 2006 UTC (3 years, 1 month ago) by ab
File length: 26666 byte(s)
Diff to previous 451649 (colored)
NUTCH-383: upgrade to Hadoop 0.7.1 and Lucene 2.0.0.

NUTCH-373: replace DeleteDuplicates with a version that implements both
parts of the algorithm. Add JUnit test.

Revision 451649 - (view) (annotate) - [select for diffs]
Modified Sat Sep 30 19:38:30 2006 UTC (3 years, 1 month ago) by pkosiorowski
File length: 26035 byte(s)
Diff to previous 449293 (colored)
NUTCH-374: when http.content.limit is set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip, nothing can be fetched. (King Kong)

Revision 449293 - (view) (annotate) - [select for diffs]
Modified Sat Sep 23 19:36:47 2006 UTC (3 years, 2 months ago) by ab
File length: 25864 byte(s)
Diff to previous 449102 (colored)
NUTCH-350: urls incorrectly marked as STATUS_FETCH_GONE when blocked by
http.max.delays. Instead the status is set to STATUS_FETCH_RETRY. Since this
is an intermittent problem related to the Fetcher implementation, we don't
increase the retry counter.

Revision 449102 - (view) (annotate) - [select for diffs]
Modified Fri Sep 22 21:49:09 2006 UTC (3 years, 2 months ago) by ab
File length: 25423 byte(s)
Diff to previous 447940 (colored)
NUTCH-332: fix the problem of doubling scores caused by links pointing
to the current page (e.g. anchors).

Revision 447940 - (view) (annotate) - [select for diffs]
Modified Tue Sep 19 19:36:19 2006 UTC (3 years, 2 months ago) by siren
File length: 25280 byte(s)
Diff to previous 447893 (colored)
NUTCH-367 - DistributedSearch throws ClassCastException

Revision 447893 - (view) (annotate) - [select for diffs]
Modified Tue Sep 19 16:01:34 2006 UTC (3 years, 2 months ago) by siren
File length: 25207 byte(s)
Diff to previous 432615 (colored)
NUTCH-105 - Network error during robots.txt fetch causes file to be ignored, contributed by Greg Kim

Revision 432615 - (view) (annotate) - [select for diffs]
Modified Fri Aug 18 15:12:12 2006 UTC (3 years, 3 months ago) by siren
File length: 25097 byte(s)
Diff to previous 432611 (colored)
NUTCH-338 - Remove the text parser as an option for parsing PDF files in parse-plugins.xml (Chris A. Mattmann)

Revision 432611 - (view) (annotate) - [select for diffs]
Modified Fri Aug 18 14:56:44 2006 UTC (3 years, 3 months ago) by siren
File length: 24961 byte(s)
Diff to previous 432293 (colored)
NUTCH-347 adjust plugin build script not to emit warnings when copying dependent jars

Revision 432293 - (view) (annotate) - [select for diffs]
Modified Thu Aug 17 16:41:12 2006 UTC (3 years, 3 months ago) by ab
File length: 24859 byte(s)
Diff to previous 431364 (colored)
Update CHANGES.

Revision 431364 - (view) (annotate) - [select for diffs]
Modified Mon Aug 14 14:56:54 2006 UTC (3 years, 3 months ago) by ab
File length: 24688 byte(s)
Diff to previous 429788 (colored)
Optionally skip pages with abnormally large Crawl-Delay values. Original
patch submitted by Dennis Kubes.

Revision 429788 - (view) (annotate) - [select for diffs]
Modified Tue Aug 8 19:30:16 2006 UTC (3 years, 3 months ago) by siren
File length: 24591 byte(s)
Diff to previous 429779 (colored)
Update hadoop version to 0.5.0

Revision 429779 - (view) (annotate) - [select for diffs]
Modified Tue Aug 8 19:09:58 2006 UTC (3 years, 3 months ago) by siren
File length: 24512 byte(s)
Diff to previous 425321 (colored)
NUTCH-344 - Fix for thread blocking issue contributed by Greg Kim

Revision 425321 - (view) (annotate) - [select for diffs]
Modified Tue Jul 25 07:57:11 2006 UTC (3 years, 4 months ago) by siren
File length: 24331 byte(s)
Diff to previous 424779 (colored)
preparing 0.8 release

Revision 424779 - (view) (annotate) - [select for diffs]
Modified Sun Jul 23 18:43:55 2006 UTC (3 years, 4 months ago) by siren
File length: 24333 byte(s)
Diff to previous 423630 (colored)
NUTCH-327 fix log path under cygwin

Revision 423630 - (view) (annotate) - [select for diffs]
Modified Wed Jul 19 22:07:48 2006 UTC (3 years, 4 months ago) by ab
File length: 24274 byte(s)
Diff to previous 422641 (colored)
Add support for Crawl-delay in robots.txt (NUTCH-293).

Revision 422641 - (view) (annotate) - [select for diffs]
Modified Mon Jul 17 06:56:42 2006 UTC (3 years, 4 months ago) by siren
File length: 24044 byte(s)
Diff to previous 420917 (colored)
NUTCH-320 urls are now outputted to stdout

Revision 420917 - (view) (annotate) - [select for diffs]
Modified Tue Jul 11 16:35:10 2006 UTC (3 years, 4 months ago) by siren
File length: 23913 byte(s)
Diff to previous 420902 (colored)
tab->space

Revision 420902 - (view) (annotate) - [select for diffs]
Modified Tue Jul 11 15:50:53 2006 UTC (3 years, 4 months ago) by siren
File length: 23913 byte(s)
Diff to previous 405204 (colored)
added some of the missing changes

Revision 405204 - (view) (annotate) - [select for diffs]
Modified Mon May 8 22:34:29 2006 UTC (3 years, 6 months ago) by cutting
File length: 17789 byte(s)
Diff to previous 395676 (colored)
Change parameters passed to Hadoop's FileSystem from (now-deprecated) java.io.File to (new) org.apache.hadoop.fs.Path.

Revision 395676 - (view) (annotate) - [select for diffs]
Modified Thu Apr 20 19:18:56 2006 UTC (3 years, 7 months ago) by cutting
File length: 17709 byte(s)
Diff to previous 312944 (colored)
Fix NUTCH-108.  Log hosts that exceed generate.max.per.host.  Contributed by Rod Taylor.

Revision 312944 - (view) (annotate) - [select for diffs]
Modified Tue Oct 11 19:45:35 2005 UTC (4 years, 1 month ago) by pkosiorowski
File length: 17618 byte(s)
Diff to previous 233161 (colored)
NUTCH-107 - Typo in plugin/urlfilter-*/plugin.xml. (Stephen Cross)

Revision 233161 - (view) (annotate) - [select for diffs]
Modified Wed Aug 17 11:36:46 2005 UTC (4 years, 3 months ago) by pkosiorowski
File length: 17546 byte(s)
Diff to previous 233150 (colored)
0.8-dev version started.

Revision 233150 - (view) (annotate) - [select for diffs]
Modified Wed Aug 17 10:03:30 2005 UTC (4 years, 3 months ago) by pkosiorowski
File length: 17532 byte(s)
Diff to previous 233032 (colored)
Updated release date.

Revision 233032 - (view) (annotate) - [select for diffs]
Modified Tue Aug 16 18:39:23 2005 UTC (4 years, 3 months ago) by pkosiorowski
File length: 17532 byte(s)
Diff to previous 224360 (colored)
Preparing 0.7 release

Revision 224360 - (view) (annotate) - [select for diffs]
Modified Fri Jul 22 16:16:51 2005 UTC (4 years, 4 months ago) by ehatcher
File length: 17519 byte(s)
Diff to previous 179640 (colored)
note method name changes

Revision 179640 - (view) (annotate) - [select for diffs]
Modified Thu Jun 2 20:37:21 2005 UTC (4 years, 5 months ago) by cutting
File length: 17369 byte(s)
Diff to previous 168427 (colored)
Moving Nutch from the Incubator to Lucene.

Revision 168427 - (view) (annotate) - [select for diffs]
Modified Thu May 5 21:40:07 2005 UTC (4 years, 6 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 17369 byte(s)
Diff to previous 168178 (colored)
Automatically convert range queries to range filters.  Requires latest Lucene.

Revision 168178 - (view) (annotate) - [select for diffs]
Modified Wed May 4 19:57:20 2005 UTC (4 years, 6 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 17215 byte(s)
Diff to previous 161984 (colored)
Add result sorting & deduping by fields other than site.

Revision 161984 - (view) (annotate) - [select for diffs]
Modified Tue Apr 19 21:36:26 2005 UTC (4 years, 7 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 17115 byte(s)
Diff to previous 161952 (colored)
Make query boosts configurable.  Patch by Piotr Kosiorowski.

Revision 161952 - (view) (annotate) - [select for diffs]
Modified Tue Apr 19 18:58:12 2005 UTC (4 years, 7 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 16988 byte(s)
Diff to previous 161630 (colored)
Deprecate link analysis.  Remove it from the tutorial and change the default configuration so that link counts are used instead.

Revision 161630 - (view) (annotate) - [select for diffs]
Modified Sun Apr 17 06:51:28 2005 UTC (4 years, 7 months ago) by johnx
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 16737 byte(s)
Diff to previous 160462 (colored)
Close Issue #33 - MIME content type detector (using magic char sequences).

Revision 160462 - (view) (annotate) - [select for diffs]
Modified Thu Apr 7 20:33:14 2005 UTC (4 years, 7 months ago) by siren
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 16591 byte(s)
Diff to previous 160446 (colored)
Fix for bug #4 - Unbalanced quote in query eats all resources.


Revision 160446 - (view) (annotate) - [select for diffs]
Modified Thu Apr 7 19:53:14 2005 UTC (4 years, 7 months ago) by siren
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 16476 byte(s)
Diff to previous 160113 (colored)
Added some features to DistributedSearch: new segments can be added
to searchservers without restarting the frontend, defective search
servers are not queried until they come back online, watchdog keeps
an eye on your searchservers and writes simple statistics.


Revision 160113 - (view) (annotate) - [select for diffs]
Modified Mon Apr 4 22:26:47 2005 UTC (4 years, 7 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 16171 byte(s)
Diff to previous 159844 (colored)
Fixes to web pages.  Bug #32.

Revision 159844 - (view) (annotate) - [select for diffs]
Modified Sat Apr 2 23:12:26 2005 UTC (4 years, 7 months ago) by johnx
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 15915 byte(s)
Diff to previous 159745 (colored)
Added skipCompressedByteArray() to WritableUtils.java

Revision 159745 - (view) (annotate) - [select for diffs]
Modified Fri Apr 1 23:54:36 2005 UTC (4 years, 7 months ago) by johnx
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 15831 byte(s)
Diff to previous 158845 (colored)
Add servlet Cached.java that serves cached Content of any mime type.

Revision 158845 - (view) (annotate) - [select for diffs]
Modified Wed Mar 23 22:16:29 2005 UTC (4 years, 8 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 15682 byte(s)
Diff to previous 155829 (colored)
Index host and title in separate fields.

Revision 155829 - (view) (annotate) - [select for diffs]
Added Tue Mar 1 22:04:46 2005 UTC (4 years, 8 months ago) by cutting
Original Path: incubator/nutch/trunk/CHANGES.txt
File length: 15198 byte(s)
Initial import of Nutch to Apache.
