
Contents of /lucene/nutch/tags/release-0.8/CHANGES.txt

Revision 425328
Tue Jul 25 08:37:51 2006 UTC by siren
Nutch 0.8 release.
1 Nutch Change Log
2
3 Release 0.8 - 2006-07-25
4
5 0. Totally new architecture, based on hadoop
6 [http://lucene.apache.org/hadoop] (cutting)
7
8 1. NUTCH-107 - Typo in plugin/urlfilter-*/plugin.xml. (Stephen Cross).
9
10 2. NUTCH-108 - Log hosts that exceed generate.max.per.host.
11 (Rod Taylor via cutting)
12
13 3. NUTCH-88 - Enhance ParserFactory plugin selection policy
14 (jerome)
15
16 4. NUTCH-124 - Protocol-httpclient does not follow redirects when
17 fetching robots.txt (cutting)
18
19 5. NUTCH-130 - Be explicit about target JVM when building (1.4.x?)
20 (stack@archive.org, cutting)
21
22 6. NUTCH-114 - Getting number of urls and links from crawldb
23 (Stefan Groschupf via ab)
24
25 7. NUTCH-112 - Link in cached.jsp page to cached content is an
26 absolute link (Chris A. Mattmann via jerome)
27
28 8. NUTCH-135 - Http header meta data are case insensitive in the
29 real world (Stefan Groschupf via jerome)
30
31 9. NUTCH-145 - Build of war file fails on Chinese (zh) .xml files due
32 to UTF-8 BOM (KuroSaka TeruHiko via siren)
33
34 10. NUTCH-121 - SegmentReader for mapred (Rod Taylor via ab)
35
36 11. Added support for OpenSearch (cutting)
37
38 12. NUTCH-142 - NutchConf should use the thread context classloader
39 (Mike Cannon-Brookes via pkosiorowski)
40
41 13. NUTCH-160 - Use standard Java Regex library rather than
42 org.apache.oro.text.regex (Rod Taylor via cutting)
43
44 14. NUTCH-151 - CommandRunner can hang after the main thread exec is
45 finished and has inefficient busy loop (Paul Baclace via cutting)
46
47 15. NUTCH-174 - Problem encountered with ant during compilation
48
49 16. NUTCH-190 - ParseUtil drops reason for failed parse
50 (stack@archive.org via ab)
51
52 17. NUTCH-169 - Remove static NutchConf (Marko Bauhardt via ab)
53
54 18. NUTCH-194 - Nutch-169 introduced two tiny bugs (Marko Bauhardt via ab)
55
56 19. NUTCH-178 - Session creation must be "false" in search.jsp
57 (YourSoft via siren)
58
59 20. NUTCH-200 - OpenSearch Servlet is broken
60 (Marko Bauhardt via siren)
61
62 21. NUTCH-81 - Webapp only works when deployed in root
63 (AJ Banck, Michael Nebel via siren)
64
65 22. NUTCH-139 - Standard metadata property names in the ParseData
66 metadata (Chris A. Mattmann, jerome)
67
68 23. NUTCH-192 - Meta data support for CrawlDatum
69 (Stefan Groschupf via ab)
70
71 24. NUTCH-52 - Parser plugin for MS Excel files
72 (Rohit Kulkarni via jerome)
73
74 25. NUTCH-53 - Parser plugin for Zip files
75 (Rohit Kulkarni via jerome)
76
77 26. NUTCH-137 - footer is not displayed in search result page
78 (KuroSaka TeruHiko via siren)
79
80 27. NUTCH-118 - FAQ link points to invalid URL
81 (Steve Betts via siren)
82
83 28. NUTCH-184 - Serbian (sr, Cyrillic) and Serbo-Croatian (sh, Latin)
84 translation (Ivan Sekulovic via siren)
85
86 29. NUTCH-211 - FetchedSegments leave readers open (Stefan Groschupf
87 via cutting)
88
89 30. NUTCH-140 - Add alias capability in parse-plugins.xml file that
90 allows mimeType->extensionId mapping (Chris A. Mattmann via jerome)
91
92 31. NUTCH-214 - Added Links to web site to search mailing list
93 (Jake Vanderdray via jerome)
94
95 32. NUTCH-204 - Multiple field values in HitDetails
96 (Stefan Groschupf via jerome)
97
98 33. NUTCH-219 - file.content.limit & ftp.content.limit should be changed
99 to -1 to be consistent with http (jerome)
100
101 34. NUTCH-221 - Prepare nutch for upcoming lucene 2.0 (siren)
102
103 35. NUTCH-91 - Empty encoding causes exception (Michael Nebel via
104 pkosiorowski)
105
106 36. NUTCH-228 - Clustering plugin descriptor broken (Dawid Weiss via
107 jerome)
108
109 37. NUTCH-229 - Improved handling of plugin folder configuration
110 (Stefan Groschupf via ab)
111
112 38. NUTCH-206 - Search server throws InstantiationException (ab)
113
114 39. NUTCH-203 - ParseSegment throws InstantiationException (Marko Bauhardt
115 via ab)
116
117 40. NUTCH-3 - Multi values of header discarded (Stefan Groschupf via ab)
118
119 41. Update to lucene 1.9.1 (cutting)
120
121 42. NUTCH-235 - Duplicate Inlink values (ab)
122
123 43. NUTCH-234 - Clustering extension code cleanups and a real
124 JUnit test case for the current implementation (Dawid Weiss via ab)
125
126 44. NUTCH-210 - Context.xml file for Nutch web application
127 (Chris A. Mattmann via jerome)
128
129 45. NUTCH-231 - Invalid CSS entries (AJ Banck via jerome)
130
131 46. NUTCH-232 - Search.jsp has multiple search forms creating
132 invalid html / incorrect focus function (jerome)
133
134 47. NUTCH-196 - lib-xml and lib-log4j plugins (ab, jerome)
135
136 48. NUTCH-244 - Inconsistent handling of property values
137 boundaries / unable to set db.max.outlinks.per.page to
138 infinite (jerome)
139
140 49. NUTCH-245 - DTD for plugin.xml configuration files
141 (Chris A. Mattmann via jerome)
142
143 50. NUTCH-250 - Generate to log truncation caused by
144 generate.max.per.host (Rod Taylor via cutting)
145
146 51. NUTCH-125 - OpenOffice Parser plugin (ab)
147
148 52. Switch from using java.io.File to org.apache.hadoop.fs.Path.
149 (cutting)
150
151 53. NUTCH-240 - Scoring API: extension point, scoring filters and
152 an OPIC plugin (ab)
153
154 54. NUTCH-134 - Summarizer doesn't select the best snippets (jerome)
155
156 55. NUTCH-268 - Generator and lib-http use different definitions of
157 "unique host" (ab)
158
159 56. NUTCH-280 - Url query causes NullPointerException (Grant Glouser
160 via siren)
161
162 57. NUTCH-285 - LinkDb Fails rename doesn't create parent directories
163 (Dennis Kubes via ab)
164
165 58. NUTCH-201 - Add support for subcollections
166 (siren)
167
168 59. NUTCH-298 - If a 404 for a robots.txt is returned a NPE is thrown
169 (Stefan Groschupf via jerome)
170
171 60. NUTCH-275 - Fetcher not parsing XHTML-pages at all (jerome)
172
173 61. NUTCH-301 - CommonGrams loads analysis.common.terms.file for each query
174 (Stefan Groschupf via jerome)
175
176 62. NUTCH-110 - OpenSearchServlet outputs illegal xml characters
177 (stack@archive.org via siren)
178
179 63. NUTCH-292 - OpenSearchServlet: OutOfMemoryError: Java heap space
180 (Stefan Neufeind via siren)
181
182 64. NUTCH-307 - Wrong configured log4j.properties (jerome)
183
184 65. NUTCH-303 - Logging improvements (jerome)
185
186 66. NUTCH-308 - Maximum search time limit (ab)
187
188 67. NUTCH-306 - DistributedSearch.Client liveAddresses concurrency
189 problem (Grant Glouser via siren)
190
191 68. Update to hadoop-0.4 (Milind Bhandarkar, cutting)
192
193 69. NUTCH-317 - Clarify what the queryLanguage argument of
194 Query.parse(...) means (jerome)
195
196 70. Added alternative experimental web gui in contrib containing
197 extensions like subcollection, keymatch, user preferences,
198 caching, implemented mainly using tiles and jstl (siren)
199
200 71. NUTCH-320 DmozParser does not output list of urls to stdout
201 but to a log file instead. Original functionality restored.
202
203 72. NUTCH-271 - Add ability to limit crawling to the set of initially
204 injected hosts (db.ignore.external.links) (Philippe Eugene,
205 Stefan Neufeind via ab)
206
207 73. NUTCH-293 - Support for Crawl-Delay (Stefan Groschupf via ab)
208
209 74. NUTCH-327 - Fixed logging directory on cygwin (siren)
210
211 Release 0.7 - 2005-08-17
212
213 1. Added support for "type:" in queries. Search results are limited/qualified
214 by mimetype or its primary type or sub type. For example,
215 (1) searching with "type:application/pdf" restricts results
216 to pages which were identified to be of mimetype "application/pdf".
217 (2) with "type:application", nutch will return pages of
218 primary type "application".
219 (3) with "type:pdf", only pages of sub type "pdf" will be listed.
220 (John Xing, 20050120)
221
222 2. Added support for "date:" in queries. Last-Modified is indexed.
223 Search results are restricted by lower and upper date (inclusive)
224 as date:yyyymmdd-yyyymmdd. For example, date:20040101-20041231
225 only returns pages with Last-Modified in year 2004.
226 (John Xing, 20050122)
227
228 3. Add URLFilter plugin interface and convert existing url filters into
229 plugins. (John Xing, 20050206)
230
231 4. Add UpdateSegmentsFromDb tool, which updates the scores and
232 anchors of existing segments with the current values in the web
233 db. This is used by CrawlTool, so that pages are now only fetched
234 once per crawl. (Doug Cutting, 20050221)
235
236 5. Moved code into org.apache.nutch sub-packages. Changed license to
237 Apache 2.0. Removed jar files whose licenses do not permit
238 redistribution by Apache. Disabled compilation of plugins which
239 require these libraries. (Doug Cutting 20050301)
240
241 6. Index host and title in separate fields. Host was indexed
242 previously only as a part of the URL. Title was indexed as an
243 anchor. Now boosts for matching these fields may be adjusted
244 separately from boosts for matching anchors and url. Also: move
245 site indexing to index-basic plugin to minimize the number of
246 times the URL needs to be parsed; and, stop using anchor analyzer
247 for anything but anchors. (Piotr Kosiorowski via Doug Cutting
248 20050323)
249
250 7. Add servlet Cached.java that serves cached Content of any mime type.
251 web.xml and cached.jsp are slightly modified.
252 (John Xing, 20050401)
253
254 8. Add skipCompressedByteArray() to WritableUtils.java.
255 (John Xing, 20050402)
256
257 9. Fixes to jsp and static web pages. These now use relative links,
258 so that the Nutch webapp file can be used in places other than at
259 the root. Also fixed links to the about and help pages. Bug #32.
260 (Jerome Charron via cutting, 20050404)
261
262 10. Added some features to DistributedSearch: new segments can be added
263 to search servers without restarting the frontend, defective search
264 servers are not queried until they come back online, and a watchdog keeps
265 an eye on your search servers and writes simple statistics.
266 (Sami Siren, 20050407)
267
268 11. Fix for bug #4 - Unbalanced quote in query eats all resources.
269 (Piotr Kosiorowski, Sami Siren, 20050407)
270
271 12. Close Issue #33 - MIME content type detector (using magic char sequences).
272 (Jerome Charron and Hari Kodungallur via John Xing, 20050416)
273
274 13. Add a servlet that implements A9's OpenSearch RSS web service.
275 (cutting, 20050418)
276
277 14. Remove references to link analysis from tutorial, and enable
278 scoring by link count when generating fetchlists and searching.
279 (cutting, 20040419)
280
281 15. Make query boosts for host, title, anchor and phrase matches
282 configurable. (Piotr Kosiorowski via cutting, 20050419)
283
284 16. Add support for sorting search results and search-time deduping by
285 fields other than site.
286
287 17. Automatically convert range queries into cached range filters.
288 This improves the performance and scalability of, e.g., date range
289 searching.
290
291 18. Several methods have been renamed due to misspellings. The old
292 methods have been deprecated and will be removed before the 1.0
293 release.
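
Item 2 of this release describes the "date:" clause as an inclusive
yyyymmdd-yyyymmdd range over Last-Modified. The following is a minimal
sketch of that range check, not the code Nutch uses. The class name
DateRangeClause and its methods are hypothetical, and it relies on
java.time, which is newer than this release.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch: parses a "date:yyyymmdd-yyyymmdd"
// clause and tests whether a Last-Modified date falls inside the range.
final class DateRangeClause {
    // BASIC_ISO_DATE accepts dates written as yyyyMMdd, e.g. 20040101.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;
    private static final String PREFIX = "date:";

    final LocalDate lower;
    final LocalDate upper;

    private DateRangeClause(LocalDate lower, LocalDate upper) {
        this.lower = lower;
        this.upper = upper;
    }

    static DateRangeClause parse(String clause) {
        if (!clause.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a date clause: " + clause);
        }
        String[] bounds = clause.substring(PREFIX.length()).split("-");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("expected yyyymmdd-yyyymmdd: " + clause);
        }
        return new DateRangeClause(LocalDate.parse(bounds[0], FMT),
                                   LocalDate.parse(bounds[1], FMT));
    }

    // Both bounds are inclusive, as the changelog entry states.
    boolean matches(LocalDate lastModified) {
        return !lastModified.isBefore(lower) && !lastModified.isAfter(upper);
    }
}
```

With the entry's own example, DateRangeClause.parse("date:20040101-20041231")
accepts any date in 2004, including both end dates, and rejects dates in
2003 or 2005.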
294
295
296 Release 0.6
297
298 1. Added clustering-carrot2 plugin, together with introduction of clustering
299 api and modification to search jsp. (Dawid Weiss via John Xing, 20040809)
300
301 2. Make a number of changes to NDFS (Nutch Distributed File System)
302 to fix bugs, add admin tools, etc.
303
304 Also, modify all command line tools so you can indicate whether to
305 use NDFS or the local filesystem. If you indicate nothing, then
306 it defaults to the local fs.
307
308 I've used this to do a 35m page crawl via NDFS, distributed over a
309 dozen machines. (Mike Cafarella)
310
311 3. Add support for BASE tags in HTML. Outlinks are now correctly
312 extracted when a BASE tag is present. (cutting)
313
314 4. Fix two bugs in result pagination. When the last hit on a page
315 was the last hit overall, the "next" button was sometimes shown
316 when the "show all" button should be shown instead. Also, in
317 certain cases, the "show all" button would be shown when the
318 "next" button should have been shown. (cutting)
319
320 5. Add config parameter "indexer.max.tokens" that determines the
321 maximum number of tokens indexed per field. (Andy Hedges via cutting)
322
323 6. Add parser for mp3 files. (Andy Hedges via cutting)
324
325 7. Add RegexUrlNormalizer. This is useful for things like stripping
326 out session IDs from URLs. To use it, add values for
327 urlnormalizer.class and urlnormalizer.regex.file to your
328 nutch-site.xml. The RegexUrlNormalizer class extends the
329 BasicUrlNormalizer, and does basic normalization as well.
330 (Luke Baker via cutting)
331
332 8. Added Swedish translation (Stefan Verzel via Sami Siren, 20040910)
333
334 9. Added Polish translation (Andrzej Bialecki, 20040911)
335
336 10. Added 3 more language profiles to language identifier (ru,hu,pl).
337 Other changes to language identifier: Profiles converted to utf8,
338 added some test cases, changed the similarity calculation.
339 (Sami Siren, 20040925)
340
341 11. Added plugin parse-rtf (Andy Hedges via John Xing, 20040929)
342
343 12. Added plugin index-more and more.jsp (John Xing, 20041003)
344
345 13. Added "View as Plain Text" feature. A new op OP_PARSETEXT is introduced
346 in DistributedSearch.java. text.jsp is added. (John Xing, 20041006)
347
348 14. Fixed a bug that caused cached.jsp, explain.jsp, anchors.jsp and text.jsp
349 (but not search.jsp) to fail with a NullPointerException in distributed search.
350 The bug seems to have appeared after the "hits per site" code was added.
351 The fix is in Hit.java, making sure String site is never null.
352 Hopefully this fix has no bad effect on the "hits per site" code.
353 (John Xing, 20041006)
354
355 15. Fixed a bug that caused fullyDelete() in FileUtil.java to fail for
356 LocalFileSystem.java. This bug also exposes possible incompleteness
357 of NDFSFile.java, where a few methods are not supported, including
358 delete(). NDFSFile.java itself is unchanged; that is left for future
359 improvement (John Xing, 20041022).
360
361 16. Introduced option -noParsing to Fetcher.java and added ParseSegment.java.
362 A new status code CANT_PARSE is added to FetcherOutput.java.
363 Without option -noParsing, fetcher behavior is unchanged. With
364 option -noParsing, the fetcher only crawls; no parsing is carried out.
365 ParseSegment.java should then be used to parse in a separate pass.
366 (John Xing, 20041025)
367
368 17. Added ontology plugin. Currently it is used for query refinement, as
369 exemplified in refine-query-init.jsp and refine-query.jsp. By default,
370 query refinement is disabled in search.jsp. Please check
371 ./src/plugin/ontology/README.txt for further description.
372 Ontology plugin certainly can be used for many other things.
373 (Michael J. Pan via John Xing, 20041129)
374
375 18. Changed fetcher.server.delay to be a float, so that sub-second
376 delays can be specified. (cutting)
377
378 19. Added plugin.includes config parameter that determines which
379 plugins are included. By default now only http, html and basic
380 indexing and search plugins are enabled, rather than all plugins.
381 This should make default performance more predictable and reliable
382 going forward. (cutting)
383
384 20. Cleaned up some filesystem code, including:
385
386 - Replaced BufferedRandomAccessFile with two simpler utilities,
387 NFSDataInputStream and NFSDataOutputStream.
388
389 - Fixed the bug where SequenceFiles were no longer flushed when
390 created, so that, when fetches crashed, segments were
391 unreadable. Now segments are always readable after crashes.
392 Only the contents of the last buffer are lost.
393
394 - Simplified the FSOutputStream API to not include seek(). We
395 should never need that functionality.
396
397 - Simplified LocalFileSystem's implementations of FSInputStream
398 and FSOutputStream and optimized FSInputStream.seek().
399
400 (cutting)
401
402 21. Fixed BasicUrlNormalizer to better handle relative urls. The file
403 part of a URL is normalized in the following manner:
404
405 1. "/aa/../" will be replaced by "/". This is done step by step until
406 the url doesn't change anymore. This ensures that
407 "/aa/bb/../../" will be replaced by "/", too.
408
409 2. leading "/../" will be replaced by "/"
410
411 (Sven Wende via cutting)
412
413 22. Fix Page constructors so that next fetch date is less likely to be
414 misconstrued as a float. This patches a problem in WebDBInjector,
415 where new pages were added to the db with nextScore set to the
416 intended nextFetch date. This, in turn, confused link analysis.
417
418 23. In ndfs code, replace addLocalFile(), putToLocalFile() with
419 copyFromLocalFile(), moveFromLocalFile(), copyToLocalFile() and
420 moveToLocalFile(). (John Xing, 20041217)
421
422 24. Added new config parameter fetcher.threads.per.host. This is used
423 by the Http protocol. When this is one, behavior is as before.
424 When this is greater than one then multiple threads are permitted
425 to access a host at once. Note that fetcher.server.delay is no
426 longer consistently observed when this is greater than one.
427 (Luke Baker via Doug Cutting)
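
Item 21 of this release gives two rewrite rules for the file part of a
URL. The following is a minimal sketch of those two rules using
java.util.regex, not the actual BasicUrlNormalizer source. The class
name PathNormalizerSketch is hypothetical, and skipping ".." itself as
the segment to collapse in rule 1 is an assumption of this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the two rules in item 21; not the Nutch implementation.
final class PathNormalizerSketch {
    // Rule 1 pattern: one ordinary path segment followed by "/../".
    // The lookahead keeps ".." itself from being treated as that segment
    // (an assumption of this sketch).
    private static final Pattern SEGMENT_DOTDOT =
        Pattern.compile("/(?!\\.\\./)[^/]+/\\.\\./");

    static String normalize(String path) {
        String previous;
        do {
            previous = path;
            // Rule 1: collapse one "/segment/../" into "/" per pass,
            // repeating until the path no longer changes.
            path = SEGMENT_DOTDOT.matcher(path).replaceFirst("/");
        } while (!path.equals(previous));

        // Rule 2: a leading "/../" becomes "/". Dropping three characters
        // keeps the final "/" of the prefix.
        while (path.startsWith("/../")) {
            path = path.substring(3);
        }
        return path;
    }
}
```

Traced by hand, normalize("/aa/bb/../../") first becomes "/aa/../" and
then "/", which is the case the entry calls out, and
normalize("/../index.html") becomes "/index.html".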
428
429 Release 0.5
430
431 1. Changed plugin directory to be a list of directories.
432
433 2. Permit Plugin to be the default plugin implementation.
434
435 3. Added pluggable interface for network protocols in new package
436 net.nutch.protocol. Moved http code from core into a plugin.
437
438 4. Added pluggable interface for content parsing in new package
439 net.nutch.parse. Moved html parsing code from core into a
440 plugin.
441
442 5. Fixed a bug in NutchAnalysis where 16-bit characters were not
443 processed correctly.
444
445 6. Fixed bug #971731: random summaries on result page.
446 (Daniel Naber via cutting)
447
448 7. Made Nutch logo transparent. (Daniel Naber via cutting)
449
450 8. Added file protocol plugin. (John Xing via cutting)
451
452 9. Added ftp protocol plugin. (John Xing via cutting)
453
454 10. Added pdf and msword parser plugins. (John Xing via cutting)
455
456 11. Added pluggable indexing interface. By default, url, content,
457 anchors and title are indexed, as before, but now one can easily
458 alter this to, e.g., index metadata. A demonstration is provided
459 which extracts and indexes Creative Commons license urls. (cutting)
460
461 12. Add language identification plugin.
462
463 The process of identification is as follows:
464
465 1. html (html only, HTML 4.0 "lang" attribute)
466 2. meta tags (html only, http-equiv, dc.language)
467 3. http header (Content-Language)
468 4. if all above fail "statistical analysis"
469
470 1 & 2 are run during the fetching phase and 3 & 4 are run during the
471 indexing phase.
472
473 Currently supported languages (in "statistical analysis") are
474 da,de,el,en,es,fi,fr,it,nl,sv and pt. The corpus used was grabbed
475 from http://www.isi.edu/~koehn/europarl/ and the profiles were
476 built with the tool supplied in the patch.
477
478 After indexing, the language can be found in the field named "lang".
479
480 It's not 100% accurate but it's a start.
481 (Sami Siren)
482
483 13. Added SegmentMergeTool and "mergesegs" command, to remove
484 duplicated or otherwise unused content from several segments and
485 join them together into a single new segment. The tool also
486 optionally performs several other steps required for proper
487 operation of Nutch - such as indexing segments, deleting
488 duplicates, merging indices, and indexing the new single segment.
489 (Andrzej Bialecki)
490
491 14. Add the ability to retrieve ParseData of a search hit. ParseData
492 contains many valuable properties of a search hit.
493
494 This is required (among others) to properly display the cached
495 content because it's not possible to determine the character
496 encoding from the output of the getContent() method (which returns
497 byte[]). The symptoms are that for HTML pages using non-latin1 or
498 non-UTF8 encodings the cached preview will almost certainly look
499 broken. Using the attached patch it is possible to determine the
500 character encoding from the ParseData (for HTTP: Content-Type
501 metadata), and encode the content accordingly. (Andrzej Bialecki)
502
503 15. Add a pluggable query interface. By default, the content, anchor
504 and url fields are searched as before. A sample plugin indexes
505 the host name and adds a "site:" keyword to query parsing.
506
507 16. Added support for "lang:" in queries. For example, searching with
508 "lang:en" restricts results to pages which were identified to
509 be in English.
510
511 17. Automatically optimize field queries to use cached Lucene filters.
512 This makes, for example, searches restricted by languages or sites
513 that are very common much faster.
514
515 18. Improved charset handling in jsp pages. (jshin by cutting)
516
517 19. Permit topic filtering when injecting DMOZ pages. (jshin by cutting)
518
519 20. When parsing crawled pages, interpret charset specifications in
520 html meta tags. (jshin by cutting)
521
522 21. Added support for "cc:licensed" in queries, which searches for documents
523 released under Creative Commons licenses. Attributes of the
524 license may also be queried, with, e.g., "cc:by" for
525 attribution-required licenses, "cc:nc" for non-commercial
526 licenses, etc.
527
528 22. Relative paths named in plugin.folders are now searched for on the
529 classpath. This makes, e.g., deployment in a war file much simpler.
530
531 23. Modifications to Fetcher.java.
532
533 1. Make sure it works properly with regard to creation and initialization
534 of plugin instances. The problem was that multiple threads race to
535 startUp() or shutDown() plugin instances. It was solved by synchronizing
536 certain code in PluginRepository.java and Extension.java.
537 (Stefan Groschupf via John Xing)
538
539 2. Added code to explicitly shutDown() plugins. Otherwise FetcherThreads
540 may never return (quit) if there are still data or other structures
541 (e.g., persistent socket connections) associated with plugins. (John Xing)
542
543 3. Fixed one type of Fetcher "hang" problem by monitoring named
544 FetcherThreads. If all FetcherThreads are gone (finished),
545 Fetcher.java is considered done. The problem was: there could be
546 runaway threads started by external libs via FetcherThreads.
547 Those threads never return, thus keep Fetcher from exiting normally.
548 (John Xing)
549
550 24. Eliminate excessive hits from sites. This is done efficiently by
551 adding the site name to Hit instances, and, when needed,
552 re-querying with too-frequent sites prohibited in the query.
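
Item 12 of this release lists four sources of language information that
are tried in order. The following is a minimal sketch of that fallback
order only. The class and parameter names are hypothetical, the
statistical step is passed in as a placeholder rather than implemented,
and the sketch ignores the split between the fetching and indexing
phases that the entry describes.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the lookup order in item 12; not the plugin's code.
final class LanguageFallbackSketch {
    // htmlLang:         value of the HTML "lang" attribute, or null
    // metaLang:         value from meta tags (http-equiv, dc.language), or null
    // headerLang:       value of the Content-Language HTTP header, or null
    // statisticalGuess: stand-in for the "statistical analysis" step,
    //                   evaluated only if the first three sources are empty
    static String identify(String htmlLang, String metaLang, String headerLang,
                           Supplier<String> statisticalGuess) {
        for (String candidate : new String[] {htmlLang, metaLang, headerLang}) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return statisticalGuess.get();
    }
}
```

Passing the statistical step as a Supplier means it runs only when the
cheaper sources give no answer, which mirrors the "if all above fail"
wording of the entry.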
553
554
555 Release 0.4
556
557 1. Http class refactored. (Kevin Smith via Tom Pierce)
558
559 2. Add Finnish translation. (Sampo Syreeni via Doug Cutting)
560
561 3. Added Japanese translation. (Yukio Andoh via Doug Cutting)
562
563 4. Updated Dutch translation. (Ype Kingma via Doug Cutting)
564
565 5. Initial version of Distributed DB code. (Mike Cafarella)
566
567 6. Make things more tolerant of crashed fetcher output files.
568 (Doug Cutting)
569
570 7. New skin for website. (Frank Henze via Doug Cutting)
571
572 8. Added Spanish translation. (Diego Basch via Doug Cutting)
573
574 9. Add FTP support to fetcher. (John Xing via Doug Cutting)
575
576 10. Added Thai translation. (Pichai Ongvasith via Doug Cutting)
577
578 11. Added Robots.txt & throttling support to Fetcher.java. (Mike
579 Cafarella)
580
581 12. Added nightly build. (Doug Cutting)
582
583 13. Default all link scores to 1.0. (Doug Cutting)
584
585 14. Permit one to keep internal links. (Doug Cutting)
586
587 15. Fixed dedup to select shortest URL. (Doug Cutting)
588
589 16. Changed index merger so that merged index is written to named
590 directory, rather than to a generated name in that directory.
591 (Doug Cutting)
592
593 17. Disable coordination weighting of query clauses and other minor
594 scoring improvements. (Doug Cutting)
595
596 18. Added a new command, crawl, that constructs a database, injects a
597 url file and performs a few rounds of generate/fetch/updatedb.
598 This simplifies use for intranet sites. Changed some defaults to
599 be more intranet friendly. (Doug Cutting)
600
601 19. Fixed a bug where Fetcher.java didn't construct correct relative
602 links when a page was redirected. (Doug Cutting)
603
604 20. Fixed a query parser problem with lookahead over plusses and minuses.
605 (Doug Cutting)
606
607 21. Add support for HTTP proxy servers. (Sami Siren via Doug Cutting)
608
609 22. Permit searching while fetching and/or indexing.
610 (Sami Siren via Doug Cutting)
611
612 23. Fix a bug when throttling is disabled. (Sami Siren via Doug Cutting)
613
614 24. Updated Bahasa Malaysia translation. (Michael Lim via Doug Cutting)
615
616 25. Added Catalan translation. (Xavier Guardiola via Doug Cutting)
617
618 26. Added Brazilian Portuguese translation.
619 (A. Moreir via Doug Cutting)
620
621 27. Added a French translation. (Julien Nioche via Doug Cutting)
622
623 28. Updated to Lucene 1.4RC3. (Doug Cutting)
624
625 29. Add capability to boost by link count & use it in crawl tool.
626 (Doug Cutting)
627
628 30. Added plugin system. (Stefan Groschupf via Doug Cutting)
629
630 31. Add this change log file, for recording significant changes to
631 Nutch. Populate it with changes from the last few months.
