
Revision 1512296


Author: uschindler
Date: Fri Aug 9 13:26:55 2013 UTC
Changed paths: 4
Log Message:
SOLR-4679, SOLR-4908, SOLR-5124: Text extracted from HTML or PDF files using Solr Cell was missing ignorable whitespace, which is inserted by TIKA for convenience to support plain text extraction without using the HTML elements. This bug resulted in glued words.
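
The diff for SolrContentHandler.java is not shown on this page, so the following is only an illustrative sketch of the kind of bug the log message describes, not the committed change. It assumes a SAX handler that collects text through characters() and shows how dropping ignorableWhitespace() events glues adjacent words together. The class names WhitespaceDemo and TextCollector and the simulated event sequence are hypothetical.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Solr or Tika.
class WhitespaceDemo {

    /** Collects text from SAX events, optionally keeping ignorable whitespace. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean keepIgnorableWhitespace;

        TextCollector(boolean keepIgnorableWhitespace) {
            this.keepIgnorableWhitespace = keepIgnorableWhitespace;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // DefaultHandler ignores this event by default. Forwarding it to
            // characters() keeps the separator between adjacent text runs.
            if (keepIgnorableWhitespace) {
                characters(ch, start, length);
            }
        }

        String text() {
            return text.toString();
        }
    }

    /** Simulates the events for two text runs separated by ignorable whitespace. */
    static String extract(boolean keepIgnorableWhitespace) {
        TextCollector handler = new TextCollector(keepIgnorableWhitespace);
        char[] first = "Hello".toCharArray();
        char[] separator = "\n".toCharArray();
        char[] second = "world".toCharArray();
        handler.characters(first, 0, first.length);
        handler.ignorableWhitespace(separator, 0, separator.length);
        handler.characters(second, 0, second.length);
        return handler.text();
    }

    public static void main(String[] args) {
        System.out.println(extract(false)); // words glued together
        System.out.println(extract(true));  // words separated by a newline
    }
}
```

With the whitespace dropped, the two runs come out as "Helloworld". With it forwarded, they stay separated, which is the behavior the log message says plain-text extraction relies on.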

Changed paths

Path    Details
lucene/dev/trunk/solr/CHANGES.txt    modified, text changed
lucene/dev/trunk/solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/SolrContentHandler.java    modified, text changed
lucene/dev/trunk/solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java    modified, text changed
lucene/dev/trunk/solr/contrib/extraction/src/test-files/extraction/simple.html    modified, text changed
