
Revision 1512297


Author: uschindler
Date: Fri Aug 9 13:27:53 2013 UTC
Changed paths: 7
Log Message:
Merged revision(s) 1512296 from lucene/dev/trunk:
SOLR-4679, SOLR-4908, SOLR-5124: Text extracted from HTML or PDF files using Solr Cell was missing ignorable whitespace, which is inserted by TIKA for convenience to support plain text extraction without using the HTML elements. This bug resulted in glued words.
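The failure mode described in the log message can be illustrated with a SAX handler. Tika reports the whitespace it inserts between block-level elements through `ignorableWhitespace` rather than `characters`, so a handler that collects only `characters` joins the text of adjacent paragraphs. The following is a minimal sketch under that assumption. `WhitespaceAwareHandler` is a hypothetical class, not Solr's `SolrContentHandler`, and it does not reproduce the actual change made in this revision. It uses only the JDK's `org.xml.sax.helpers.DefaultHandler`.

```java
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler (not Solr's SolrContentHandler) that keeps
 * ignorable whitespace, so text from adjacent block elements stays separated.
 */
class WhitespaceAwareHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // Regular character data inside elements.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Assumption: the parser (e.g. Tika) reports inter-block separators here.
    // The default DefaultHandler implementation discards them, which glues words.
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    public static void main(String[] args) {
        WhitespaceAwareHandler h = new WhitespaceAwareHandler();
        char[] first = "Hello".toCharArray();
        char[] newline = "\n".toCharArray();
        char[] second = "World".toCharArray();
        // Hand-simulated event sequence for <p>Hello</p><p>World</p>.
        h.characters(first, 0, first.length);
        h.ignorableWhitespace(newline, 0, newline.length);
        h.characters(second, 0, second.length);
        System.out.println(h.getText().replace('\n', ' ')); // prints "Hello World"
    }
}
```

If the `ignorableWhitespace` override were removed, the inherited no-op would drop the newline and the output would be `HelloWorld`. That is the "glued words" symptom the log message describes.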

Changed paths

lucene/dev/branches/branch_4x/: modified, props changed
lucene/dev/branches/branch_4x/solr/: modified, props changed
lucene/dev/branches/branch_4x/solr/CHANGES.txt: modified, text changed, props changed
lucene/dev/branches/branch_4x/solr/contrib/: modified, props changed
lucene/dev/branches/branch_4x/solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/SolrContentHandler.java: modified, text changed
lucene/dev/branches/branch_4x/solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java: modified, text changed
lucene/dev/branches/branch_4x/solr/contrib/extraction/src/test-files/extraction/simple.html: modified, text changed
