Apache Solr - DataImportHandler Version 1.3-dev
                            Release Notes

Introduction
------------
DataImportHandler is a data import tool for Solr that makes importing data from databases, XML files and
HTTP data sources quick and easy.


$Id$

==================  Release 1.4-dev ==================

Upgrading from Solr 1.3
-----------------------

The Evaluator API has changed in a way that is not backward-compatible. Users who have developed custom
Evaluators must update their code to the new API. See SOLR-996 for details.

The syntax of the formatDate evaluator has changed. The new syntax takes the variable as the first argument
and the date format string, enclosed in single quotes, as the second, for example
formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the format string was written without single quotes.
The old syntax is deprecated and will be removed in 1.5; until then, using it logs a warning.

Detailed Change List
----------------------

New Features
----------------------
1. SOLR-768: Set the last_index_time variable in the full-import command. (Wojtek Piaseczny, Noble Paul via shalin)
2. SOLR-811: Allow a "deltaImportQuery" attribute in SqlEntityProcessor, which is used for delta imports instead of DataImportHandler manipulating the SQL itself. (Noble Paul via shalin)
3. SOLR-842: Better error handling in DataImportHandler with options to abort, skip and continue imports. (Noble Paul, shalin)
4. SOLR-833: A DataSource to read data from a field as a reader. This can be used, for example, to read XML stored as CLOBs or BLOBs in databases. (Noble Paul via shalin)
5. SOLR-887: A Transformer to strip HTML tags. (Ahmed Hammad via shalin)
6. SOLR-886: DataImportHandler should roll back when an import fails or is aborted. (shalin)
7. SOLR-891: A Transformer to read strings from the Clob type. (Noble Paul via shalin)
8. SOLR-812: Configurable JDBC settings in JdbcDataSource, including optimized defaults for read-only mode. (David Smiley, Glen Newton, shalin)
9. SOLR-910: Add a few utility commands to the DIH admin page, such as full import, delta import, status and reload config. (Ahmed Hammad via shalin)
10. SOLR-938: Add an event listener API for import start and end. (Kay Kay, Noble Paul via shalin)
11. SOLR-801: Add support for configurable pre-import and post-import delete queries per root entity. (Noble Paul via shalin)
12. SOLR-988: Add a new scope for session data stored in Context to store objects across imports. (Noble Paul via shalin)
13. SOLR-980: A PlainTextEntityProcessor which can read from any DataSource and output a String. (Nathan Adams, Noble Paul via shalin)
14. SOLR-1003: XPathEntityProcessor must allow slurping all text from a given XML node and its children. (Noble Paul via shalin)
15. SOLR-1001: Allow variables in various attributes of RegexTransformer, HTMLStripTransformer and NumberFormatTransformer. (Fergus McMenemie, Noble Paul, shalin)
16. SOLR-989: Expose running statistics from the Context API. (Noble Paul, shalin)
17. SOLR-996: Expose Context to Evaluators. (Noble Paul, shalin)
18. SOLR-783: Enhance delta-imports by maintaining a separate last_index_time for each entity. (Jon Baer, Noble Paul via shalin)
19. SOLR-1033: The current entity's namespace is made available to all Transformers. Among other things, this allows an output field of TemplateTransformer to be used in other transformers. (Fergus McMenemie, Noble Paul via shalin)

Optimizations
----------------------
1. SOLR-846: Reduce memory consumption during delta import by removing keys once they have been used. (Ricky Leung, Noble Paul via shalin)
2. SOLR-974: DataImportHandler skips the commit if no data has been updated. (Wojtek Piaseczny, shalin)
3. SOLR-1004: Check for abort more frequently during delta-imports. (Marc Sturlese, shalin)

Bug Fixes
----------------------
1. SOLR-800: Deep copy collections to avoid ConcurrentModificationException in XPathEntityProcessor while streaming. (Kyle Morrison, Noble Paul via shalin)
2. SOLR-823: Request parameter variables ${dataimporter.request.xxx} are not resolved. (Mck SembWever, Noble Paul, shalin)
3. SOLR-728: Add synchronization to avoid a race condition when multiple imports run concurrently. (Walter Ferrara, shalin)
4. SOLR-742: Add the ability to create dynamic fields with custom DataImportHandler transformers. (Wojtek Piaseczny, Noble Paul, shalin)
5. SOLR-832: The rows parameter is not honored in non-debug mode and can abort a running import in debug mode. (Akshay Ukey, shalin)
6. SOLR-838: The VariableResolver obtained from a DataSource's context does not have current data. (Noble Paul via shalin)
7. SOLR-864: DataImportHandler does not catch and log Errors. (shalin)
8. SOLR-873: Fix case-sensitive field names and columns. (Jon Baer, shalin)
9. SOLR-893: Unable to delete documents via SQL and deletedPkQuery with delta-import. (Dan Rosher via shalin)
10. SOLR-888: DateFormatTransformer cannot convert non-string types. (Amit Nithian via shalin)
11. SOLR-841: DataImportHandler should throw an exception if a field does not have a column attribute. (Michael Henson, shalin)
12. SOLR-884: CachedSqlEntityProcessor should check whether the cache key is present in the query results. (Noble Paul via shalin)
13. SOLR-985: Fix a thread-safety issue with TemplateString for concurrent imports with multiple cores. (Ryuuichi Kumai via shalin)
14. SOLR-999: XPathRecordReader fails on XML with nodes mixed with CDATA content. (Fergus McMenemie, Noble Paul via shalin)
15. SOLR-1000: FileListEntityProcessor should not apply the fileName filter to directory names. (Fergus McMenemie via shalin)
16. SOLR-1009: Repeated column names result in duplicate values. (Fergus McMenemie, Noble Paul via shalin)
17. SOLR-1017: Fix a thread-safety issue with last_index_time for concurrent imports in multiple cores, caused by unsafe usage of SimpleDateFormat by multiple threads. (Ryuuichi Kumai via shalin)
18. SOLR-1024: Calling abort on a DataImportHandler import commits data instead of rolling back. (shalin)
19. SOLR-1037: DIH should not add null values in a row returned by an EntityProcessor to documents. (shalin)
20. SOLR-1040: XPathEntityProcessor fails with an xpath like /feed/entry/link[@type='text/html']/@href (Noble Paul via shalin)
21. SOLR-1042: Fix a memory leak in DIH by making TemplateString a non-static member in VariableResolverImpl. (Ryuuichi Kumai via shalin)

Documentation
----------------------

Other
----------------------
1. SOLR-782: Refactored SolrWriter to make it a concrete class and removed wrappers over SolrInputDocument. Refactored to load Evaluators lazily. Removed multiple document nodes in the configuration XML. Removed support for 'default' variables; they are automatically available as request parameters. (Noble Paul via shalin)
2. SOLR-964: XPathEntityProcessor now ignores DTD validation. (Fergus McMenemie, Noble Paul via shalin)
3. SOLR-1029: Standardized Evaluator parameter parsing and added helper functions for parsing all evaluator parameters in a standard way. (Noble Paul, shalin)

==================  Release 1.3.0 20080915 ==================

Status
------
This is the first release since DataImportHandler was added to the contrib Solr distribution.
The following list covers changes since the code was introduced, not since the first official release.

Detailed Change List
--------------------

New Features
1. SOLR-700: Allow configurable locales through a locale attribute in fields for NumberFormatTransformer. (Stefan Oestreicher, shalin)

Changes in runtime behavior

Bug Fixes
1. SOLR-704: NumberFormatTransformer can silently ignore part of the string while parsing. It now tries to use the complete string for parsing; if that fails, an exception is thrown. (Stefan Oestreicher via shalin)
2. SOLR-729: Context.getDataSource(String) gives the current entity's DataSource instance regardless of the argument. (Noble Paul, shalin)
3. SOLR-726: JDBC drivers and DataSources fail to load if placed in the multicore sharedLib or a core's lib directory. (Walter Ferrara, Noble Paul, shalin)

Other Changes
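The formatDate syntax change in the "Upgrading from Solr 1.3" note can be illustrated with a small standalone Java sketch (Java 9+ for Map.of). It is not the DataImportHandler Evaluator code: the class FormatDateSketch, its evaluate method and the Map used for variable lookup are hypothetical, and the regular expressions only approximate the two syntaxes described in that note. The new quoted form is accepted directly; the old unquoted form is still accepted but prints a warning, mirroring the deprecation behavior described for 1.4.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of evaluating formatDate(x.date, 'yyyy-MM-dd').
 * Not the DIH implementation: variables are looked up in a plain Map.
 */
public class FormatDateSketch {
    // New syntax: variable name, comma, format string in single quotes.
    private static final Pattern NEW_SYNTAX =
        Pattern.compile("formatDate\\(\\s*([\\w.]+)\\s*,\\s*'([^']*)'\\s*\\)");
    // Deprecated syntax: format string written without quotes.
    private static final Pattern OLD_SYNTAX =
        Pattern.compile("formatDate\\(\\s*([\\w.]+)\\s*,\\s*([^')]+?)\\s*\\)");

    public static String evaluate(String expr, Map<String, Date> vars) {
        Matcher m = NEW_SYNTAX.matcher(expr);
        if (!m.matches()) {
            m = OLD_SYNTAX.matcher(expr);
            if (!m.matches()) {
                throw new IllegalArgumentException("Not a formatDate expression: " + expr);
            }
            // Old syntax still works, but a warning is emitted (here on stderr).
            System.err.println("WARN: deprecated formatDate syntax, quote the format: " + expr);
        }
        Date date = vars.get(m.group(1));
        if (date == null) {
            throw new IllegalArgumentException("Unknown variable: " + m.group(1));
        }
        // A fresh SimpleDateFormat per call, because the class is not thread-safe.
        SimpleDateFormat fmt = new SimpleDateFormat(m.group(2));
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducible output
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Map<String, Date> vars = Map.of("x.date", new Date(0L));
        System.out.println(evaluate("formatDate(x.date, 'yyyy-MM-dd')", vars)); // 1970-01-01
        System.out.println(evaluate("formatDate(x.date, yyyy-MM-dd)", vars));   // 1970-01-01, plus a warning
    }
}
```

Creating a new SimpleDateFormat on every call is a deliberate choice here: sharing one instance across threads is the kind of unsafe usage that SOLR-1017 describes.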
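The 1.3.0 entries SOLR-704 (NumberFormatTransformer silently ignoring part of the string) and SOLR-700 (configurable locales) both concern how java.text.NumberFormat parses input. The sketch below shows one way to get strict, locale-aware parsing: parse with a ParsePosition and reject the input unless the whole string was consumed. StrictNumberParse and parseStrict are hypothetical names; this is not the transformer's actual source.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

/**
 * Hypothetical helper: strict number parsing for a given locale.
 * Not the NumberFormatTransformer implementation.
 */
public class StrictNumberParse {
    public static Number parseStrict(String text, Locale locale) {
        NumberFormat fmt = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number n = fmt.parse(text, pos);
        // NumberFormat.parse stops at the first character it cannot use and
        // returns what it has parsed so far, so "12abc" would yield 12.
        // Insist that the whole string was consumed.
        if (n == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Invalid number for locale " + locale + ": " + text);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("1,234.5", Locale.US));      // 1234.5
        System.out.println(parseStrict("1.234,5", Locale.GERMANY)); // 1234.5
        try {
            parseStrict("12abc", Locale.US);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the Locale explicitly, rather than relying on the JVM default, is what makes the same string parse differently for US and German input.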