These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
The 'du' (disk usage command from Unix) script refresh monitor is now configurable in the same way as its 'df' counterpart, via the property 'fs.du.interval', the default of which is 10 minutes (specified in milliseconds).
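For example, a core-site.xml sketch lowering the refresh interval to one minute (the 60000 value is illustrative; the default is 600000, i.e. 10 minutes):

<property>
  <name>fs.du.interval</name>
  <value>60000</value> <!-- refresh 'du' usage every 60 seconds; default is 600000 (10 minutes) -->
</property>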
Added a file system implementation for OpenStack Swift. There are two implementations: block and native (similar to the Amazon S3 integration). The data locality issue is solved by a patch in Swift; the commit procedure to OpenStack is in progress.
To use an implementation, add the following to core-site.xml:
<property>
  <name>fs.swift.impl</name>
  <value>com.mirantis.fs.SwiftFileSystem</value>
</property>
<property>
  <name>fs.swift.block.impl</name>
  <value>com.mirantis.fs.block.SwiftBlockFileSystem</value>
</property>
In the MapReduce job, specify the following configuration for OpenStack Keystone authentication:
conf.set("swift.auth.url", "http://172.18.66.117:5000/v2.0/tokens"); conf.set("swift.tenant", "superuser"); conf.set("swift.username", "admin1"); conf.set("swift.password", "password"); conf.setInt("swift.http.port", 8080); conf.setInt("swift.https.port", 443);
Additional information is available on GitHub: https://github.com/DmitryMezhensky/Hadoop-and-Swift-integration
Addition of FixedLengthInputFormat and FixedLengthRecordReader in the org.apache.hadoop.mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed-length (fixed-width) records. Such files have no CR/LF (or any combination thereof), no delimiters, etc.; each record is a fixed length, and extra data is padded with spaces. The data is one gigantic line within a file. When creating a job that uses this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set, for example:
myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
Please see the javadoc for more details.
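A job-setup sketch, under the assumption (not stated in this note) that FixedLengthInputFormat delivers each record as a BytesWritable keyed by its byte offset (LongWritable); the 80-byte record length, the paths, and the mapper are illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FixedLengthReadJob {

  // Turns each fixed-width record into a line of text keyed by its byte offset.
  public static class RecordMapper
      extends Mapper<LongWritable, BytesWritable, LongWritable, Text> {
    @Override
    protected void map(LongWritable offset, BytesWritable record, Context context)
        throws IOException, InterruptedException {
      context.write(offset, new Text(record.copyBytes()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Every record in the input is exactly 80 bytes (illustrative value).
    conf.setInt("mapreduce.input.fixedlengthinputformat.record.length", 80);

    Job job = Job.getInstance(conf, "fixed-length-read");
    job.setJarByClass(FixedLengthReadJob.class);
    job.setInputFormatClass(FixedLengthInputFormat.class);
    job.setMapperClass(RecordMapper.class);
    job.setNumReduceTasks(0);   // map-only pass-through
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}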
Fix the HTTPS support in HsftpFileSystem. With this change the client now verifies the server certificate. In particular, the client side will verify the Common Name of the certificate using a strategy specified by the configuration property "hadoop.ssl.hostname.verifier".
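A sketch of the property in core-site.xml; the DEFAULT value is illustrative, and any other verifier names available are not specified by this note:

<property>
  <name>hadoop.ssl.hostname.verifier</name>
  <value>DEFAULT</value> <!-- strategy the client uses to verify the certificate's Common Name -->
</property>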
Direct ByteBuffer decompressors for Zlib (Deflate & Gzip) and Snappy.
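A minimal sketch of the direct-buffer decompression path, assuming it is exposed through DirectDecompressionCodec / DirectDecompressor interfaces in org.apache.hadoop.io.compress with a decompress(src, dst) method; these class and method names are assumptions, not taken from this note:

import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.DirectDecompressionCodec;
import org.apache.hadoop.io.compress.DirectDecompressor;
import org.apache.hadoop.io.compress.SnappyCodec;

public class DirectDecompressSketch {
  public static void main(String[] args) throws Exception {
    // Requires the native Snappy library to be available.
    SnappyCodec codec = new SnappyCodec();
    codec.setConf(new Configuration());
    DirectDecompressor decompressor =
        ((DirectDecompressionCodec) codec).createDirectDecompressor();

    // Off-heap buffers: compressed input and decompressed output.
    ByteBuffer compressed = ByteBuffer.allocateDirect(64 * 1024);
    ByteBuffer uncompressed = ByteBuffer.allocateDirect(256 * 1024);
    // ... fill 'compressed' with Snappy data and flip() it before decompressing ...
    decompressor.decompress(compressed, uncompressed);
    // 'uncompressed' now holds the raw bytes without copying through on-heap byte[] arrays.
  }
}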
libhdfs now returns correct codes in errno. Previously, due to a bug, many functions set errno to 255 instead of the more specific error code.
Add a new HTTP policy configuration. Users can use "dfs.http.policy" to control the HTTP endpoints for the NameNode and DataNode. Specifically, the following values are supported:
- HTTP_ONLY: service is provided only on HTTP
- HTTPS_ONLY: service is provided only on HTTPS
- HTTP_AND_HTTPS: service is provided on both HTTP and HTTPS
hadoop.ssl.enabled and dfs.https.enable are deprecated. When these deprecated properties are still configured, the HTTP policy is currently decided by the following rules:
1. If dfs.http.policy is set to HTTPS_ONLY or HTTP_AND_HTTPS, the specified policy is picked; otherwise it proceeds to rules 2-4.
2. HTTPS_ONLY is picked if hadoop.ssl.enabled is true.
3. HTTP_AND_HTTPS is picked if dfs.https.enable is true.
4. HTTP_ONLY is picked for all other configurations.
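For example, a minimal hdfs-site.xml sketch selecting one of the three policies listed above:

<property>
  <name>dfs.http.policy</name>
  <value>HTTP_AND_HTTPS</value> <!-- serve NameNode and DataNode endpoints on both HTTP and HTTPS -->
</property>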
Add a new configuration property "dfs.webhdfs.user.provider.user.pattern" for specifying user name filters for WebHDFS.
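A sketch of the property in hdfs-site.xml; the regular expression shown is only an illustration of a user-name filter, not a documented default, and should be adjusted to the site's naming rules:

<property>
  <name>dfs.webhdfs.user.provider.user.pattern</name>
  <value>^[A-Za-z_][A-Za-z0-9._-]*[$]?$</value> <!-- user names that WebHDFS will accept -->
</property>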
Makes the number of retries, and the time between retries, for getting the length of the last block of a file configurable. The new configuration properties are:
dfs.client.retry.times.get-last-block-length
dfs.client.retry.interval-ms.get-last-block-length
They default to 3 and 4000 (ms) respectively, which were the previously hardcoded values.
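A client-side hdfs-site.xml sketch restating the defaults from this note (the values shown simply match the previously hardcoded behaviour):

<property>
  <name>dfs.client.retry.times.get-last-block-length</name>
  <value>3</value> <!-- number of retries when fetching the last block's length -->
</property>
<property>
  <name>dfs.client.retry.interval-ms.get-last-block-length</name>
  <value>4000</value> <!-- milliseconds between those retries -->
</property>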
Add a new editlog record (OP_ADD_BLOCK) that, on every block allocation, records only the newly allocated block instead of the entire block list.