Apache Commons Compress RELEASE NOTES

Apache Commons Compress software defines an API for working with
compression and archive formats. These include: bzip2, gzip, pack200,
lzma, xz, Snappy, traditional Unix Compress, DEFLATE and ar, cpio,
jar, tar, zip, dump, 7z, arj.

Release 1.10
------------

Release 1.10 moves the former
org.apache.commons.compress.compressors.z._internal_ package, which
breaks backwards compatibility for code which used the old package.

This also changes the superclass of ZCompressorInputStream, which makes
this class binary incompatible with the one of Compress 1.9. Code
that extends ZCompressorInputStream will need to be recompiled in
order to work with Compress 1.10.

New features:
o CompressorStreamFactory can now auto-detect DEFLATE streams with
  ZLIB header.
  Issue: COMPRESS-316. Thanks to Nick Burch.
o CompressorStreamFactory can now auto-detect LZMA streams.
  Issue: COMPRESS-313.
o Added support for parallel compression. This low-level API allows
  a client to build a zip/jar file by using the class
  org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator.

  Zip documentation updated with further notes about parallel features.

  Please note that some aspects of jar creation need to be
  handled by client code and are not part of commons-compress for this
  release.
  Issue: COMPRESS-296. Thanks to Kristian Rosenvold.
o Cut overall object instantiation in half by changing file
  header generation algorithm, for a 10-15 percent performance
  improvement.

  Also extracted two private methods createLocalFileHeader
  and createCentralFileHeader in ZipArchiveOutputStream.
  These may have some interesting additional usages in the
  near future. Thanks to Kristian Rosenvold.
o New methods in ZipArchiveOutputStream and ZipFile allow
  entries to be copied from one archive to another without
  having to re-compress them.
  Issue: COMPRESS-295. Thanks to Kristian Rosenvold.

Fixed Bugs:
o TarArchiveInputStream can now read entries with group or
  user ids > 0x80000000.
  Issue: COMPRESS-314.
o TarArchiveOutputStream can now write entries with group or
  user ids > 0x80000000.
  Issue: COMPRESS-315.
o TarArchiveEntry's constructor with a File and a String arg
  didn't normalize the name.
  Issue: COMPRESS-312.
o ZipEncodingHelper no longer reads system properties directly
  to determine the default charset.
  Issue: COMPRESS-308.
o BZip2CompressorInputStream#read would return -1 when asked to
  read 0 bytes.
  Issue: COMPRESS-309.
o ArchiveStreamFactory fails to pass on the encoding when creating
  some streams.
    * ArjArchiveInputStream
    * CpioArchiveInputStream
    * DumpArchiveInputStream
    * JarArchiveInputStream
    * TarArchiveInputStream
    * JarArchiveOutputStream
  Issue: COMPRESS-306.
o Restore immutability/thread-safety to ArchiveStreamFactory.
  The class is now immutable provided that the method setEntryEncoding
  is not used. The class is thread-safe.
  Issue: COMPRESS-302.
o Restore immutability/thread-safety to CompressorStreamFactory.
  The class is now immutable provided that the method
  setDecompressConcatenated is not used. The class is thread-safe.
  Issue: COMPRESS-303.
o ZipFile logs a warning in its finalizer when its constructor
  has thrown an exception reading the file - for example if the
  file doesn't exist.
  Issue: COMPRESS-297.
o Improved error message when tar encounters a groupId that is
  too big to write without using the STAR or POSIX format.
  Issue: COMPRESS-290. Thanks to Kristian Rosenvold.
o SevenZFile now throws the specific PasswordRequiredException
  when it encounters an encrypted stream but no password has
  been specified.
  Issue: COMPRESS-298.

Changes:
o Moved the package
  org.apache.commons.compress.compressors.z._internal_ to
  org.apache.commons.compress.compressors.lzw and made it part
  of the API that is officially supported. This will break
  existing code that uses the old package. Thanks to Damjan Jovanovic.
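
Illustration for COMPRESS-316: the sketch below shows one way a ZLIB
header can be recognized from the first two bytes of a stream. It is
not the Commons Compress implementation. The class and method names are
hypothetical, and the rules used (compression method 8, window size
code at most 7, header value divisible by 31) are taken from the ZLIB
format description in RFC 1950. Only JDK classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper for illustration; not a Commons Compress class.
final class ZlibHeaderSketch {

    /** Returns true if the first two bytes look like a ZLIB (RFC 1950) header. */
    static boolean looksLikeZlib(byte[] signature, int length) {
        if (length < 2) {
            return false; // not enough bytes to decide
        }
        int cmf = signature[0] & 0xFF; // compression method and window size
        int flg = signature[1] & 0xFF; // flags, including the check bits
        boolean deflateMethod = (cmf & 0x0F) == 8;      // CM = 8 means DEFLATE
        boolean windowOk = (cmf >> 4) <= 7;             // CINFO above 7 is not allowed
        boolean checkOk = ((cmf << 8) | flg) % 31 == 0; // FCHECK makes the pair a multiple of 31
        return deflateMethod && windowOk && checkOk;
    }

    /** Compresses data with the JDK's Deflater, which writes a ZLIB header by default. */
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = zlibCompress("hello, zlib".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikeZlib(compressed, compressed.length)); // ZLIB stream
        byte[] gzipMagic = {(byte) 0x1f, (byte) 0x8b};
        System.out.println(looksLikeZlib(gzipMagic, gzipMagic.length));   // gzip magic bytes
    }
}
```

A two-byte check like this can produce false positives on arbitrary
data, so a real detector would normally try other, more specific
signatures first.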

For complete information on Apache Commons Compress, including instructions
on how to submit bug reports, patches, or suggestions for improvement,
see the Apache Commons Compress website:

http://commons.apache.org/compress/

Old Release Notes
=================

Release 1.9
-----------

New features:
o Added support for DEFLATE streams without any gzip framing.
  Issue: COMPRESS-263. Thanks to Matthias Stevens.

Fixed Bugs:
o When reading 7z files unknown file properties and properties of type
  kDummy are now ignored.
  Issue: COMPRESS-287.
o Expanding 7z archives using LZMA compression could cause an
  EOFException.
  Issue: COMPRESS-286.
o Long-Name and -link or PAX-header entries in TAR archives always had
  the current time as last modification time, creating archives that
  are different at the byte level each time an archive was built.
  Issue: COMPRESS-289. Thanks to Bob Robertson.

Changes:
o Checking for XZ for Java may be expensive. The result will now be
  cached outside of an OSGi environment. You can use the new
  XZUtils#setCacheXZAvailability to override this default behavior.
  Issue: COMPRESS-285.

Release 1.8.1
-------------

New features:
o COMPRESS-272: CompressorStreamFactory can now auto-detect Unix compress
  (".Z") streams.

Fixed Bugs:
o COMPRESS-270: The snappy, ar and tar inputstreams might fail to read
  from a non-buffered stream in certain cases.
o COMPRESS-277: IOUtils#skip might skip fewer bytes than requested even
  though more could be read from the stream.
o COMPRESS-276: ArchiveStreams now validate there is a current entry
  before reading or writing entry data.
o ArjArchiveInputStream#canReadEntryData tested the current
  entry of the stream rather than its argument.
o COMPRESS-274: ChangeSet#delete and deleteDir now properly deal with
  unnamed entries.
o COMPRESS-273: Added a few null checks to improve robustness.
o COMPRESS-278: TarArchiveInputStream failed to read archives with empty
  gid/uid fields.
o COMPRESS-279: TarArchiveInputStream now again throws an exception when
  it encounters a truncated archive while reading from the last entry.
o COMPRESS-280: Adapted TarArchiveInputStream#skip to the modified
  IOUtils#skip method. Thanks to BELUGA BEHR.

Changes:
o The dependency on org.tukaani:xz is now marked as optional.

Release 1.8
-----------

New features:
o GzipCompressorInputStream now provides access to the same
  metadata that can be provided via GzipParameters when writing
  a gzip stream.
  Issue: COMPRESS-260.
o SevenZOutputFile now supports chaining multiple
  compression/encryption/filter methods and passing options to
  the methods.
  Issue: COMPRESS-266.
o The (compression) method(s) can now be specified per entry in
  SevenZOutputFile.
  Issue: COMPRESS-261.
o SevenZArchiveEntry "knows" which method(s) have been used to
  write it to the archive.
  Issue: COMPRESS-258.
o The 7z package now supports the delta filter as method.
o The 7z package now supports BCJ filters for several platforms.
  You will need a version >= 1.5 of XZ for Java to read archives
  using BCJ, though.
  Issue: COMPRESS-257.

Fixed Bugs:
o BZip2CompressorInputStream read fewer bytes than possible from
  a truncated stream.
  Issue: COMPRESS-253.
o SevenZFile failed claiming the dictionary was too large when
  archives used LZMA compression for headers and content and
  certain non-default dictionary sizes.
  Issue: COMPRESS-253.
o CompressorStreamFactory.createCompressorInputStream with
  explicit compression did not honor decompressConcatenated.
  Issue: COMPRESS-259.
o TarArchiveInputStream will now read archives created by tar
  implementations that encode big numbers by not adding a
  trailing NUL.
  Issue: COMPRESS-262.
o ZipArchiveInputStream would return NUL bytes for the first 512
  bytes of a STORED entry if it was the very first entry of the
  archive.
  Issue: COMPRESS-264.
o When writing PAX/POSIX headers for TAR entries with
  backslashes or certain non-ASCII characters in their name
  TarArchiveOutputStream could fail.
  Issue: COMPRESS-265.
o ArchiveStreamFactory now throws a StreamingNotSupported - a
  new subclass of ArchiveException - if it is asked to read from
  or write to a stream and Commons Compress doesn't support
  streaming for the format. This currently only applies to the
  7z format.
  Issue: COMPRESS-267.

Release 1.7
-----------

New features:
o Read-Only support for Snappy compression.
  Issue: COMPRESS-147. Thanks to BELUGA BEHR.
o Read-Only support for .Z compressed files.
  Issue: COMPRESS-243. Thanks to Damjan Jovanovic.
o ZipFile and ZipArchiveInputStream now support reading entries
  compressed using the SHRINKING method. Thanks to Damjan Jovanovic.
o GzipCompressorOutputStream now supports setting the compression
  level and the header metadata (filename, comment, modification time,
  operating system and extra flags).
  Issue: COMPRESS-250. Thanks to Emmanuel Bourg.
o ZipFile and ZipArchiveInputStream now support reading entries
  compressed using the IMPLODE method.
  Issue: COMPRESS-115. Thanks to Emmanuel Bourg.
o ZipFile and the 7z file classes now implement Closeable and can be
  used in try-with-resources constructs.

Fixed Bugs:
o SevenZOutputFile#closeArchiveEntry throws an exception when using
  LZMA2 compression on Java8.
  Issue: COMPRESS-241.
o 7z reading of big 64bit values could be wrong.
  Issue: COMPRESS-244. Thanks to Nico Kruber.
o TarArchiveInputStream could fail to read an archive completely.
  Issue: COMPRESS-245.
o The time-setters in X5455_ExtendedTimestamp now set the
  corresponding flags explicitly - i.e. they set the bit if the value
  is not-null and reset it otherwise. This may cause
  incompatibilities if you use setFlags to unset a bit and later set
  the time to a non-null value - the flag will now be set.
  Issue: COMPRESS-242.
o SevenZOutputFile would create invalid archives if more than six
  empty files or directories were included.
  Issue: COMPRESS-252.

Release 1.6
-----------

Version 1.6 introduces changes to the internal API of the tar package that
break backwards compatibility in the following rare cases. This version
removes the package private TarBuffer class along with the protected "buffer"
members in TarArchiveInputStream and TarArchiveOutputStream. This change will
only affect you if you have created a subclass of one of the stream classes
and accessed the buffer member or directly used the TarBuffer class.

Changes in this version include:

New features:
o Added support for 7z archives. Most compression algorithms
  can be read and written, LZMA and encryption are only
  supported when reading.
  Issue: COMPRESS-54. Thanks to Damjan Jovanovic.
o Added read-only support for ARJ archives that don't use
  compression.
  Issue: COMPRESS-226. Thanks to Damjan Jovanovic.
o DumpArchiveInputStream now supports an encoding parameter that
  can be used to specify the encoding of file names.
o The CPIO streams now support an encoding parameter that can be
  used to specify the encoding of file names.
o Read-only support for LZMA standalone compression has been added.
  Issue: COMPRESS-111.

Fixed Bugs:
o TarBuffer.tryToConsumeSecondEOFRecord could throw a
  NullPointerException.
  Issue: COMPRESS-223. Thanks to Jeremy Gustie.
o Parsing of zip64 extra fields has become more lenient in order
  to be able to read archives created by DotNetZip and maybe
  other archivers as well.
  Issue: COMPRESS-228.
o TAR will now properly read the names of symbolic links with
  long names that use the GNU variant to specify the long file
  name.
  Issue: COMPRESS-229. Thanks to Christoph Gysin.
o ZipFile#getInputStream could return null if the archive
  contained duplicate entries.
  The class now also provides two new methods to obtain all
  entries of a given name rather than just the first one.
  Issue: COMPRESS-227.
o CpioArchiveInputStream failed to read archives created by
  Redline RPM.
  Issue: COMPRESS-236. Thanks to Andrew Duffy.
o TarArchiveOutputStream now properly handles link names that
  are too long to fit into a traditional TAR header.
  Issue: COMPRESS-237. Thanks to Emmanuel Bourg.
o The auto-detecting create*InputStream methods of Archive and
  CompressorStreamFactory could fail to detect the format of
  blocking input streams.
  Issue: COMPRESS-239.

Changes:
o Readability patch to TarArchiveInputStream.
  Issue: COMPRESS-232. Thanks to BELUGA BEHR.
o Performance improvements to TarArchiveInputStream, in
  particular to the skip method.
  Issue: COMPRESS-234. Thanks to BELUGA BEHR.

Release 1.5
-----------

New features:
o CompressorStreamFactory has an option to create decompressing
  streams that decompress the full input for formats that support
  multiple concatenated streams.
  Issue: COMPRESS-220.

Fixed Bugs:
o Typo in CompressorStreamFactory Javadoc.
  Issue: COMPRESS-218. Thanks to Gili.
o ArchiveStreamFactory's tar stream detection created false positives
  for AIFF files.
  Issue: COMPRESS-191. Thanks to Jukka Zitting.
o XZ for Java didn't provide an OSGi bundle. Compress' dependency on
  it has now been marked optional so Compress itself can still be used
  in an OSGi context.
  Issue: COMPRESS-199. Thanks to Jukka Zitting.
o When specifying the encoding explicitly TarArchiveOutputStream would
  write unreadable names in GNU mode or even cause errors in POSIX
  mode for file names longer than 66 characters.
  Issue: COMPRESS-200. Thanks to Christian Schlichtherle.
o Writing TAR PAX headers failed if the generated entry name ended
  with a "/".
  Issue: COMPRESS-203.
o ZipArchiveInputStream sometimes failed to provide input to the
  Inflater when it needed it, leading to reads returning 0.
  Issue: COMPRESS-189. Thanks to Daniel Lowe.
o TarArchiveInputStream ignored the encoding for GNU long name
  entries.
  Issue: COMPRESS-212.
o TarArchiveInputStream could leave the second EOF record inside the
  stream it had just finished reading.
  Issue: COMPRESS-206. Thanks to Peter De Maeyer.
o DumpArchiveInputStream no longer implicitly closes the original
  input stream when it reaches the end of the archive.
o ZipArchiveInputStream now consumes the remainder of the archive when
  getNextZipEntry returns null.
o Unit tests could fail if the source tree was checked out to a
  directory tree containing spaces.
  Issue: COMPRESS-205. Thanks to Daniel Lowe.
o Fixed a potential ArrayIndexOutOfBoundsException when reading STORED
  entries from ZipArchiveInputStream.
  Issue: COMPRESS-219.
o CompressorStreamFactory can now be used without XZ for Java being
  available.
  Issue: COMPRESS-221.

Changes:
o Improved exception message if a zip archive cannot be read because
  of an unsupported compression method.
  Issue: COMPRESS-188. Thanks to Harald Kuhn.
o ArchiveStreamFactory has a setting for file name encoding that sets
  up encoding for ZIP and TAR streams.
  Issue: COMPRESS-192. Thanks to Jukka Zitting.
o TarArchiveEntry now has a method to verify its checksum.
  Issue: COMPRESS-191. Thanks to Jukka Zitting.
o Split/spanned ZIP archives are now properly detected by
  ArchiveStreamFactory but will cause an
  UnsupportedZipFeatureException when read.
o ZipArchiveInputStream now reads archives that start with a "PK00"
  signature. Archives with this signature are created when the
  archiver was willing to split the archive but in the end only needed
  a single segment - so didn't split anything.
  Issue: COMPRESS-208.
o TarArchiveEntry has a new constructor that allows setting linkFlag
  and preserveLeadingSlashes at the same time.
  Issue: COMPRESS-201.
o ChangeSetPerformer has a new perform overload that uses a ZipFile
  instance as input.
  Issue: COMPRESS-159.
o Garbage collection pressure has been reduced by reusing temporary
  byte arrays in classes.
  Issue: COMPRESS-172. Thanks to Thomas Mair.
o Can now handle zip extra field 0x5455 - Extended Timestamp.
  Issue: COMPRESS-210. Thanks to Julius Davies.
o handle zip extra field 0x7875 - Info Zip New Unix Extra Field.
  Issue: COMPRESS-211.
  Thanks to Julius Davies.
o ZipShort, ZipLong, ZipEightByteInteger should implement Serializable.
  Issue: COMPRESS-213. Thanks to Julius Davies.
o better support for unix symlinks in ZipFile entries.
  Issue: COMPRESS-214. Thanks to Julius Davies.
o ZipFile's initialization has been improved for non-Zip64 archives.
  Issue: COMPRESS-215. Thanks to Robin Power.
o Updated XZ for Java dependency to 1.2 as this version provides
  proper OSGi manifest attributes.

Release 1.4.1
-------------

This is a security bugfix release, see
http://commons.apache.org/proper/commons-compress/security.html#Fixed_in_Apache_Commons_Compress_1.4.1

Fixed Bugs:
o Ported libbzip2's fallback sort algorithm to
  BZip2CompressorOutputStream to speed up compression in certain
  edge cases.

Release 1.4
-----------

New features:
o COMPRESS-156: Support for the XZ format has been added.

Fixed Bugs:
o COMPRESS-183: The tar package now allows the encoding of file names to
  be specified and can optionally use PAX extension headers to write
  non-ASCII file names.
  The stream classes now write (or expect to read) archives that use
  the platform's native encoding for file names. Apache Commons
  Compress 1.3 used to strip everything but the lower eight bits of
  each character which effectively only worked for ASCII and
  ISO-8859-1 file names.
  This new default behavior is a breaking change.
o COMPRESS-184: TarArchiveInputStream failed to parse PAX headers that
  contained non-ASCII characters.
o COMPRESS-178: TarArchiveInputStream throws IllegalArgumentException
  instead of IOException
o COMPRESS-179: TarUtils.formatLongOctalOrBinaryBytes() assumes the
  field will be 12 bytes long
o COMPRESS-175: GNU Tar sometimes uses binary encoding for UID and GID
o COMPRESS-171: ArchiveStreamFactory.createArchiveInputStream would
  claim short text files were TAR archives.
o COMPRESS-164: ZipFile didn't work properly for archives using unicode
  extra fields rather than UTF-8 filenames and the EFS-Flag.
o COMPRESS-169: For corrupt archives ZipFile would throw a
  RuntimeException in some cases and an IOException in others. It
  will now consistently throw an IOException.

Changes:
o COMPRESS-182: The tar package can now write archives that use
  star/GNU/BSD extensions or use the POSIX/PAX variant to store
  numeric values that don't fit into the traditional header fields.
o COMPRESS-181: Added a workaround for a bug in some tar implementations
  that add a NUL byte as first byte in numeric header fields.
o COMPRESS-176: Added a workaround for a bug in WinZIP which uses
  backslashes as path separators in Unicode Extra Fields.
o COMPRESS-131: ArrayOutOfBounds while decompressing bz2. Added test
  case - code already seems to have been fixed.
o COMPRESS-146: BZip2CompressorInputStream now optionally supports
  reading of concatenated .bz2 files.
o COMPRESS-154: GZipCompressorInputStream now optionally supports
  reading of concatenated .gz files.
o COMPRESS-16: The tar package can now read archives that use
  star/GNU/BSD extensions for files that are longer than 8 GByte as
  well as archives that use the POSIX/PAX variant.
o COMPRESS-165: The tar package can now write archives that use
  star/GNU/BSD extensions for files that are longer than 8 GByte as
  well as archives that use the POSIX/PAX variant.
o COMPRESS-166: The tar package can now use the POSIX/PAX variant for
  writing entries with names longer than 100 characters.

Release 1.3
-----------

Commons Compress 1.3 is the first version to require Java5 at runtime.

Changes in this version include:

New features:
o Support for the Pack200 format has been added.
  Issue: COMPRESS-142.
o Read-only support for the format used by the Unix dump(8) tool
  has been added.
  Issue: COMPRESS-132.

Fixed Bugs:
o BZip2CompressorInputStream's getBytesRead method always
  returned 0.
o ZipArchiveInputStream and ZipArchiveOutputStream could leak
  resources on some JDKs.
  Issue: COMPRESS-152.
o TarArchiveOutputStream's getBytesWritten method didn't count
  correctly.
  Issue: COMPRESS-160.

Changes:
o The ZIP package now supports Zip64 extensions.
  Issue: COMPRESS-36.
o The AR package now supports the BSD dialect of storing file
  names longer than 16 chars (both reading and writing).
  Issue: COMPRESS-144.

Release 1.2
-----------

New features:
o COMPRESS-123: ZipArchiveEntry has a new method getRawName that provides
  the original bytes that made up the name. This may allow user
  code to detect the encoding.
o COMPRESS-122: TarArchiveEntry provides access to the flags that
  determine whether it is an archived symbolic link, pipe or other
  "uncommon" file system object.

Fixed Bugs:
o COMPRESS-129: ZipArchiveInputStream could fail with a "Truncated ZIP"
  error message for entries between 2 GByte and 4 GByte in size.
o COMPRESS-145: TarArchiveInputStream now detects sparse entries using
  the oldgnu format and properly reports it cannot extract their
  contents.
o COMPRESS-130: The Javadoc for ZipArchiveInputStream#skip now matches
  the implementation, the code has been made more defensive.
o COMPRESS-140: ArArchiveInputStream fails if entries contain only
  blanks for userId or groupId. Thanks to Trejkaz.
o COMPRESS-139: ZipFile may leak resources on some JDKs.
o COMPRESS-125: BZip2CompressorInputStream throws IOException if
  underlying stream returns available() == 0.
  Removed the check.
o COMPRESS-127: Calling close() on inputStream returned by
  CompressorStreamFactory.createCompressorInputStream()
  does not close the underlying input stream.
o COMPRESS-119: TarArchiveOutputStream#finish now writes all buffered
  data to the stream

Changes:
o ZipFile now implements finalize which closes the underlying
  file.
o COMPRESS-117: Certain tar files not recognised by
  ArchiveStreamFactory.

Release 1.1
-----------

New features:
o COMPRESS-108: Command-line interface to list archive contents.
  Usage: java -jar commons-compress-n.m.jar archive-name [zip|tar|etc]
o COMPRESS-109: Tar implementation does not support Pax headers
  Added support for reading pax headers.
  Note: does not support global pax headers
o COMPRESS-103: ZipArchiveInputStream can optionally extract data that
  used the STORED compression method and a data descriptor.
  Doing so in a stream is not safe in general, so you have to
  explicitly enable the feature. By default the stream will
  throw an exception if it encounters such an entry.
o COMPRESS-98: The ZIP classes will throw specialized exceptions if
  any attempt is made to read or write data that uses zip features
  not supported (yet).
o COMPRESS-99: ZipFile#getEntries returns entries in a predictable
  order - the order they appear inside the central directory.
  A new method getEntriesInPhysicalOrder returns entries in order of
  the entry data, i.e. the order ZipArchiveInputStream would see.
o The Archive*Stream and ZipFile classes now have
  can(Read|Write)EntryData methods that can be used to check
  whether a given entry's data can be read/written.
  The method currently returns false for ZIP archives if an
  entry uses an unsupported compression method or encryption.
o COMPRESS-89: The ZIP classes now detect encrypted entries.
o COMPRESS-97: Added autodetection of compression format to
  CompressorStreamFactory.
o COMPRESS-95: Improve ExceptionMessages in ArchiveStreamFactory
  Thanks to Joerg Bellmann.
o A new constructor of TarArchiveEntry can create entries with
  names that start with slashes - the default is to strip
  leading slashes in order to create relative path names.
o ArchiveEntry now has a getLastModifiedDate method.
o COMPRESS-78: Add a BZip2Utils class modelled after GZipUtils
  Thanks to Jukka Zitting.
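
Illustration for COMPRESS-97: format autodetection generally works by
comparing the first bytes of a stream against known signatures. The
sketch below is an assumption about the general technique, not the
CompressorStreamFactory code. The class name, method names and the
returned labels are hypothetical. It checks only the gzip and bzip2
signatures and uses JDK classes to produce sample input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not a Commons Compress class.
final class FormatSniffer {

    /** Returns "gzip", "bzip2" or "unknown" based on the leading magic bytes. */
    static String detect(byte[] head) {
        if (startsWith(head, 0x1f, 0x8b)) {
            return "gzip";    // gzip member header, see RFC 1952
        }
        if (startsWith(head, 'B', 'Z', 'h')) {
            return "bzip2";   // bzip2 stream signature
        }
        return "unknown";
    }

    /** Compares the start of data with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Produces a real gzip stream with the JDK so there is something to sniff. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sample = "sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(gzip(sample)));                                   // gzip data
        System.out.println(detect("BZh9".getBytes(StandardCharsets.US_ASCII)));    // bzip2 signature
        System.out.println(detect(sample));                                         // plain text
    }
}
```

When sniffing a real InputStream, the caller needs a stream that
supports mark/reset (for example a BufferedInputStream) so the bytes
read for detection can be handed back to the decompressor.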

Fixed Bugs:
o COMPRESS-72: Move acknowledgements from NOTICE to README
o COMPRESS-113: TarArchiveEntry.parseTarHeader() includes the trailing
  space/NUL when parsing the octal size
o COMPRESS-118: TarUtils.parseName does not properly handle characters
  outside the range 0-127
o COMPRESS-107: ArchiveStreamFactory does not recognise tar files
  created by Ant
o COMPRESS-110: Support "ustar" prefix field, which is used when file
  paths are longer than 100 characters.
o COMPRESS-100: ZipArchiveInputStream will throw an exception if it
  detects an entry that uses a data descriptor for a STORED entry
  since it cannot reliably find the end of data for this
  "compression" method.
o COMPRESS-101: ZipArchiveInputStream should now properly read archives
  that use data descriptors but without the "unofficial" signature.
o COMPRESS-74: ZipArchiveInputStream failed to update the number of
  bytes read properly.
o ArchiveInputStream has a new method getBytesRead that should
  be preferred over getCount since the latter may truncate the
  number of bytes read for big archives.
o COMPRESS-85: The cpio archives created by CpioArchiveOutputStream
  couldn't be read by many existing native implementations because
  the archives contained multiple entries with the same inode/device
  combinations and weren't padded to a blocksize of 512 bytes.
o COMPRESS-73: ZipArchiveEntry, ZipFile and ZipArchiveInputStream are
  now more lenient when parsing extra fields.
o COMPRESS-82: cpio is terribly slow.
  Documented that buffered streams are needed for performance
o Improved exception message if the extra field data in ZIP
  archives cannot be parsed.
o COMPRESS-17: Tar format unspecified - current support documented.
o COMPRESS-94: ZipArchiveEntry's equals method was broken for entries
  created with the String-arg constructor. This led to broken ZIP
  archives if two different entries had the same hash code.
  Thanks to Anon Devs.
o COMPRESS-87: ZipArchiveInputStream could repeatedly return 0 on read()
  when the archive was truncated. Thanks to Antoni Mylka.
o COMPRESS-86: Tar archive entries holding the file name for names
  longer than 100 characters in GNU longfile mode didn't properly
  specify they'd be using the "oldgnu" extension.
o COMPRESS-83: Delegate all read and write methods in GZip stream in
  order to speed up operations.
o The ar and cpio streams now properly read and write last
  modified times.
o COMPRESS-81: TarOutputStream can leave garbage at the end of the
  archive

Changes:
o COMPRESS-112: ArArchiveInputStream does not handle GNU extended
  filename records (//)
o COMPRESS-105: Document that the name of a ZipArchiveEntry determines
  whether an entry is considered a directory or not.
  If you don't use the constructor with the File argument the entry's
  name must end in a "/" in order for the entry to be known as a
  directory.
o COMPRESS-79: Move DOS/Java time conversions into Zip utility class.
o COMPRESS-75: ZipArchiveInputStream does not show location in file
  where a problem occurred.
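
Illustration for COMPRESS-105: the trailing-slash rule can be seen with
the JDK's own java.util.zip.ZipEntry, which uses the same naming
convention. This sketch does not use ZipArchiveEntry itself, and the
helper class and method below are hypothetical.

```java
import java.util.zip.ZipEntry;

// Hypothetical helper for illustration; not a Commons Compress class.
final class DirectoryEntryNames {

    /** Appends a trailing "/" if the name does not already end with one. */
    static String asDirectoryName(String name) {
        return name.endsWith("/") ? name : name + "/";
    }

    public static void main(String[] args) {
        // Without the trailing slash the entry is treated as a regular file.
        System.out.println(new ZipEntry("docs").isDirectory());
        // With the trailing slash the same name denotes a directory.
        System.out.println(new ZipEntry(asDirectoryName("docs")).isDirectory());
    }
}
```

Normalizing names this way before creating entries from plain strings
avoids directories being written as empty files.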