Release Notes -- Apache PDFBox -- Version 1.3.1

Introduction
------------

Apache PDFBox is an open source Java library for working with PDF documents.

PDFBox 1.3 is an incremental feature release based on earlier releases.
This release contains many improvements and fixes, especially related to
handling of fonts, colors and malformed PDFs. Another notable change is the
inclusion of updated CMap files released by Adobe under a BSD license.

PDFBox 1.3.1 fixes a memory use regression detected in the 1.3.0 release
candidate. This release also contains a fix for the handling of indexed
images in encrypted PDFs.

For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

New Features

[PDFBOX-11] CID to Unicode mapping
[PDFBOX-192] Find encodings in FontFile3 - CompactFont Format
[PDFBOX-777] Add utility class to easily extract a range of pages from a PDF
[PDFBOX-791] PDFToImage : add the ability to select the area to export ...
[PDFBOX-851] Add WriteDecodedDoc to standalone app

Improvements

[PDFBOX-494] Additional CMap files from Adobe
[PDFBOX-554] Handle JPEG2000 images via JPXDecode filter
[PDFBOX-592] please accommodate '-' where a number is expected
[PDFBOX-704] Implementation of additional CMAP Formats for TrueType fonts
[PDFBOX-764] Access to metadata keys in the PD model
[PDFBOX-769] Update/adjust used junit version
[PDFBOX-782] Update/reactivate ant build
[PDFBOX-796] Objects from streams overwrite objects already read with ..
[PDFBOX-798] Better handle out of spec PDFs
[PDFBOX-799] Add ability to ignore errors with AcroForms
[PDFBOX-801] PDPixelMap is too verbose
[PDFBOX-802] Better handle corrupt/missing %%EOF flags at the end of a file
[PDFBOX-803] Improved handling erroneous data between endstream and ...
[PDFBOX-812] Remove dependency on PageDrawer from text only operators
[PDFBOX-820] Support TIFF predictor 2 with FlateDecode, patch included
[PDFBOX-826] Increase performance of ColorSpaceCMYK.toRGB, patch attached

Bug Fixes

[PDFBOX-58] Problems with text extraction from Polish documents.
[PDFBOX-99] Indexed color images have wrong colors after encryption
[PDFBOX-122] Exception in text extraction
[PDFBOX-257] PDFMergerUtility may create non-unique AcroForm field names
[PDFBOX-265] Sometimes, TextPosition have incorrect value ..
[PDFBOX-291] Text Extraction strips 1 char when extracting a twin pair
[PDFBOX-439] Incorrect text for Exolab.pdf in Regression Test
[PDFBOX-440] Improper text produced depending on font for ...
[PDFBOX-506] PDFBox can't parse PDF documents from jstor.org
[PDFBOX-568] testextract failure on Linux and Mac OS X
[PDFBOX-667] Last characters in a line overlap when a PDF is printed
[PDFBOX-686] Invalid text rendering while printing a PDF
[PDFBOX-772] Re-setting filled properties of PDDocumentInformation do ...
[PDFBOX-780] EXCEPTION_ACCESS_VIOLATION in fontmanager.so/fontmanager.dll
[PDFBOX-786] PDChoiceField's implementation of SetValue does not work ...
[PDFBOX-787] CMap parser doesn't work for double byte mappings with ...
[PDFBOX-788] PrintPDF does not take the windows default printer ...
[PDFBOX-789] Error by text extraction
[PDFBOX-790] Text extraction from PDF generated from MS Word fails
[PDFBOX-793] scratchfile ignored in PDDocument load( File file, ...
[PDFBOX-805] Extracted ascii text in CJK document is malformed
[PDFBOX-808] PDTrueTypeFont.loadTTF() freezes (at TTFDataStream.java:195)
[PDFBOX-810] Problem in extracting roman page numbers [PDPageLabels.java]
[PDFBOX-813] ClassCastException: COSInteger cannot be cast to COSDictionary
[PDFBOX-815] PDFont.getEncodingManager is not thread safe; FIX included
[PDFBOX-822] Wrong handling of PNG predictors with FlateDecode, patch ...
[PDFBOX-825] Wrong opacity for images with indexed color space
[PDFBOX-828] Spaces disappear and text is shifted left
[PDFBOX-834] IIOException: Error 2 when displaying PDF containing CCITT ...
[PDFBOX-836] Write2File Fails for PDCalRGB
[PDFBOX-839] Use COSName constant instead of COSString
[PDFBOX-840] Umlauts font size calculation problem
[PDFBOX-841] [pdfbox-app] maven-bundle-configuration problem
[PDFBOX-842] Documentation: prominent example has out-of-date class name
[PDFBOX-844] AFM-files aren't loaded
[PDFBOX-846] TextExtraction mixes case of text
[PDFBOX-848] PageDrawer does not take the full CropBox into account
[PDFBOX-857] Define a standard encoding for the standard 14 fonts
[PDFBOX-866] Indexed images are sometimes corrupted when encrypting the PDF
[PDFBOX-874] OutOfMemoryError in text extraction tests

Release Contents
----------------

This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by SHA1 and MD5 checksums and a PGP
signature that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://svn.apache.org/repos/asf/pdfbox/KEYS.

About Apache PDFBox
-------------------

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit http://pdfbox.apache.org/

About The Apache Software Foundation
------------------------------------

Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit http://www.apache.org/
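
The SHA1 checksum mentioned under Release Contents can be compared against
a locally computed digest. The following is a minimal sketch using only the
standard JDK; the class name ChecksumSketch and its command-line arguments
are hypothetical and not part of PDFBox. A matching checksum only indicates
that the download is intact; authenticity still depends on checking the PGP
signature against the KEYS file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: compares a file's SHA-1 digest
// with an expected hex string taken from the published checksum file.
public class ChecksumSketch {

    /** Streams the input through SHA-1 and returns the digest as lowercase hex. */
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded source archive (assumed)
        // args[1]: expected SHA-1 hex string copied from the checksum file
        if (args.length != 2) {
            System.err.println("usage: java ChecksumSketch <archive> <expected-sha1-hex>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            boolean matches = actual.equalsIgnoreCase(args[1].trim());
            System.out.println(matches ? "checksum OK" : "checksum MISMATCH");
        }
    }
}
```

Run it with the archive path and the expected digest as the two arguments.
Passing "MD5" instead of "SHA-1" to MessageDigest.getInstance would check
the MD5 checksum in the same way.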