PDFBox 2.0.0 requires at least Java 6.
There are some significant changes to the package structure of PDFBox:
All libraries on which PDFBox depends are updated to their latest stable versions:
For test support the libraries are updated to
Most deprecated API calls in PDFBox 1.8.x have been removed in PDFBox 2.0.0.
The API changes are reflected in the Javadoc for PDFBox 2.0.0. The most notable changes are:
- getCOSDictionary() is no longer used. Instead, getCOSObject() now returns the matching COSBase subtype.
- PDXObjectForm was renamed to PDFormXObject to be more in line with the PDF specification.
- PDXObjectImage was renamed to PDImageXObject to be more in line with the PDF specification.
- PDPage.getContents().createInputStream() was simplified to PDPage.getContents().
PDFBox 2.0.0 now parses PDF files by following the Xref (cross-reference) information in the PDF. This is similar to the functionality of PDDocument.loadNonSeq in PDFBox 1.8.x. Users who were still using PDDocument.load with PDFBox 1.8.x might see different results when switching to PDFBox 2.0.0.
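A minimal sketch of the renamed calls, assuming PDFBox 2.0.x on the classpath and a PDF path passed as the first program argument; the class name ContentStreamDump is hypothetical. It loads a document with PDDocument.load, reads each page's decoded content stream through PDPage.getContents() and uses getCOSObject() where 1.8 code would have called getCOSDictionary():

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class illustrating the 2.0 API names
public class ContentStreamDump
{
    /** Reads the decoded content stream of a page into a byte array. */
    public static byte[] readContents(PDPage page) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 2.0: getContents() returns the InputStream directly
        // (1.8 needed getContents().createInputStream())
        InputStream in = page.getContents();
        if (in == null)
        {
            return out.toByteArray(); // defensive: treat "no content" as empty
        }
        try
        {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1)
            {
                out.write(buffer, 0, n);
            }
        }
        finally
        {
            in.close();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        // load() follows the Xref information, like loadNonSeq did in 1.8.x
        PDDocument document = PDDocument.load(new File(args[0]));
        try
        {
            for (PDPage page : document.getPages())
            {
                // getCOSObject() replaces getCOSDictionary(); for a page it is a COSDictionary
                COSDictionary dict = page.getCOSObject();
                System.out.println(dict.size() + " keys, "
                        + readContents(page).length + " content bytes");
            }
        }
        finally
        {
            document.close();
        }
    }
}
```

try/finally is used instead of try-with-resources so the sketch stays within the Java 6 baseline mentioned at the top.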
Font handling now has full Unicode support and supports font subsetting. To take advantage of this, TrueType fonts should now be loaded using PDType0Font.load.
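As a sketch of the TrueType loading described above, assuming a TrueType file (for example a DejaVu or Liberation font) is available locally and PDFBox 2.0.x is on the classpath; the class and method names are hypothetical. The font is loaded with PDType0Font.load and used to write one line of text:

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical example class
public class UnicodeTextExample
{
    /** Writes one line of text with a TrueType font loaded as a Type 0 font. */
    public static void writeText(File ttfFile, String text, File output) throws IOException
    {
        PDDocument document = new PDDocument();
        try
        {
            PDPage page = new PDPage();
            document.addPage(page);
            // embeds the font; only the glyphs actually used are kept (subset) on save
            PDFont font = PDType0Font.load(document, ttfFile);
            PDPageContentStream contents = new PDPageContentStream(document, page);
            contents.beginText();
            contents.setFont(font, 12);
            contents.newLineAtOffset(72, 700); // 1 inch from the left, near the top
            contents.showText(text);           // any characters the font has glyphs for
            contents.endText();
            contents.close();
            document.save(output);
        }
        finally
        {
            document.close();
        }
    }
}
```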
The individual calls to add resources, such as PDResources.addFont(PDFont font) and PDResources.addXObject(PDXObject xobject, String prefix), have been replaced with PDResources.add(resource), where resource is an instance of one of the resource classes such as PDFont, PDAbstractPattern and so on. The add method now supports all the different types of resources available.
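A minimal sketch of the unified add method, assuming PDFBox 2.0.x; the class name ResourcesExample is hypothetical and the 1x1 image exists only so that there is an XObject to register. Each add(...) overload returns the COSName under which the resource was stored:

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical example class
public class ResourcesExample
{
    /** Registers one font and one image in a fresh resource dictionary. */
    public static PDResources buildResources(PDDocument document) throws IOException
    {
        PDResources resources = new PDResources();

        // 1.8: resources.addFont(font); 2.0: overloaded add(...) per resource type
        COSName fontName = resources.add(PDType1Font.HELVETICA);

        // 1x1 placeholder image, only to have an image XObject to register
        BufferedImage awtImage = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        PDImageXObject image = LosslessFactory.createFromImage(document, awtImage);
        COSName imageName = resources.add(image);

        System.out.println("font: " + fontName.getName() + ", image: " + imageName.getName());
        return resources;
    }
}
```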
The individual classes PDJpeg(), PDPixelMap() and PDCCitt() for importing images have been replaced with PDImageXObject.createFromFile, which works for JPG, TIFF (G4 compression only), PNG, BMP and GIF.
In addition, there are some specialized factory classes:
- JPEGFactory.createFromStream, which preserves the JPEG data and embeds it in the PDF file without modification (this is best if you start with a JPEG file).
- CCITTFactory.createFromFile, for bitonal TIFF images with G4 compression.
- LosslessFactory.createFromImage, which is best if you start with a BufferedImage.
With PDFBox 2.0.0, the preferred way to iterate through the pages of a document is:
for (PDPage page : document.getPages())
{
    ... (do something)
}
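Combining the image factories with the page loop, the following sketch (PDFBox 2.0.x assumed; class and method names hypothetical) creates one image XObject with LosslessFactory.createFromImage and draws it on every page, appending to the existing page content. For an image file on disk, PDImageXObject.createFromFile(path, document) could replace the factory call:

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical example class
public class StampImageExample
{
    /** Draws the same image near the lower-left corner of every page; returns the page count. */
    public static int stampAllPages(PDDocument document, BufferedImage awtImage) throws IOException
    {
        // create the XObject once so every page references the same image data
        PDImageXObject image = LosslessFactory.createFromImage(document, awtImage);
        int stamped = 0;
        for (PDPage page : document.getPages())
        {
            // APPEND keeps the existing content; true = compress, true = reset graphics state
            PDPageContentStream contents = new PDPageContentStream(
                    document, page, PDPageContentStream.AppendMode.APPEND, true, true);
            contents.drawImage(image, 20, 20); // position in PDF user space units
            contents.close();
            stamped++;
        }
        return stamped;
    }
}
```

Creating the XObject outside the loop is a deliberate choice: drawing it on each page adds only a reference, not another copy of the image data.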
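With PDFBox 2.0.0, PDPage.convertToImage has been removed. Use the new PDFRenderer class instead:
import java.awt.image.BufferedImage;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCounter = 0;
for (PDPage page : document.getPages())
{
    // page indices are 0-based; render at 300 DPI in RGB
    BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
    // suffix in filename will be used as the file format
    ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 300);
}
document.close();
ImageIOUtil has been moved into the org.apache.pdfbox.tools.imageio package.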
Important notice when using PDFBox with Java 8
Due to the change of the Java color management module towards "LittleCMS", users can experience slow performance in color operations. Solution: disable LittleCMS in favour of the old KCMS (Kodak Color Management System) by starting Java with
-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
or by calling System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
Sources:
http://www.subshell.com/en/subshell/blog/Wrong-Colors-in-Images-with-Java8-100.html
https://bugs.openjdk.java.net/browse/JDK-8041125
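A small sketch of the programmatic variant, assuming the property has to be set at startup, before any PDF is loaded or rendered, so that the color management module has not been initialized yet. The class name is hypothetical, and restricting the change to Java 8 through the java.specification.version property is a design choice of this sketch that matches the scope of the notice above:

```java
// Hypothetical startup helper
public class ColorManagementSetup
{
    /**
     * Selects KCMS instead of LittleCMS, but only on Java 8 (an assumption of this sketch).
     * Call it before any color operation. Returns true if the property was set.
     */
    public static boolean useKcmsOnJava8()
    {
        if ("1.8".equals(System.getProperty("java.specification.version")))
        {
            System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
            return true;
        }
        return false;
    }

    public static void main(String[] args)
    {
        boolean switched = useKcmsOnJava8();
        System.out.println("KCMS selected: " + switched);
        // ... load and render PDF documents after this point
    }
}
```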
With PDFBox 2.0.0, PDFPrinter has been removed.
Users of PDFPrinter.silentPrint() should now use this code:
PrinterJob job = PrinterJob.getPrinterJob();
job.setPageable(new PDFPageable(document));
job.print();
Users of PDFPrinter.print(), which shows a print dialog first, should now use this code:
PrinterJob job = PrinterJob.getPrinterJob();
job.setPageable(new PDFPageable(document));
if (job.printDialog()) {
job.print();
}
Advanced use case examples can be found in the examples package under org/apache/pdfbox/examples/printing/Printing.java.
In 1.8, one way to get the text colors was to pass an expanded .properties file to the PDFTextStripper constructor. To achieve the same in PDFBox 2.0, you can extend PDFTextStripper and add the following operators in the constructor:
addOperator(new SetStrokingColorSpace());
addOperator(new SetNonStrokingColorSpace());
addOperator(new SetStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceRGBColor());
addOperator(new SetStrokingDeviceRGBColor());
addOperator(new SetNonStrokingDeviceGrayColor());
addOperator(new SetStrokingDeviceGrayColor());
addOperator(new SetStrokingColor());
addOperator(new SetStrokingColorN());
addOperator(new SetNonStrokingColor());
addOperator(new SetNonStrokingColorN());
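A sketch of such a subclass, assuming PDFBox 2.0.x (class name hypothetical). It registers the color operators listed above and records the fill (non-stroking) color of each glyph, which is the color normally used to paint text. toRGB() can fail for color spaces without a single RGB value, such as patterns; the sketch records those as -1:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetNonStrokingDeviceRGBColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorN;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingColorSpace;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceCMYKColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceGrayColor;
import org.apache.pdfbox.contentstream.operator.color.SetStrokingDeviceRGBColor;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical subclass
public class ColorTextStripper extends PDFTextStripper
{
    // RGB value of the fill color for each glyph, in processing order
    private final List<Integer> colors = new ArrayList<Integer>();

    public ColorTextStripper() throws IOException
    {
        // without these operators the stripper does not track color changes
        addOperator(new SetStrokingColorSpace());
        addOperator(new SetNonStrokingColorSpace());
        addOperator(new SetStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetStrokingDeviceGrayColor());
        addOperator(new SetStrokingColor());
        addOperator(new SetStrokingColorN());
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
    }

    public List<Integer> getColors()
    {
        return colors;
    }

    @Override
    protected void processTextPosition(TextPosition text)
    {
        super.processTextPosition(text);
        // text is usually painted with the non-stroking (fill) color
        PDColor color = getGraphicsState().getNonStrokingColor();
        try
        {
            colors.add(color.toRGB());
        }
        catch (IOException e)
        {
            colors.add(-1); // conversion failed
        }
        catch (UnsupportedOperationException e)
        {
            colors.add(-1); // e.g. pattern colors have no single RGB value
        }
    }
}
```

Calling getText(document) on an instance runs the extraction as usual; getColors() then holds one entry per processed glyph.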
Large parts of the support for interactive forms (AcroForms) have been rewritten. The most notable change from 1.8.x is that there is now a clear distinction between fields and the widget annotations that represent them visually. Intermediate nodes in a field tree are now represented by the PDNonTerminalField class.
With PDFBox 2.0.0, the preferred way to iterate through the fields is:
PDAcroForm form;
...
for (PDField field : form.getFieldTree())
{
... (do something)
}
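To see the split between fields and annotations in practice, the following sketch (PDFBox 2.0.x assumed, class name hypothetical) walks the field tree, marks PDNonTerminalField nodes and prints how many widget annotations each field has. The form is obtained here with document.getDocumentCatalog().getAcroForm(), which returns null for documents without a form:

```java
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;

// Hypothetical example class
public class FieldTreeDump
{
    /** Prints each field with its kind and widget count; returns the number of fields visited. */
    public static int dump(PDDocument document)
    {
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        if (form == null)
        {
            return 0; // document has no interactive form
        }
        int count = 0;
        for (PDField field : form.getFieldTree())
        {
            // intermediate nodes are PDNonTerminalField; terminal fields own the widgets
            String kind = (field instanceof PDNonTerminalField) ? "node" : "terminal";
            List<PDAnnotationWidget> widgets = field.getWidgets();
            System.out.println(field.getFullyQualifiedName() + " (" + kind + "): "
                    + widgets.size() + " widget(s)");
            count++;
        }
        return count;
    }
}
```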
Most PDField subclasses now accept standard Java types such as String as parameters instead of the former COSBase subclasses.
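A short sketch of filling a field with a plain String, assuming PDFBox 2.0.x; the class and method names are hypothetical, and the field name passed in is whatever fully qualified name the form actually uses:

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Hypothetical example class
public class FillFieldExample
{
    /** Sets a field's value by fully qualified name; returns false if the form or field is missing. */
    public static boolean fill(PDDocument document, String fieldName, String value) throws IOException
    {
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        if (form == null)
        {
            return false;
        }
        PDField field = form.getField(fieldName);
        if (field == null)
        {
            return false;
        }
        // 2.0: a plain java.lang.String, where 1.8 code passed a COSString
        field.setValue(value);
        return true;
    }
}
```

For text fields, setValue may also regenerate the field's visual appearance, so in practice the form is expected to provide a default appearance string and default resources.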