===============================
Requirements for PDF generation
===============================

Generation of PDF documents should be added to the document component. These
PDF documents should be created from Docbook documents, which can be generated
from each markup available in the document component.

This document summarizes the requirements for PDF generation.

Central requirements
====================

The key requirements for the component.

Layout
------

The central requirement is to generate user-styled PDF documents from Docbook
markup. The customized styling should include, but is not limited to:

- Text

  - Fonts and text sizes
  - Line heights
  - Colors
  - Alignments

- Pages

  - Footers and headers
  - Background images
  - Multiple text columns per page
  - Page sizes
  - Margins and paddings

- Block level elements (tables, graphics, literal blocks, ...)

  - Borders, backgrounds, fonts, colors

It must be possible to assign different styles depending on the parent
elements in the Docbook markup, so that the titles in the following Docbook
document can be formatted differently::

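
    <article>
        <title>Article title</title>
        <section>
            <title>First heading</title>
            <section>
                <title>Second heading</title>
            </section>
        </section>
    </article>
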
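
The parent-dependent styling described above can be sketched as a small rule
lookup. This is only an illustration in Python, not the component's API: the
rule table ``RULES``, the functions ``resolve_style`` and ``styled_titles``,
and the assumption that the three titles are nested as ``article/title``,
``section/title`` and ``section/section/title`` are all hypothetical. A rule
is a sequence of element names that must match the end of an element's
ancestor path, and longer (more specific) rules override shorter ones.

```python
import xml.etree.ElementTree as ET

# Assumed nesting of the three titles; the element structure is illustrative.
DOCBOOK = """\
<article>
  <title>Article title</title>
  <section>
    <title>First heading</title>
    <section>
      <title>Second heading</title>
    </section>
  </section>
</article>
"""

# Hypothetical style rules: a suffix of the element path mapped to properties.
RULES = {
    ("title",): {"font-size": "12pt"},
    ("article", "title"): {"font-size": "24pt"},
    ("section", "title"): {"font-size": "18pt"},
    ("section", "section", "title"): {"font-size": "14pt"},
}


def resolve_style(path):
    """Merge all rules whose selector is a suffix of path, shortest first."""
    style = {}
    for selector in sorted(RULES, key=len):
        if len(selector) <= len(path) and tuple(path[-len(selector):]) == selector:
            # Longer selectors are applied later and therefore win.
            style.update(RULES[selector])
    return style


def styled_titles(xml_text):
    """Return (title text, resolved style) for every title, in document order."""
    result = []

    def walk(element, ancestors):
        path = ancestors + [element.tag]
        if element.tag == "title":
            result.append((element.text, resolve_style(path)))
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return result


if __name__ == "__main__":
    for text, style in styled_titles(DOCBOOK):
        print(text, style)
```

Matching on path suffixes keeps the rules short while still letting a nested
section title differ from a top-level one; a real style format would likely
need more than element names (attributes, for example), which is left open
here.
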

It would be nice if styles could be imported from and exported to an easily
readable and writable format.

Text formatting
---------------

Proper text formatting is most probably the biggest problem in the
implementation. The requirements include:

Hyphenation
^^^^^^^^^^^

Justified text in narrow columns especially requires hyphenation; otherwise
the gaps between characters and words might grow too much. A pluggable
hyphenation mechanism is required, which can be adapted to different
languages, based on externally available dictionaries.

Widows and orphans
^^^^^^^^^^^^^^^^^^

See: http://en.wikipedia.org/wiki/Widows_and_orphans

There should be ways to configure the thresholds under which paragraphs are
considered widows or orphans, which should be avoided.

Inline formatting
^^^^^^^^^^^^^^^^^

Depending on the font and styles used, inline formatting might have a serious
effect on the text width. This MUST be respected during text rendering.

LTR and RTL languages
^^^^^^^^^^^^^^^^^^^^^

The text wrapping must be able to work with left-to-right and right-to-left
languages.

Floating media objects
^^^^^^^^^^^^^^^^^^^^^^

For media objects which do not span the whole column width, it should be
possible to float text around the media objects. Detection of the actual image
borders is not required; the rectangular frame around the image should be
sufficient for text floating.

Embedding of media
------------------

There are a lot of different media types which might be embedded into PDF.
The most common formats seem to be JPEG and EPS. JPEG is not suitable for
several types of graphics [1]_, and EPS can only be used properly for some
types of vector based images.

Conversion options and supported formats must be evaluated. Which formats are
supported might depend on the driver used.

PDI allows embedding of other PDFs inside the created PDF. This can be useful
when merging different generated documents.

.. [1] http://kore-nordmann.de/blog/image_formats.html

Metadata
--------

The document component already preserves metadata associated with documents.
PDF supports embedding additional document metadata. This metadata should
definitely be embedded, but it might also be useful to offer an easily
accessible API for embedding additional metadata.

XMP is designed specifically for embedding metadata using RDF.

Autogenerated contents
----------------------

Headers and footers often contain some fixed texts, but might also contain
autogenerated contents, like:

- Current page / number of pages / page orientation (left, right)
- Current section title
- Author, read from document metadata

It must be possible to define callbacks which generate those contents for the
page they are currently rendered on. The best possible markup used for
generation of those contents needs to be evaluated.

There are several elements which can require automatic generation. Those are
at least:

- Header / Footer
- Cover page
- Table of contents
- Back page

For most of those elements a predefined generator can be implemented which
creates meaningful default contents and can then be extended by the user.

Especially for cover and back pages it might be useful to include them
directly from other PDF documents.

Driver infrastructure
---------------------

There are multiple ways to generate PDF documents, like:

- pecl/libharu
- FPDF
- TCPDF
- pdflib
- Zend_PDF

Which of those libraries is available and performs best might depend on the
environment. A driver infrastructure should offer the user the choice of
selecting the best output driver for writing the actual PDF.

Not all of those drivers support proper text wrapping themselves, so text
wrapping cannot be handed over to the drivers.

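
The split between drivers and text wrapping described above can be sketched
as follows. This is a rough illustration in Python under stated assumptions,
not a proposed API: the interface ``PdfDriver``, the stand-in
``MonospaceStubDriver`` and the functions ``select_driver``, ``wrap_text``
and ``render_paragraph`` are hypothetical names, and no real PDF library is
called. The driver only reports availability, measures text and draws single
lines; line breaking stays in the renderer, so it behaves the same with every
driver. The greedy line breaking is a placeholder and ignores hyphenation,
widows and orphans.

```python
from abc import ABC, abstractmethod


class PdfDriver(ABC):
    """Hypothetical driver interface; a real one would wrap a PDF library."""

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether the backing library can be used in this environment."""

    @abstractmethod
    def text_width(self, text: str, font_size: float) -> float:
        """Return the rendered width of text in points."""

    @abstractmethod
    def draw_text(self, x: float, y: float, text: str, font_size: float) -> None:
        """Place a single, already wrapped line of text on the page."""


class MonospaceStubDriver(PdfDriver):
    """Stand-in driver: every glyph is 0.5 * font_size wide, calls are recorded."""

    def __init__(self, available: bool = True):
        self.available = available
        self.calls = []

    def is_available(self):
        return self.available

    def text_width(self, text, font_size):
        return len(text) * font_size * 0.5

    def draw_text(self, x, y, text, font_size):
        self.calls.append((x, y, text))


def select_driver(candidates):
    """Return the first candidate that reports itself as available."""
    for driver in candidates:
        if driver.is_available():
            return driver
    raise RuntimeError("no PDF driver available")


def wrap_text(driver, text, font_size, max_width):
    """Greedy line breaking done by the renderer, using only driver metrics."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and driver.text_width(candidate, font_size) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def render_paragraph(driver, text, font_size=10.0, max_width=60.0, line_height=12.0):
    """Wrap text and hand each finished line to the driver, top to bottom."""
    for index, line in enumerate(wrap_text(driver, text, font_size, max_width)):
        driver.draw_text(0.0, index * line_height, line, font_size)
```

Because the renderer only relies on ``text_width`` and ``draw_text``, a
driver for any of the libraries listed above would need to supply just font
metrics and primitive drawing operations.
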

Optional requirements
=====================

Once PDF rendering is implemented correctly, including correct rendering and
wrapping of texts, it might be useful in similar cases, for example:

- SVG to PDF conversion

  The conversion of SVG to PDF is used for distributing documents with heavily
  customized designs. With a proper rendering infrastructure, the API should
  be kept flexible enough to support such conversions later.

- HTML to PDF conversion

  It might be useful to directly convert styled HTML to PDF. If the API stays
  flexible enough, this should be possible to add later. One major problem
  might be the markup used for formatting inline text elements.

Import of PDF pages
-------------------

For cover pages (or similar) of documents it might be useful to extract whole
pages from other PDFs and embed them in the generated PDF document. This
requires reading existing PDF documents, though, which is not planned to be
implemented yet.

Signing PDFs / write protection
-------------------------------

It is common to write protect or sign PDF documents. If the respective PDF
creation library can handle that, it should be exposed in the API of the PDF
creation.