Module Format and Transformation - The Apache HTTP Server Project

This document describes the format of the Apache HTTP Server documentation xml source files and the technique used to transform them to html.

Format

A DTD is located in the style directory of the manual. An example of the format, with extensive comments, is also available in mod_template.txt. The file extension of a real source file should of course be xml; the example uses txt only to make online viewing simpler.

To ensure that your documentation follows the defined format, you should validate it against the DTD. Some help using Emacs with XML files is available from IBM developerWorks.
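As one possible way to run such a check from the command line, the sketch below uses xmllint from libxml2. This tool is not mentioned in this document and is only an assumption here; the function name validate_doc is hypothetical, and the DTD path is passed in by the caller because its file name is not given above.

```shell
# Hypothetical helper: validate one or more XML source files against a DTD.
# Usage: validate_doc path/to/the.dtd file.xml [more.xml ...]
validate_doc() {
  dtd=$1
  shift
  # --noout suppresses the parsed document, so only validation errors are printed;
  # --dtdvalid validates against the given DTD file.
  xmllint --noout --dtdvalid "$dtd" "$@"
}
```

The function returns xmllint's exit status, so a non-zero result indicates that at least one file failed to parse or validate.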

Transformation

The easiest way to view the transformed docs is simply to open the xml file directly in a recent version of MSIE, Netscape, or Mozilla. (MSIE 6 seems to work consistently. Some people have had luck with Netscape 6+ and Mozilla, but others have not.) These browsers will read the xsl file and perform the transformation for you automatically, so you can see what the final output will look like. This means that you can work on the docs and check your work without any special transformation setup.

For the final presentation, it is still necessary to transform to html to accommodate older browsers. Although any standards-compliant xslt engine should do, changing engines can lead to massive diffs on the transformed files. Therefore, we have chosen a single recommended transformation system based on Xalan+Xerces Java and Ant. These are all Apache projects distributed under the Apache license.

The only requirement for the transformation is a JVM for Java 1.2 or later (which can be obtained free from Sun). The build tools are kept in an SVN repository; if you need instructions on setting up SVN, see this page. Assuming you already have httpd-2.0/docs/manual checked out from CVS, here is what you need to do to build:

$ cd httpd-2.0/docs/manual
$ svn co http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/httpd-docs-build build
$ cd build
$ ./build.sh

If you are running under Win32, the build.sh script will work if Cygwin is installed. Alternatively, you should be able to run the build.bat script.

If you don't want to get the files from SVN, you can download a pkzipped version of the current build tools from our distribution directory.

The default target builds only the English-language docs. To build other docs, you should specify the language code (ja, de, etc.) as an argument. To find the available languages, please see our translations page.

You can get an overview of all possible build targets by typing:

./build.sh -projecthelp

Special Files

When you add a new module, the transformation process tries to generate an appropriate entry within mod/allmodules.xml and to create an accompanying metafile (newfilename.xml.meta). Since these tasks are written in Perl, you'll need a working Perl installation for this. If you don't have one, you should take these steps manually or just drop a note to the project mailing list so that someone else can do it.

Generating a PDF version

The PDF version of the docs is generated by transforming the xml files to LaTeX using the "latex-en" Ant target. The XSLT style files for the transformation are under style/latex/. Once you have the .tex equivalent of each .xml file, you can use pdflatex to convert this into a pdf file. Recommended versions of pdflatex can be obtained as part of teTeX (Unix) or MiKTeX (Win32), but any version of TeX should be fine, as long as it is sufficiently complete and modern. To generate the PDF, you should process the sitemap.tex file, which contains the main LaTeX document code and will include all the other files. The resulting PDF will be called sitemap.pdf, which you can rename as you choose.
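The steps above might be scripted roughly as follows. This is a sketch under assumptions: it assumes build.sh accepts the "latex-en" target as an argument in the same way it accepts a language code, the function name build_pdf is hypothetical, and the directory containing sitemap.tex is left as a parameter because its location is not stated here. Running pdflatex twice is a general LaTeX habit for resolving cross-references, not something this document prescribes.

```shell
# Hypothetical helper, meant to be run from the build directory.
# Usage: build_pdf <directory-containing-sitemap.tex>
build_pdf() {
  tex_dir=$1
  # Transform the xml sources to LaTeX (assumes the Ant target name is passed through).
  ./build.sh latex-en || return 1
  (
    # Work in a subshell so the caller's working directory is unchanged.
    cd "$tex_dir" || exit 1
    # Two passes so that cross-references can settle.
    pdflatex sitemap.tex && pdflatex sitemap.tex
  )
}
```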

Some notes about the XML to LaTeX conversion are necessary. Although HTML and LaTeX have many similarities, there are enough differences between the two to make targeting both outputs a difficult proposition. In particular, the method of handling tables is very different. To aid LaTeX in understanding tables designed for HTML, a <columnspec> section should be added to each table. Inside the <columnspec>, place a <column width=".xx"/> for each column in the table, where xx is the percentage of the line width devoted to that column (so width=".25" gives a column a quarter of the line). This will let the conversion handle basic tables. More complex constructs (like spanning rows or columns) will not work.
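To make the <columnspec> convention concrete, the sketch below prints an illustrative two-column table fragment and counts its <column/> entries. The function name example_table, the cell contents, and the chosen widths are hypothetical; only the <columnspec> and <column width=".xx"/> elements are taken from the description above.

```shell
# Print an illustrative table fragment (contents are made up for the example).
example_table() {
  cat <<'EOF'
<table>
  <columnspec>
    <column width=".3"/>
    <column width=".7"/>
  </columnspec>
  <tr><th>Directive</th><th>Purpose</th></tr>
  <tr><td>Listen</td><td>Sets the address and port the server listens on</td></tr>
</table>
EOF
}

# There should be one <column/> per table column; this prints 2 for the table above.
example_table | grep -c '<column width='
```

Here the first column takes 30% of the line width and the second 70%. No cell spans rows or columns, in keeping with the restriction noted above.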

In addition, pdflatex does not know how to incorporate GIF files, so any graphics must be available in PNG format.

There are various other restrictive assumptions embedded in the XSLT that work for the current docs, but may need to be modified in the future. For example, the code that transforms HTML-style links to LaTeX cross-references will work only with the main directory and one level of subdirectory. Also, <pre> sections are very likely not to work well in LaTeX because of differences in escaping and formatting rules in verbatim sections.

Finally, there are various differences in escaping rules between XML/HTML and LaTeX. Some characters need to be backslash-escaped in LaTeX, and all XML entities (&whatever;) must be converted to LaTeX equivalents. This is currently handled for a limited set of characters using a big ugly search-replace in the XSLT, but this may need to be modified in the future, especially to handle translations. Perhaps pre-processing with a Perl script and a substitution table would be better.
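As a rough illustration of the substitution-table idea, the sketch below uses sed rather than Perl. It is not the mechanism used in the XSLT; the function name latex_escape and the particular set of characters and entities are assumptions chosen for the example, and a real table would need to cover far more (backslashes, braces, numeric character references, and the entities used in translations).

```shell
# Hypothetical pre-processing filter: reads text on stdin, writes LaTeX-safe text.
latex_escape() {
  # Entities are converted first, and &amp; last among them, so that an escaped
  # ampersand followed by "lt;" is not mistaken for the &lt; entity.
  sed -e 's/&lt;/\\textless{}/g' \
      -e 's/&gt;/\\textgreater{}/g' \
      -e 's/&nbsp;/~/g' \
      -e 's/&amp;/\\\&/g' \
      -e 's/%/\\%/g' \
      -e 's/#/\\#/g' \
      -e 's/_/\\_/g' \
      -e 's/\$/\\$/g'
}

# Prints: 50\% of a\_b \& c
printf '%s\n' '50% of a_b &amp; c' | latex_escape
```

Keeping the table as an ordered list of independent rules makes the ordering constraints explicit, which is the main thing a longer table would have to preserve.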