Title: ASF Content Management System
Notice:    Licensed to the Apache Software Foundation (ASF) under one
           or more contributor license agreements.  See the NOTICE file
           distributed with this work for additional information
           regarding copyright ownership.  The ASF licenses this file
           to you under the Apache License, Version 2.0 (the
           "License"); you may not use this file except in compliance
           with the License.  You may obtain a copy of the License at
           .
             http://www.apache.org/licenses/LICENSE-2.0
           .
           Unless required by applicable law or agreed to in writing,
           software distributed under the License is distributed on an
           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
           KIND, either express or implied.  See the License for the
           specific language governing permissions and limitations
           under the License.

[TOC]

# Usage. # {#usage}

For complete details, please see [Updating the Infrastructure web site](infra-site.html) and
[Reference Manual](cmsref.html).  A video tutorial is available at <http://s.apache.org/cms-tutorial>.

If you just want to get started editing a page:

* Install the bookmarklet from the [cms](https://cms.apache.org/#bookmark) page.
  You only have to do this once.
* Navigate to the page you wish to edit (on the live site, not in the CMS).
* Click the bookmarklet.
  There will be a short pause while the CMS is initialised for you.
* Click on `Edit`. (To skip this step, modify the bookmarklet to add an
  `action=edit` parameter to its query string.)
* The page editor should then be displayed.
* Click `Submit` to save your edit to the workarea.
* Click `Commit` to save the updated file to SVN and trigger a staged build. (To
  skip this step, tick the "Quick Commit" checkbox in the `Edit` form.)
* The results should appear shortly on the
  [staging](http://www.staging.apache.org/dev/cms.html) site.
  (You may have to force the page to refresh in order to see the updated content.)
* Once you are happy with the updated page, click on `Publish Site` to deploy.
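
The `action=edit` shortcut mentioned in the steps above amounts to adding one parameter to a query string. The following is only a sketch of that idea in Python: the CMS url shown is hypothetical, and the real bookmarklet is javascript whose exact contents are not reproduced here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_query_param(url, key, value):
    """Return url with key=value set in its query string, replacing any existing key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != key]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical url, for illustration only; it is not the bookmarklet's real target.
print(with_query_param("https://cms.example.org/redirect?page=dev/cms", "action", "edit"))
```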

-----

# Rationale. # {#rationale}

This section describes the current conditions of the ASF website publishing
system and its deficiencies.  It also discusses options the Infrastructure
Team considered in addressing these problems with an eye towards our future
needs.

## Problems with the Current Website Management Tools. ## {#current-problems}


### Scheduled find + rsync Doesn't Scale. ### {#does-not-scale}

The existing publishing system at Apache has evolved from the case where
the organization's hardware consisted of a single machine.  Websites have
always been limited to using a combination of static content and cgi
scripts in order to not overtax a machine simultaneously responsible for
delivering (circa 2000-2003) over 1M hits and serving committers as our
CVS master host.

The organization has since grown to encompass about three full cabinets
worth of hardware and a pair of machines dedicated to serving mainly
www.apache.org and project websites.  The machines, eos and aurora, are 
some of our most expensive equipment and are located in two different
datacenters to provide redundancy and failover capabilities.  The
current traffic load is roughly 20M hits a day for those machines.

However, the publishing system involves running hourly `find` jobs on
people.apache.org and pushing that content out to eos and aurora with
`rsync`.  With roughly 300GB worth of content to scan it is no longer
possible to do this with a single `find` job, so we now run them in
parallel: one find job per website.  This puts an incredible load on
people.apache.org's ZFS array as there are roughly 100 sites to scan.
As good as ZFS is, the filesystem will not be able to keep up with this
load as the organization continues to promote new top-level projects.

### Limitations of Confluence's Shared Plugin Architecture. ### {#confluence-limitations}

Several years ago during the wiki craze at Apache, the Infrastructure
Team was tasked with setting up a Confluence installation for our projects
to use.  Apache member Pier Fumagalli developed and offered the autoexport
plugin as a way to provide Confluence-backed project websites, which was
quickly adopted by several projects.  The process involves rsyncing the
autoexported pages from the machine hosting Confluence over to
people.apache.org, where the standard publication system described above
would push those pages out to eos and aurora to be served live.

Over time we began to experience chronic problems with this particular
setup.  First off, different projects often wanted to use different and
occasionally conflicting plugins for their sites.  Secondly, plugins would
often break during Confluence upgrades.  The biggest offender was in fact
the autoexport plugin and its reliance on Confluence internals.  Virtually
every upgrade was guaranteed to break it, and after a while Pier and other
Java developers at Apache lost interest in supporting it.  We asked
around for people to support it, and were even willing to compensate folks
for their time, but there were no takers.  Confluence-backed websites were
fully dependent on the autoexport plugin to have any chance of working,
and the organization was caught between a rock and a hard place in deciding
when it was possible to upgrade Confluence.

The other main problem with this configuration is that it makes URL
deletions a nightmare.  The autoexport plugin doesn't support URL
deletions, and that limitation is carried through to the live sites via rsync.

Currently Apache's Confluence installation is hosted on thor, which is
a Sun T5220 Sparc.  It's by far our beefiest machine with 8 cores and 8
threads per core, and yet our Confluence service is dog slow.  Our 
installation is simply out-scaling the software, and to keep it performing 
acceptably will require even more significant equipment investments going 
forward.

### Anakia Is Outdated. ### {#anakia-outdated}

[Anakia](http://velocity.apache.org/anakia/devel/) was a great tool 10
years ago.  It is a competing technology to XSLT for dealing with raw
XML content.  Many projects still rely on Anakia to generate their webpages
but most of the web has moved on.  It's time the ASF caught up with the
times.

### Not Every Content Author Is a Geek. ### {#non-geeks}

While Apache is still primarily a place for software developers to
collaborate,  some of the people who provide support for our press
and legal efforts need to be able to contribute to www.apache.org.
Expecting them to deal with tools like Anakia to roll their own builds
of XML-based content is a non-starter.

### Publishing Delays Suck. ### {#delay}

Obviously, with hourly cron jobs pushing content out to our webservers, there
will be delays as long as 2 hours between the time someone commits a change
and logs onto people.apache.org to `svn up` the website, and the time it
actually gets synced to the live site.  That has been the status quo at
Apache for several years and it *simply isn't good enough* any longer.

## Problems with Existing CMS's. ## {#existing-cms}

While there is a zoo of available Open Source CMSes to choose from, only
a handful of them actually support exports of static content.  Even fewer
of them offer support for staging.  Apache's project websites aren't
like Twitter: they don't have rapidly changing content that needs to be
updated and delivered in real-time.  The sites are meant to provide 
**stable resources** for the public to gain necessary information about the 
software we develop.

### Day's CQ5. ### {#cq5}

While not an open source offering, Roy T. Fielding pursued a CQ5 
installation for the organization's use.  Roy demoed the featureset
at ApacheCon US 2009 and the members of the Infrastructure Team who saw
it were thoroughly impressed.  It seemingly met all of our core 
requirements.

However, conditions changed in 2010 for Roy, and he simply lost any
free time he could have put to this effort.  We had to eliminate this
as an option going forward, but thank Roy and Day for their time and
consideration.

### Adoption and Diversity. ### {#lenya}

[Lenya](http://lenya.apache.org/) had most of the features we were looking
for, but ultimately was rejected as being insufficiently flexible for use
as a foundation-wide CMS.  Allowing projects the flexibility of deploying
per-project site build technologies which were only limited by the
software installation on the build host was the Infrastructure Team's
preferred strategy.

-----

# Custom Solution. # {#solution}

In September 2010 Philip Gollucci, VP Infrastructure, gave the green light
to a custom-built CMS for the ASF, to be developed primarily by one of the
contracted System Administrators.  After collecting feedback on the goals
and requirements of several interested parties, the development work
was undertaken with a goal of completing the work in 60 days or less,
just in time for ApacheCon 2010 NA.  Fortunately the goals were kept simple
enough that the actual development time only spanned about 30 days.

## Unix Paradigm. ## {#separation-of-concerns}

The software follows the Unix development mantra of separate executables
for independent activities.  The key separation was to ensure content
presentation was kept independent from content editing, using the
addressability of the web to sew things together.  The main advantage of
this approach is that it imposes relatively few constraints on the content
generation software: different projects may adopt different tools to
build their websites, without any of the conflicts inherent in 
single-process plugin architectures like Confluence.

## Flexible Templating and Site Generation. ## {#templating}

While `Dotiac::DTL`, a perl port of django's templating library, was chosen
for use with www.apache.org, it is not a requirement that projects adopt
it.  Any templating system that runs on FreeBSD may be used, provided
the necessary (perl) glue code is written that makes the system compatible
with the CMS's build system.

## Automated Parallel Builds. ## {#builds}

The CMS relies on [buildbot](http://ci.apache.org/buildbot.html) to provide 
automated builds and checkins of a project's staging site.  Such builds are 
triggered instantly on commits to the project's site source material and
are an essential component of the system.

The build system executes builds in parallel, so it is quite fast, even for
a full site build.

## Markdown Recommended. ## {#markdown}

[Markdown](http://daringfireball.net/projects/markdown/) was chosen as the
format for the www.apache.org source content.  Editing the source in the
CMS's webgui relies on the [wmd-editor](http://www.wmd-editor.com/) to
provide a WYSIWYM look and feel to the CMS.

Although it is *strongly recommended* that projects migrating to the CMS
adopt markdown, it is not a hard requirement.  In fact the
[codemirror](http://codemirror.net/) editor is also provided as an option for those
who prefer to store their source content in raw html.

## Django Influences. ## {#django}

The CMS's overall design was influenced heavily by 
[django's](http://www.djangoproject.com/) architecture.  From the build
system to the preferred template system to the webgui, the influences are
clear and obvious to anyone familiar with django.

## Subversion as Data Store. ## {#subversion}

Instead of developing versioning support and a notification scheme into a
database driven CMS, Apache's [subversion 
infrastructure](http://svn.apache.org/) was chosen as the central data
store for everything.  The fact that the web interface to the CMS
interacts with the subversion repository in a LAN environment,
combined with the lightning-fast SSDs that serve as l2arc cache for the
underlying FreeBSD ZFS filesystem, eliminates virtually all subversion
network/disk latency.  Subversion continues to scale past 1M commits to
deliver high performance to Apache developers, as well as to our internal
programs that rely on it.

## mod_perl Based Webgui. ## {#mod-perl}

The [mod_perl](http://perl.apache.org/) based webgui is under 3000 LOC
and takes full advantage of the [httpd](http://httpd.apache.org/)
module API.  Being an in-process application it is respectably fast
and will scale well even on the limited hardware (a FreeBSD jail) that
it runs on.

The application embraces the REST architectural style while making
appropriate use of cookies solely to enhance the user experience.
It is also LDAP-enabled, so it is not **another auth silo** to deal with: your
svn committer credentials will instantly grant access to the site.

It was also designed for humans already familiar with the featureset
of the svn command-line tool, taking cues from the Emacs `svn.el` module.
However it is accessible even to those without any familiarity with
`svn`: a simple [javascript
bookmarklet](https://cms.apache.org/#bookmark) allows users to go
from a live webpage to a WYSIWYM editor session in 2 clicks. Submitting,
committing, and publishing those changes is just as simple and
straightforward. You may access the CMS [anonymously](cmsref.html#non-committer)
if you are not currently an Apache committer.

Because the webgui revolves around providing users with a temporary
server-side working copy, the urls it generates are not meant to be
bookmarked, and are forbidden from being shared with others.  The fulcrum
for sharing changes is the staging site, and the "commits are easy and
cheap" concept is built into the webgui.

However, the url for publishing a website may be considered appropriate
for writing a basic web service client app.  Since the site is based in
subversion, developers may check out the site and commit directly from
their workstations instead of through the webgui, so it may be convenient
for project members to have a simple site publication script.  This choice
is entirely up to each project, and a reference implementation is available
at <http://s.apache.org/cms-cli>.  Virtually every resource on the site may
be directed to be served as `application/json` simply by adding `as_json=1`
to the query string, or by setting `application/json` as being preferable to
`text/html` in the "Accept" request header.
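
The two ways of asking for JSON described above can be sketched as follows. This is an illustration only: the resource url is hypothetical, the `q=0.5` weighting is just one way of expressing the preference, and nothing is sent over the network.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request

def json_request_via_query(url):
    """Build a request that asks for JSON by adding as_json=1 to the query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [("as_json", "1")]
    return Request(urlunsplit(parts._replace(query=urlencode(query))))

def json_request_via_accept(url):
    """Build a request that prefers application/json over text/html via Accept."""
    return Request(url, headers={"Accept": "application/json, text/html;q=0.5"})

# Hypothetical resource url; the requests are only constructed, never opened.
url = "https://cms.example.org/some/resource"
print(json_request_via_query(url).full_url)
print(json_request_via_accept(url).get_header("Accept"))
```

Either request object could then be passed to `urllib.request.urlopen` by a client script, along with whatever authentication the project's setup requires.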

## ZFS. ## {#zfs}

In order to scale effectively to multi-gigabyte websites, the
webgui relies on zfs clones to create per-user working copies.  The alternative
would be to physically copy working copy trees (with, say, `rsync` or `cp -R`),
but such copies are O(N) in the size of the tree, whereas a zfs clone
(essentially a copy-on-write version of the original) is O(1).
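
As a rough sketch of what creating such a clone involves, the helper below builds the two `zfs` invocations (a snapshot, then a clone of that snapshot). The dataset names are hypothetical and this is not the webgui's actual code; the commands are only printed, not executed.

```python
def zfs_clone_commands(origin, snapshot_name, clone):
    """Return the zfs invocations that create a copy-on-write clone of a dataset."""
    snapshot = f"{origin}@{snapshot_name}"
    return [
        ["zfs", "snapshot", snapshot],      # freeze the current state of the origin
        ["zfs", "clone", snapshot, clone],  # writable clone sharing blocks with the snapshot
    ]

# Hypothetical dataset names; pass each list to subprocess.run(cmd, check=True) to run it.
for cmd in zfs_clone_commands("tank/cms/site", "base", "tank/cms/wc/alice-site"):
    print(" ".join(cmd))
```

Neither command copies file data up front, which is why the cost does not grow with the size of the site.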

## Svnpubsub. ## {#svnpubsub}

[Svnpubsub](http://svn.apache.org/repos/asf/subversion/trunk/tools/server-side/svnpubsub)
was developed by Paul Querna to provide an infrastructure for
distributing change notifications to our frontline webservers (eos and
aurora).  This system is used by the CMS to convert site publication
requests into live publications, and will eventually supplant
the existing `find + rsync` architecture for site publication.  It
is a key component of Apache's infrastructure and will continue to be
promoted going forward, even for those projects who elect not to use
the CMS.

## Scheduled Deployments of Dynamic Content. ## {#dynamic-content}

Despite the above remarks, there is still room for supporting the
generation of "dynamic" content, in the same fashion that [Planet
Apache](http://planet.apache.org/committers/) works.  Namely, buildbot
may be set up to run periodic builds of select urls that have dynamic
content, and to subsequently publish the results of those builds.  While
it is possible to run these jobs more frequently than once an hour,
it is not recommended due to the email notification traffic
this would generate.

## Separate ACL's for Committing Source Versus Publication. ## {#acl}

Since the CMS relies on separate sections of svn for original content
and staging versus publication, it is possible to configure more relaxed
ACLs for content authors versus those capable of publication.  The
Infrastructure Team recommends that the content on www.apache.org be
editable by the full committership, while publication remains restricted
to members, committers with apsite karma, and members of the Infrastructure
Team.

-----

# Adoption Constraints. # {#constraints}

This section lists the requirements for projects electing to adopt the CMS.

## Layout. ## {#layout}

The original source tree **MUST** have the following layout:

        :::text
        trunk/
           content/                (location of actual site content)
           content/$project_name/  (for the content of incubating projects only)
           lib/
              path.pm              (the analog of django's url.py)
              view.pm              (the corresponding views)
           cgi-bin/                (optional cgi directory)
           templates/              (location of site-wide templates)
        branches/                  (optional branches, currently unused)
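
A project could sanity-check a checkout against this layout before asking to adopt the CMS. The sketch below is a hypothetical helper, not part of the CMS; it assumes that everything except `cgi-bin/`, `branches/` and the incubator-only `content/$project_name/` is required.

```python
import tempfile
from pathlib import Path

# Required entries relative to trunk/, taken from the layout above (an assumption:
# cgi-bin/ and branches/ are marked optional there, so they are not checked).
REQUIRED = ["content", "lib/path.pm", "lib/view.pm", "templates"]

def missing_layout_entries(trunk):
    """Return the required paths that do not exist under the given trunk/ directory."""
    root = Path(trunk)
    return [rel for rel in REQUIRED if not (root / rel).exists()]

# Demo on a throwaway directory that only has content/ and templates/.
with tempfile.TemporaryDirectory() as trunk:
    (Path(trunk) / "content").mkdir()
    (Path(trunk) / "templates").mkdir()
    demo_missing = missing_layout_entries(trunk)
    print(demo_missing)
```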


## Content. ## {#source}

The source content **MUST** have a unique file extension for each
generated file. I.e. you cannot generate `foo.pdf` and `foo.html` from
the same source file living in the same directory.  You must disambiguate
the paths to these resources using copies or svn externals (symlinks are
not supported, sorry).

There is a further restriction in that the webgui and build system treat
`foo.page/` directories as attachment directories.  This convention
prevents any files contained therein from being built; instead they may be
treated as content components (e.g. html snippets and images) for an
individual webpage.

Moreover, the source files **MUST** be UTF-8, no exceptions.
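
Two of the content rules above are easy to check mechanically. The helpers below are a sketch with made-up names; in particular, treating any parent directory whose name ends in `.page` as an attachment directory is an assumption about how the convention is matched.

```python
from pathlib import PurePosixPath

def in_attachment_dir(rel_path):
    """True if any parent directory of rel_path is named like `foo.page`."""
    return any(part.endswith(".page") for part in PurePosixPath(rel_path).parts[:-1])

def is_utf8(data):
    """True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(in_attachment_dir("dev/cms.page/diagram.png"))
print(in_attachment_dir("dev/cms.html"))
print(is_utf8("café".encode("utf-8")))
print(is_utf8(b"caf\xe9"))  # Latin-1 bytes, not valid UTF-8
```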

## Build. ## {#build}

The build system is under 1500 LOC and relies on `lib/path.pm` to provide a specially formatted
`@patterns` array to give the build system hints on which *view* to run for
a given source file.  The patterns are checked in order, and if none of
the patterns match, the source file will simply be copied over to the build
tree.  Each element of the `@patterns` array is an arrayref which consists of
3 items: the pattern to test, the name of the *view* function to call, and a
hashref of named parameters to pass (by value) to the *view* function.  The
patterns are tested against files based on their location rooted within the
`content/` subdirectory.
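
The first-match-wins lookup described above can be sketched in Python as follows. The real table is a perl `@patterns` array in `lib/path.pm`; the patterns, view names and parameters here are invented for illustration, and paths are assumed to be relative to `content/` without a leading slash.

```python
import copy
import re

# Each entry mirrors the three items described above:
# (pattern to test, name of the view to call, named parameters for the view).
PATTERNS = [
    (r"^sitemap\.html$", "render_sitemap", {"title": "Sitemap"}),
    (r"\.md$", "render_page", {"template": "page.html"}),
]

def select_view(path, patterns=PATTERNS):
    """Return (view_name, params) for the first matching pattern, else None."""
    for pattern, view_name, params in patterns:
        if re.search(pattern, path):
            # Hand the view its own copy, so the table itself is never mutated.
            return view_name, copy.deepcopy(params)
    return None  # no pattern matched: the file is copied to the build tree as-is

for path in ("sitemap.html", "dev/cms.md", "images/logo.png"):
    print(path, "->", select_view(path) or "copy as-is")
```

Because the patterns are checked in order, more specific patterns belong before more general ones.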

`lib/path.pm` may also provide a hash `%dependencies` mapping paths to array
refs.  Each key names a file which will also be rebuilt whenever a
file matching one of its values has changed.
(This is typically used for sitemaps.)
The filenames in both the keys and the values are rooted in the
`content/` subdirectory.  The dependency calculation is transitive.
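
To make "transitive" concrete, here is a small sketch of such a calculation. It is not the build system's code: the file names are invented, and values are compared by exact name for simplicity, whereas the real system may match them more loosely.

```python
def files_to_rebuild(changed, dependencies):
    """Return every key that must be rebuilt, following dependencies transitively."""
    dirty = set(changed)
    rebuilt = set()
    progressed = True
    while progressed:
        progressed = False
        for target, sources in dependencies.items():
            if target not in rebuilt and dirty.intersection(sources):
                rebuilt.add(target)
                dirty.add(target)  # a rebuilt file may in turn trigger others
                progressed = True
    return rebuilt

# Hypothetical table: the sitemap depends on two pages, the feed on the sitemap.
deps = {
    "sitemap.html": ["index.md", "dev/cms.md"],
    "feed.xml": ["sitemap.html"],
}
print(sorted(files_to_rebuild({"dev/cms.md"}, deps)))  # ['feed.xml', 'sitemap.html']
```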

The build system also requires the *view* functions in `lib/view.pm` to
return 2 values, the first being the generated content, and the second
being the new file extension.
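
The two-value contract for views might look like this in outline. Real views are perl functions in `lib/view.pm`; this Python sketch uses invented names and assumes the extension is returned without a leading dot and simply replaces the source file's extension.

```python
import html
import os

def page_view(source_text, title="Untitled"):
    """A toy view: returns the generated content and the new file extension."""
    content = (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><pre>{html.escape(source_text)}</pre></body></html>"
    )
    return content, "html"

def output_path(source_path, new_extension):
    """Swap the source file's extension for the one the view returned."""
    stem, _old_extension = os.path.splitext(source_path)
    return f"{stem}.{new_extension}"

content, extension = page_view("a < b", title="Example")
print(output_path("dev/cms.md", extension))
```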

The build system will always take the local path to `trunk/` as
the current working directory for the build (branches are currently
unsupported).

Changes to either the `templates/` or `lib/` subdirs will trigger a full site
build.
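
The full-build rule can be expressed as a simple check over the changed paths of a commit. This is a hypothetical sketch, assuming paths are given relative to `trunk/`.

```python
from pathlib import PurePosixPath

FULL_BUILD_DIRS = {"templates", "lib"}

def needs_full_build(changed_paths):
    """True if any changed path (relative to trunk/) lies under templates/ or lib/."""
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in FULL_BUILD_DIRS:
            return True
    return False

print(needs_full_build(["content/index.md"]))
print(needs_full_build(["content/index.md", "templates/skeleton.html"]))
```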

A detailed [walkthrough](cmsref.html#walkthrough) is available for folks working on site design.

### External Builds. ### {#external}

With the introduction of `svn` 1.7+ working copies, it becomes possible to plug in
a wide variety of functionally similar build systems to the standard perl system
described above: think `maven`, `ant`, `forrest`, etc.  If this interests you, please
discuss the matter further on the `infrastructure@` mailing list.  It is not unfair
to describe this CMS as simply a CI tool with a basic web browser interface.

-----

# Future Plans. # {#future}

This section describes the future plans for Apache Infrastructure as it
relates to website publication.

## The Incubator. ## {#incubator}

After going live with www.apache.org, the next project we would like to
tackle is the incubator website.  It too is based on Anakia, but thanks
to Sam Ruby there is an [xslt](https://svn.apache.org/repos/infra/websites/cms/conversion-utilities/anakia2markdown.xslt)
file available to help automate the conversion from xdoc to markdown sources.
We would like to complete this migration by March 1, 2011.

## Anakia Based Sites. ## {#anakia-phaseout}

After migrating the incubator site we will branch out to approach any
Apache project still using Anakia about converting to the CMS.  This will of
course be a project decision, but we hope the advantages of migration
will be clear and well appreciated by PMC members.  We hope to complete
this process during the summer of 2011. *Update:* see [ant adoption](cmsadoption.html#ant)
for new options for projects still stuck on Anakia.

## Phasing Out Confluence as a CMS. ## {#confluence-phaseout}

The next long-term project to tackle is the eventual phaseout of
Confluence-backed websites.  This will be an extensive project which will require
development of content conversion tools, but the clock is ticking on how
long we can continue to run Confluence without any support for the
autoexport plugin.  *Update:* see [confluence adoption](cmsadoption.html#confluence)
for new options for projects still stuck on Confluence.

## Phasing Out people.apache.org as a Publication Hub. ## {#minotaur}

The final long-term objective is to completely eliminate people.apache.org
as the publication hub for Apache websites.  Security considerations alone
make this a worthwhile goal, and to make this happen we would like to
mandate the adoption of at least svnpubsub for all projects by the end of
2012.

## View the ASF CMS code. ## {#cmsinsvn}

As of 1 Nov 2010, the ASF CMS is running the main www.apache.org site.

The code for the CMS itself is being developed by the Infrastructure Team,
and you can follow its [Subversion repository](https://svn.apache.org/repos/infra/websites/cms).

We are considering turning the CMS into a proper Apache project starting with
an incubator podling.  If this interests you, please contact `infrastructure-dev@apache.org`
and sign up!