Clustering
Introduction
Apache Lenya 2.1 can be run in a clustered server environment. This setup has several advantages:
- Better performance – Use multiple machines to deliver content.
- Higher availability – If one servlet container is down, the others still serve content.
- Hot-deployment – Switch off and update single cluster nodes while the others still serve content.
The clustered setup comes with a few prerequisites and restrictions:
- Only one cluster node writes to the repository. This means that the authoring environment is only accessible on one dedicated cluster node, called the master node. All other nodes, called slave nodes, only serve content; they don't modify content in the repository. If your publications contain write functionality in the live area (e.g., user-generated content like comments), you have to make sure that the master node handles the corresponding requests.
- All cluster nodes must have access to a shared content repository, or the repository must be synchronized from the master node to the slave nodes (e.g. via rsync, see below).
- The access control data are currently not synchronized with the slave cluster nodes. If you make changes to access control data (users, groups, policies) which affect the access control of the slave nodes (i.e., permissions within the live area), you have to restart all slave nodes.
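How requests reach the master node depends on your load balancer or front-end web server, which this page does not cover. As an illustration only, the following Java sketch shows one possible routing rule. It is not part of Lenya: the class ClusterRouter and the node names are hypothetical, and it assumes that request paths have the form /{publication}/{area}/... and that write requests in the live area use HTTP methods other than GET and HEAD.

```java
import java.util.List;

/**
 * Hypothetical request router for a load balancer or front-end filter; not part of Lenya.
 * Sends authoring and write requests to the master node and spreads read-only
 * requests over the slave nodes in round-robin order.
 */
final class ClusterRouter {
    private final String masterNode;
    private final List<String> slaveNodes;
    private int next = 0;

    ClusterRouter(String masterNode, List<String> slaveNodes) {
        if (slaveNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one slave node is required");
        }
        this.masterNode = masterNode;
        this.slaveNodes = List.copyOf(slaveNodes);
    }

    /** Returns true if the request may modify the repository and must go to the master node. */
    static boolean requiresMaster(String method, String path) {
        // Assumption: paths have the form /{publication}/{area}/..., so split("/")
        // yields ["", publication, area, ...] and the area is at index 2.
        String[] segments = path.split("/");
        boolean authoring = segments.length > 2 && segments[2].equals("authoring");
        // Assumption: live-area writes (e.g. posting a comment) use methods other than GET/HEAD.
        boolean write = !method.equalsIgnoreCase("GET") && !method.equalsIgnoreCase("HEAD");
        return authoring || write;
    }

    /** Picks the node that should handle the request. */
    synchronized String route(String method, String path) {
        if (requiresMaster(method, path)) {
            return masterNode;
        }
        String node = slaveNodes.get(next);
        next = (next + 1) % slaveNodes.size();
        return node;
    }
}
```

In this sketch read-only requests go to the slave nodes only; putting the master into the rotation as well is an equally valid choice if it has spare capacity.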
Setup
The following steps are necessary to enable clustering in your Lenya installation:
- Configure a shared content directory for all cluster nodes
- Configure a shared Lucene index for all cluster nodes
- Configure a shared directory for access control data
Shared content
All cluster nodes have to use the same directory for content storage. You can configure the location of the content directory in {publication}/config/publication.xml:
<content-dir src="/var/lenya/mypub/content"/>
If the cluster nodes run on different machines, you can share the file system between them, e.g. via NFS.
Shared Lucene index
The Lucene index is configured in {publication}/config/search/lucene_index.xml.
You have to configure shared directories for all index files:
<indexes>
  <index id="mypub-authoring" … directory="/var/lenya/mypub/lucene/index/authoring/index"> …
  <index id="mypub-live" … directory="/var/lenya/mypub/lucene/index/live/index"> …
  <index id="mypub-archive" … directory="/var/lenya/mypub/lucene/index/archive/index"> …
  <index id="mypub-trash" … directory="/var/lenya/mypub/lucene/index/trash/index"> …
</indexes>
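Because every node must resolve the same index locations, it can help to check the configuration before enabling clustering. The following Java sketch (a hypothetical helper, not part of Lenya) parses the contents of lucene_index.xml with the JDK's DOM parser and reports every index directory that is missing or lies outside an assumed shared root such as /var/lenya.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical pre-flight check for lucene_index.xml; not part of Lenya. */
final class SharedIndexCheck {
    private SharedIndexCheck() {}

    /**
     * Returns the directory attribute of every index element whose directory is
     * missing or does not lie below sharedRoot.
     */
    static List<String> unsharedIndexDirs(String luceneIndexXml, Path sharedRoot) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(luceneIndexXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse index configuration", e);
        }
        Path root = sharedRoot.toAbsolutePath().normalize();
        List<String> offenders = new ArrayList<>();
        NodeList indexes = doc.getElementsByTagName("index");
        for (int i = 0; i < indexes.getLength(); i++) {
            String dir = ((Element) indexes.item(i)).getAttribute("directory");
            // Path.startsWith compares whole name elements, so /var/lenya2 is not below /var/lenya.
            if (dir.isEmpty() || !Paths.get(dir).toAbsolutePath().normalize().startsWith(root)) {
                offenders.add(dir);
            }
        }
        return offenders;
    }
}
```

To check a real file, pass the result of Files.readString(path) (Java 11 or later). The same idea carries over to access-control.xml, where the directories appear as parameter elements rather than attributes.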
Shared access control data
The access control data locations are configured in {publication}/config/access-control/access-control.xml. Use shared locations for all directories.
<access-controller type="bypassable">
  <accreditable-manager type="file">
    <parameter name="directory" value="/var/lenya/mypub/access-control/passwd"/>
    …
  </accreditable-manager>
  <policy-manager type="document">
    <policy-manager type="file">
      <parameter name="directory" value="/var/lenya/mypub/access-control/policies"/>
    </policy-manager>
  </policy-manager>
  <authorizer type="usecase"/>
</access-controller>
Using rsync to copy changed files to the slave nodes
An alternative way to share the content is rsync, which provides fast incremental file transfer from the master node to the slave nodes. This approach is especially useful if you run into performance problems with NFS.
Whenever content is changed on the master node, the repository invokes the rsync command to copy the affected files to the slave nodes.
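The following Java sketch illustrates what such an invocation could look like when it is built from the settings of the cluster configuration file described under Configuration (command, options, base directory, targets). It is not Lenya's actual implementation: the class RsyncInvocation is hypothetical, and it assumes that the changed paths are given relative to the base directory and that rsync runs with the base directory as its working directory, so that the -R (--relative) option recreates the same relative paths below each target.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the master-side rsync call; not Lenya's implementation. */
final class RsyncInvocation {
    private RsyncInvocation() {}

    /** Builds one rsync argument list: command, options, changed paths, target. */
    static List<String> buildCommand(String command, String options,
                                     List<String> changedPaths, String target) {
        List<String> args = new ArrayList<>();
        args.add(command);
        // The configuration holds all options in one string, e.g. "-Rrav --delete".
        for (String option : options.trim().split("\\s+")) {
            if (!option.isEmpty()) {
                args.add(option);
            }
        }
        // Paths are relative to baseDir; with -R rsync recreates them below the target.
        args.addAll(changedPaths);
        args.add(target);
        return args;
    }

    /** Runs rsync once per target from baseDir; returns false as soon as one call fails. */
    static boolean syncAll(String command, String options, String baseDir,
                           List<String> changedPaths, List<String> targets)
            throws IOException, InterruptedException {
        for (String target : targets) {
            ProcessBuilder builder =
                    new ProcessBuilder(buildCommand(command, options, changedPaths, target));
            builder.directory(new File(baseDir));
            builder.inheritIO();
            if (builder.start().waitFor() != 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, buildCommand("/usr/bin/rsync", "-Rrav --delete", List.of("mypub/content"), "/var/lenya") yields [/usr/bin/rsync, -Rrav, --delete, mypub/content, /var/lenya]. In this sketch the target string is handed to rsync unchanged, so a slave on another machine would be addressed with rsync's usual host:/path syntax.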
Configuration
The clustering options are configured in an XML file. By default, this file is located at:
$LENYA_HOME/src/webapp/lenya/config/cluster/cluster.xconf
Alternatively, you can provide the path of the file via the lenya.cluster.configFile system property, e.g.:
export JAVA_OPTS="$JAVA_OPTS -Dlenya.cluster.configFile=/etc/lenya/cluster.xconf"
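The lookup order described above (system property first, then the default location) can be expressed in a few lines of Java. This is a sketch, not Lenya's code: ClusterConfigLocator is a hypothetical name, and the default path is resolved against a Lenya home directory supplied by the caller.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper mirroring the documented lookup order; not Lenya's own code. */
final class ClusterConfigLocator {
    /** System property that overrides the default location. */
    static final String PROPERTY = "lenya.cluster.configFile";

    /** Default location, relative to the Lenya home directory. */
    static final String DEFAULT_RELATIVE = "src/webapp/lenya/config/cluster/cluster.xconf";

    private ClusterConfigLocator() {}

    /** Returns the path from the system property if set, otherwise the default below lenyaHome. */
    static Path resolve(Path lenyaHome) {
        String override = System.getProperty(PROPERTY);
        if (override != null && !override.trim().isEmpty()) {
            return Paths.get(override.trim());
        }
        return lenyaHome.resolve(DEFAULT_RELATIVE);
    }
}
```

With -Dlenya.cluster.configFile=/etc/lenya/cluster.xconf on the command line, resolve(...) returns /etc/lenya/cluster.xconf regardless of the home directory passed in.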
A typical configuration file for the master node looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<cluster>
  <enabled>true</enabled>
  <mode>master</mode>
  <rsync>
    <enabled>true</enabled>
    <command>/usr/bin/rsync</command>
    <options>-Rrav --delete</options>
    <baseDir>/var/lenya</baseDir>
    <targets>
      <target>/var/lenya</target>
    </targets>
  </rsync>
</cluster>
This configuration assumes that all data which have to be synchronized (content, Lucene index, access control data) are located in /var/lenya, on the master node as well as on the slave nodes. You can omit the <rsync> element if you are using a shared file system for all cluster nodes, e.g. via NFS.
Here's a corresponding configuration for the slave nodes:
<?xml version="1.0" encoding="UTF-8"?>
<cluster>
  <enabled>true</enabled>
  <mode>slave</mode>
</cluster>
Note that you don't need the rsync configuration here since the slave nodes are in passive mode, i.e. they only receive files.
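To make the difference between the two files concrete, the following Java sketch reads a cluster.xconf document with the JDK's DOM parser and extracts the mode and the rsync targets. ClusterConfig is a hypothetical class, not Lenya's configuration API; treating an rsync element on a slave node as inactive is a choice of this sketch, based on the note that slave nodes only receive files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical reader for cluster.xconf; not Lenya's configuration API. */
final class ClusterConfig {
    final boolean enabled;
    final boolean master;
    final boolean rsyncEnabled;
    final List<String> rsyncTargets;

    private ClusterConfig(boolean enabled, boolean master, boolean rsyncEnabled, List<String> rsyncTargets) {
        this.enabled = enabled;
        this.master = master;
        this.rsyncEnabled = rsyncEnabled;
        this.rsyncTargets = rsyncTargets;
    }

    static ClusterConfig parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid cluster configuration", e);
        }
        boolean enabled = "true".equals(childText(root, "enabled"));
        String mode = childText(root, "mode");
        if (enabled && !"master".equals(mode) && !"slave".equals(mode)) {
            throw new IllegalArgumentException("unknown cluster mode: " + mode);
        }
        boolean master = "master".equals(mode);
        Element rsync = child(root, "rsync");
        // Sketch decision: only a master node pushes files, so rsync settings on a slave are ignored.
        boolean rsyncEnabled = master && rsync != null && "true".equals(childText(rsync, "enabled"));
        List<String> targets = new ArrayList<>();
        Element targetList = rsyncEnabled ? child(rsync, "targets") : null;
        if (targetList != null) {
            for (Node n = targetList.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "target".equals(n.getNodeName())) {
                    targets.add(n.getTextContent().trim());
                }
            }
        }
        return new ClusterConfig(enabled, master, rsyncEnabled, Collections.unmodifiableList(targets));
    }

    /** Returns the first direct child element with the given name, or null. */
    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Returns the trimmed text of a direct child element, or null if it is absent. */
    private static String childText(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? null : e.getTextContent().trim();
    }
}
```

Looking up direct children instead of calling getElementsByTagName matters here, because an enabled element occurs both below cluster and below rsync in the master configuration.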