CRX 2.1: Improved backup

CRX 2.1 introduced no major changes, only smaller improvements in many places. In simple Geometrixx-based performance tests you can get a speedup of about 30% (at least that’s what I measured with CRX 2.1 and CRX Hotfix 2.1.0.5), purely because of improvements in the Jackrabbit and CRX code. So I would recommend moving to CRX 2.1 for every CQ 5.3 project.

Alongside these improvements, a small extension to the online backup was introduced. Up to and including CRX 2.0, you backed up to a zip file (which contains the complete quickstart and the repository). In CRX 2.1 it is no longer required to zip the backed-up data; the backup procedure can simply drop it into a directory. This directory still contains all data (quickstart and repository), so you can run the zip program yourself if you want. So, what’s the big deal here?

First of all, your backup software can do an incremental backup of this directory. The tar files and the index in particular will change quite often (think of the TarPM optimizer), but the datastore files won’t change, so they are ideal candidates for incremental backup.

Speaking of the datastore and its immutable files: because files are only added to the datastore (and never changed), you can optimize the backup. The online backup decides what to back up based on the directory in which the quickstart is placed: everything in that directory and beneath it is covered by the backup process. If you move the datastore to a directory outside the one where the quickstart.jar resides, the datastore is not backed up. Why is that good? Because if we always keep our backup in the same directory, we can use rsync to back up the datastore. Rsync is much better suited for this job, because it works incrementally (the CRX online backup doesn’t) and runs outside of the CRX/CQ5 Java process (which is always good). Remember: files in the datastore are only added, not changed!

I recommend changing your CQ5 filesystem layout as follows:

cq5
 - backup
   - datastore
   - cq5
     - cq-wcm-quickstart.jar
     - crx-quickstart
     - ...
     - license.properties
 - datastore
 - cq5
   - cq-wcm-quickstart.jar
   - crx-quickstart
   - ...
   - license.properties

This layout ensures that you can start CQ5 directly from the backup directory (although you shouldn’t) or simply move it to the “production” location, without any configuration changes.

Moving the datastore is easy: shut down CQ5/CRX and replace the line
<DataStore class="com.day.crx.core.data.ClusterDataStore"/>

in the repository.xml with the following ones:

<DataStore class="com.day.crx.core.data.ClusterDataStore">
<param name="path" value="../../../../datastore" />
</DataStore>

I recommend using a relative path here; because it is resolved relative to the directory crx-quickstart/repository/repository, you should enter "../../../../datastore" here. Adding a symlink won’t help!
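To see where that relative path ends up, here is a small shell sketch that resolves it starting from crx-quickstart/repository/repository, as described above. The helper name resolve_datastore and the example directories are assumptions for illustration, not part of CRX. It only mimics the resolution with cd and needs the directories to exist.

```shell
#!/bin/sh
# Sketch only: mimics how a relative datastore path resolves when it is
# interpreted relative to crx-quickstart/repository/repository (as stated in
# the text). resolve_datastore is a hypothetical helper, not a CRX tool.
resolve_datastore() {
    instance_dir=$1   # directory that contains crx-quickstart
    rel_path=$2       # value of the "path" param in repository.xml
    # Run in a subshell so the caller's working directory stays untouched.
    ( cd "${instance_dir}/crx-quickstart/repository/repository" &&
      cd "${rel_path}" && pwd -P )
}

# Example calls (paths assumed from the layout shown earlier):
#   resolve_datastore /opt/cq5/cq5        ../../../../datastore
#   resolve_datastore /opt/cq5/backup/cq5 ../../../../datastore
```

With the layout shown earlier, the first call should land in the live datastore directory and the second in backup/datastore, which is why the backup copy can be started without touching the configuration.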

Then create this directory structure, move all the datastore files there (crx-quickstart/repository/repository/datastore/*), and make sure that the file permissions (ACLs) are set properly. Start up and have fun!
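The move itself could look roughly like the following sketch. It assumes CQ5/CRX is already shut down and that the inner cq5 directory sits directly below the base directory, as in the layout shown earlier. The names move_datastore and base_dir are made up for this example. Instead of moving the files one by one, it moves the whole datastore directory, which has the same effect.

```shell
#!/bin/sh
# Sketch only: move the datastore out of the quickstart directory.
# Run it only while CQ5/CRX is shut down. move_datastore and base_dir are
# hypothetical names; base_dir is the parent of the inner cq5 directory.
move_datastore() {
    base_dir=$1
    old="${base_dir}/cq5/crx-quickstart/repository/repository/datastore"
    new="${base_dir}/datastore"

    if [ ! -d "$old" ]; then
        echo "no datastore found at $old" >&2
        return 1
    fi
    if [ -e "$new" ]; then
        echo "refusing to overwrite existing $new" >&2
        return 1
    fi
    # A rename on the same filesystem leaves owner and mode unchanged.
    mv "$old" "$new"
}

# Example: move_datastore /opt/cq5
# Afterwards, check owner and permissions, e.g.: ls -ld /opt/cq5/datastore
```

The refusal to overwrite an existing target is a deliberate choice: running the function twice fails loudly instead of nesting one datastore inside another.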

And now the rough skeleton of a backup script to illustrate the process; it requires curl, rsync and a little Perl. For Windows-based systems I cannot give much advice; there is probably a tool similar to rsync.


#!/bin/sh

HOST=localhost:4502
ADMINPW=admin # must be URL encoded!!

BACKUP_FILENAME="" # we want CRX to backup to the directory, no zip file!
BACKUP_DIR=/opt/cq5/backup
INSTANCE_DIR=/opt/cq5

COOKIE=cookie.txt
CURLPARAMETERS="-s -S"

# hacky, using an appropriate perl module would be better
ENCODED_BACKUP_DIR=`echo "${BACKUP_DIR}" | perl -pe 's|\/|%2F|g'`
touch ${COOKIE}
chmod 600 ${COOKIE}
echo "Start: `date`"
curl -c ${COOKIE} ${CURLPARAMETERS} "http://${HOST}/crx/login.jsp?UserId=admin&Password=$ADMINPW&Workspace=crx.default" > /dev/null
curl -b ${COOKIE} ${CURLPARAMETERS} -o progress.txt "http://${HOST}/crx/config/backup.jsp?action=add&targetDir=${ENCODED_BACKUP_DIR}&zipFileName=${BACKUP_FILENAME}" > /dev/null
rm progress.txt
rm ${COOKIE}

echo "Syncing datastore"
DATASTORE_DIR="${INSTANCE_DIR}/datastore/"
DATASTORE_BACKUP_DIR="${BACKUP_DIR}/datastore"
rsync -a ${DATASTORE_DIR} ${DATASTORE_BACKUP_DIR}
echo "Finished: `date`"
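The script notes that the password must be URL-encoded and calls its own perl substitution hacky, since it only handles slashes. As one possible alternative, here is a percent-encoding helper in plain POSIX sh. The name urlencode is an assumption for this sketch, and it is meant for ASCII input such as paths and simple passwords.

```shell
#!/bin/sh
# Sketch only: percent-encode a string in plain POSIX sh, as an alternative to
# the perl substitution in the script. urlencode is a hypothetical helper name.
urlencode() (
    # The body runs in a subshell, so LC_ALL=C does not leak to the caller.
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="${out}${c}" ;;           # unreserved: keep as-is
            *) out="${out}$(printf '%%%02X' "'$c")" ;;    # everything else: %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Possible use inside the backup script:
#   ENCODED_BACKUP_DIR=$(urlencode "${BACKUP_DIR}")
#   ENCODED_ADMINPW=$(urlencode "${ADMINPW}")
```

Encoding every character outside the unreserved set is stricter than necessary for some URLs, but it keeps the helper simple and safe for query parameters.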

Update: Fixed path in the repository.xml snippet. Thanks Thomas.

3 thoughts on “CRX 2.1: Improved backup”

  1. James Stansell

    Wondering if there are any considerations for when files are removed from the datastore (ie datastore garbage collection)

    1. jhoh228 Post author

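      rsync can also remove files from the target directory if they are no longer present in the source directory, but only when it is run with the --delete option; a plain rsync -a, as used in the script above, leaves them in place.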

  2. James Stansell

    I just found out about http://www.dirsyncpro.org/ which is written in java. Wondering if it could be a credible cross-platform tool for CQ backup. I would assume probably more useful for developers, since sysadmins will likely already have a favorite platform-specific tool.
