Archive for January, 2009

Caching the right way

January 22, 2009

I sometimes notice that there is some kind of confusion about how content is transferred from a CQ system to the enduser, mostly regarding caches, cache invalidation and content expiration.

We must make a difference between 2 separate mechanisms:

  1. Caching as in “Communique dispatcher cache”. As already described the dispatcher cache gets only invalidated when a replication agent triggers the invalidation. There isn’t a mechanism which invalidates content after a certain amount of time.
  2. Caching as in “make use of the browser cache”. A RFC to the HTTP standard describes several mechanism to specify the timeframe in which objects are valid. Here is a more informal introduction.

So this 2 mechanism doesn’t collide; if you want to distribute your content effectivly you should use both: The dispatcher cache to lower the load on your CQ systems, and the right HTTP headers to move traffic off your systems (and your internet connection) to downstreamd proxies and browser caches.

Some remarks to the right HTTP headers:

  • If you don’t have any HTTP headers for caching, most proxies and browsers guess how long they consider an object as “live” or “valid”. Do not rely on these, control it yourself! Add the headers.
  • CQ doesn’t add any caching header by itself.
  • An very easy way to add HTTP caching headers is to configure your webserver to add them (for Apache: mod_expires is quite easy to use). Then every time your webserver delivers a object through the dispatcher (either by fetching it from CQ or by retrieving from cache) it will add these headers.

Hints on performance (part 2)

January 17, 2009

For curiosity I often take a look into the the HTTP headers of websites I visit (I use the great Firefox plugin HTTP Live Headers for it). On some major websites I discovered that these don’t make use of HTTP pipelining, which is nowadays a major performance drawback, since today’s website include much more items (images, CSS, Javascripts) than a website of 1998.

Quoting http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html:

For all our tests, a pipelined HTTP/1.1 implementation outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings were at least a factor of two, and sometimes as much as a factor of ten, in terms of packets transmitted.

This point isn’t directly related to Day Communique, but can be aplied to all webpages. Take your browser and check if your site makes use of HTTP 1.1 pipelining. How?

Well, that’s pretty easy: Take your browser (I suggest Firefox and the above mentioned plugin Live HTTP Headers), open the plugin and then goto your website. Then check the answers if they contain the line “Connection: closed”; whenenver you see this line, it means that your browser must open a new TCP connection to fetch another file from the server. In the best case you should not find this header at all. If you find such a line, you should really sit down and try to get rid of it.

2 remarks to the dispatcher:

  1. The apache webserver can deliver files from cache or from the dispatcher without breaking the HTTP pipelining. So here it doesn’t matter if a file is taken from the cache or fetched from CQ; if you configured your Apache correctly, you’ll never get a “Connection: closed”.
  2. The dispatcher itself also fetches files using HTTP pipelining by default. You can force it not to do so, but I don’t recommend it. In a version before dispatcher 4.0 this behaviour was broken, but in the most recent versions it works perfectly. And of course: The servlet engine bundled with CQ 3.5.5 and newer supports HTTP pipelining out of the box.

For further reading I recommend Aaron Hopkins’ “Optimizing Page Load Times” and for general performance hints the Best Practices for Speeding Up Your Website by Yahoo.

Hints on performance

January 14, 2009

A lot of things which affect performance can be changed within CQ (without major configuration or even coding just by adjusting the things you want to get displayed or used.

If you’re looking for some performance on your authoring systems, you may consider to disable some information, which is permanently shown to the authors. One of these pieces of information are the number of items in the inbox. This will create every once in a while a background query which slows down your system. It’s essentially the same query which is executed when you open the “Inbox” Tab.

Change user settings

Change user settings

If you’re not using this feature very often, you should disable it for your authors (or at least for the ones who don’t use it).

Open the user settings of the author and un-ceck “Inbox”. If you don’t use notifications, uncheck them too. This will leave you only the link to the impersonation feature which isn’t expensive in terms of rendering perfomancee, but often very useful.usersettings11

Create cachable content

January 9, 2009

In the last posts I described that you should use caching to increase the performance. But caching works only for static content. Most websites aren’t built on top of static pages but they are highly dynamic, presenting dynamic content which can be different for every user who’s visiting this page.

So, the first and easiest part for caching are images and media elements. Cache them! Every request not hitting your CQ instance is a good request. Youre webservers are faster than your CQ instances and scale much better. So configure the dispatcher to cache all files with the extension “.jpg”, “.png” and “.pdf” (and probably some more depending on your content), always keeping in mind that pages with such an extension must not be have different content depending on request data!

The harder part are the content pages, which may change and must be changed based on request data. Usually you cannot cache them because you don’t want to present a cached personalized page to all users requesting that url. But some optimizations are possible:

  • If the number of valid parameter values is rather slow, you can implement them as selectors. Then the page is cachable again, but think of the cache size. Don’t accept any kind of selector.
  • Reduce the effort for the system to render your dynamic page. If only a small part is based on previous input, you can split the whole rendering process into 2 parts. First rendering the dynamic parts by CQ;this is then parsed by the SSI module of your webserver which then adds the static parts to the page. Voila, only a small part is rendered directly by CQ, most parts could come out of the dispatcher cache.

I some later posting I will cover these 2 points more thoroughly.

The output-cache

January 1, 2009

Day Communique offers another level of caching for content: the so-called output-cache. It caches already rendered pages and stores them in the CQ instance itself. The big advantage over the already mentioned dispatcher-cache is that the output-cache knows about the dependencies between handles because it’s part of CQ and can access these informations.

You can combine both dispatcher-cache and the output-cache. If no invalidation happens, the dispatcher-cache can answer most requests for static content and doesn’t bother the CQ with them. An invalidation may invalidate a larger part of the dispatcher-cache, but a lot of the requests, which are then forwarded to CQ, can be answered by the output-cache. So it sounds quite good to use both dispatcher-cache and output-cache to cache all content. In some cases it is.

But the output-cache has some drawbacks:

  • All dependencies are kept in memory (in the java heap). If you have a huge amount of content, the dependency tracking can eat up your memory. There is a hotfix available for CQ 4.2, which limits the cache-size, but it has some problems.
  • The rendered pages are kept on disk. So beside the content itself you need disk space for the rendered output.
  • Probably the most important point: The output-cache makes CQ to use only one CPU! Obviously there is a mutex within the output-cache code, which serializes all rendering requests when the results should be cached. I cannot say that for sure, but this is what I observed in CQ instances using the output-cache. After we disabled the output-cache — which means adding a “deny all” rule to its configuration — this phenomenon was gone and CQ used all available CPUs.

Of course you don’t need to configure the output-cache to cache all rendered pages. You may want to cache only the pages which are often requested and also often invalidated on the dispatcher-cache but weren’t changed at all.