Archive for March, 2009

Tip: Compiling all templates

March 25, 2009

In some cases it’s very useful to compile all templates without much effort: for example, when you have updated the templates and need all of them compiled before going online, or as a quick test to find out whether your JSP files compile properly. In that case you can use the following one-liner:

perl -ane '$templates{$F[2]}=$F[3]; END{foreach $k (keys %templates) { print "/usr/bin/curl -o /dev/null -s localhost:7042$templates{$k}.html\n";}}' < default.map | sh

This uses your default.map to find a valid handle for each template and constructs a curl call for it, which is then executed by the shell. I haven’t yet checked the CRX structures for an equivalent of the default.map to make this work on CRX-based installations as well.
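
For readers who prefer to see the logic of the one-liner spelled out, here is a minimal Python sketch of the same idea. It assumes, as the one-liner does, that default.map is whitespace-separated with the template in the third field and a handle in the fourth. The function name and the `base` parameter are hypothetical, and the sketch only builds the URLs; it does not call curl.

```python
def template_urls(map_lines, base="http://localhost:7042"):
    """Return one URL per template, mirroring the Perl one-liner."""
    handles = {}
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip lines without a template/handle pair
        # Assumed layout: third field = template, fourth field = handle.
        # A later line overwrites an earlier one, as in the Perl hash.
        handles[fields[2]] = fields[3]
    return [f"{base}{handle}.html" for handle in handles.values()]
```

Feeding it the open default.map file (`template_urls(open("default.map"))`) yields the list of URLs to request with the HTTP client of your choice.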

Joerg’s rules for loadtests

March 23, 2009

In the article “Everything is content (part 2)” I discussed the problems of doing proper loadtests with CQ, in particular the performance of your CQ, which gets (a bit) lower with every loadtest. In a comment Jan Kuźniak proposed disabling versioning and restoring your loadtest environment for every loadtest. Thinking about Jan’s contribution revealed a number of topics I consider crucial for loadtests. I collected some of them and would like to share them.

  • Provide a reasonable amount of data in the system. This amount should be roughly equal to that of your production system, so the numbers are comparable. Being 20% off doesn’t matter, but don’t expect good results if your loadtest runs on 1000 handles while your production system is heading straight for 50k handles. You may optimize the wrong parts of your code then.
    If you have ever benchmarked a speedup of 20% in the loadtests but got nothing in the production system, you have already seen this effect.
  • When your loadtest environment is ready to run, create a backup of it. Drop the CQ loadtest installation(s) from time to time, restore them from the backup and re-run your loadtest on a clean installation to verify your results. This is the point I already mentioned above.
  • Always have the same configuration in the production and loadtest environments. That’s the reason why I disagree with disabling versioning in the loadtest environment. The effect of a diverging configuration may be the same as in the first point: you may optimize the wrong parts of your code.
  • No error messages during the loadtest. If an error message indicates a code problem, it’s probably reproducible by re-running the loadtest (come on, reproducible bugs are the easiest ones to fix :-) ). If it’s a content problem, you should adjust your content. A loadtest is also a very basic regression test, so take the results (errors belong there too!) seriously.
  • Be aware of resource virtualization! Today’s hype is to run as many applications as possible in virtualized environments (VMWare, KVM, Solaris zones, LPARs, …) to increase the efficiency of hardware usage and lower costs. Doing so very often removes some guarantees you need for comparing the results of different loadtests. For example, in one loadtest you have 4 CPUs to yourself, while in the second one you have 6 CPUs available. Are the results comparable? Maybe they are, maybe not.
    Always being limited to 4 CPUs gives you comparable loadtests, but if your production system requires 8 CPUs, you cannot load your loadtest system with production-level numbers. Getting a decent loadtest environment is a hard job …
  • Have good test scenarios. This is of course the most basic requirement. Don’t just grab the access.log and throw it at your load injector. Replaying GET requests is easy, but this approach does not work for POSTs. Modelling good scenarios is hard and takes a lot of time.
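
The backup-and-restore rule from the list above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the post: the loadtest installation lives in a single directory, CQ is stopped while the directory is replaced, and the function name and both paths are hypothetical.

```python
import shutil
from pathlib import Path


def restore_from_backup(backup_dir, install_dir):
    """Replace install_dir with a fresh copy of backup_dir."""
    backup = Path(backup_dir)
    install = Path(install_dir)
    if not backup.is_dir():
        raise FileNotFoundError(f"backup not found: {backup}")
    if install.exists():
        shutil.rmtree(install)  # drop the used loadtest installation
    shutil.copytree(backup, install)  # restore the clean state
    return install
```

Running something like this before a loadtest gives every run the same starting state, which is what makes the results comparable.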

Of course there are a lot more things to consider, but I will limit myself to these points for the moment. Maybe there will be a part 2.

META: Being linked from Day

March 19, 2009

Today this blog was presented as link of the day on dev.day.com. Thanks for the kudos :-)

Welcome to all who read this blog for the first time. I’ve collected some experiences with Day CQ here and will continue to share my experiences and other thoughts. Don’t hesitate to comment or drop me an email if you have a question about a post.

Please note: I am not a developer, so I cannot help you with specific questions about template programming. In that case you can contact one of the groups mentioned on the Day Developer website.

Visualize your requests

March 10, 2009

Over the last year, customers often complained about our bad performance. We had just fixed a small memory leak (which crashed our publishing instances about every hour or so), so we were quite interested in getting reliable data to confirm or refute their complaints. At that time I thought that we needed a way to get a quick overview of the performance of our CQ instances. One look to see “Ok, it must be the network, our systems perform great!”

So I dug out my Perl know-how and wrote a little script which parses a request.log and prints out data in a form that gnuplot understands. Gnuplot then draws some nice graphs from it. They display the number of requests per minute and also the average request duration for these requests.

[Figure: request-graph-all]

These images proved to be pretty useful, because you can show them to your manager (“Look, the average response time went down from 800 milliseconds to 600 although the number of requests went up by 30%.”) and they help you in daily business, because you can spot problems quite well. When the response times go up at a certain time, you had better take a look at the system and find the reason for it.

[Figure: request-graph-html-uk]

Because this script is quite fast (it parses 300 megabytes of request.log in about 15 seconds on a fast Opteron-based machine), we usually render these images online and integrate them into a small web application (not CQ, just a small hacked-up PHP script). For some more interactivity I added the option to display only the requests which match a certain string. So it’s very easy to answer questions such as “Is the performance of my landing page as bad as customers report?”

You can download this little Perl script here. Run it with “--help” first and it will display a little help screen. Pass a number of request.log files to it as parameters, pipe the output directly into gnuplot (I tested with version 4.0, but it will probably also work with newer versions) and it will output a PNG file. Adjust the script to your needs and contribute back; I released it under GPL version 2.

(For the hackers: Some things can probably be performed better and I also have some new functionality already prepared in it, but not active. Patches are welcome :-) )
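
To illustrate the aggregation described in this post (requests per minute, average duration, optional string filter), here is a minimal Python sketch. It is not the Perl script itself. It assumes the request.log has already been parsed into `(timestamp, url, duration_ms)` records, since the log format is not shown here; the function names and the output column layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def per_minute_stats(records, match=None):
    """Aggregate (timestamp, url, duration_ms) records per minute.

    Returns a sorted list of (minute, request_count, average_duration_ms).
    """
    buckets = defaultdict(lambda: [0, 0])  # minute -> [count, total ms]
    for timestamp, url, duration_ms in records:
        if match is not None and match not in url:
            continue  # keep only requests containing the given string
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute][0] += 1
        buckets[minute][1] += duration_ms
    return [(m, c, total / c) for m, (c, total) in sorted(buckets.items())]


def gnuplot_lines(stats):
    """Format the stats as whitespace-separated columns, one row per minute."""
    return [f"{m:%Y-%m-%dT%H:%M} {c} {avg:.1f}" for m, c, avg in stats]
```

Each output row holds the minute, the number of requests and the average duration, so a plotting tool can draw both curves over time. Passing `match="/uk/"`, for example, restricts the statistics to requests whose URL contains that string.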

Everything is content (part 2)

March 6, 2009

Recently I pointed out some differences in the handling of the “everything is content” paradigm of Communique. A few days ago I found a post by David Nüscheler over at dev.day.com, in which he explained the details of performance tuning.

(His 5 rules apply not only to Day Communique, but to every performance tuning session.)

In Rule 2 he states:

Try to implement an agile validation process in the optimization phase rather than a heavy-weight full blow testing after each iteration. This largely means that the developer implementing the optimization has a quick way to tell if the optimization actually helped reach the goal.

In my experience this isn’t viable in many cases. Of course the developer can quickly check whether his new algorithm performs better (= is faster) than the old one. But in many cases the developer doesn’t have all the resources and infrastructure available and doesn’t have all the content in his test system, which is the main reason why I do not trust tests performed on developer systems. So the project team relies on central environments which are built for load testing, which have load balancers, access to directories, production-sized machines and content which is comparable to the production system. Once the code is deployed, you can do load testing, either using commercial software or just using something like JMeter. If you can use continuous integration and an auto-deployment system, you can run such tests every day.

Ok, where did I start? Right, “everything is content”. So you run your loadtest. You create handles, modify them, activate and drop them, you just request a page to view, you perform activities in your site, and so on. Afterwards you look at your results and hopefully they are better than before. Ok. But …

But Communique is not built to forget data — of course it does sometimes, but that’s not the point here :-) , so all these activities are stored. Just take a look at the default.map, the zombie.map, the cmgr.hist file, … So all your recent actions are persisted and CQ knows of them.

Of course, handling more of this information doesn’t make CQ faster. If you have long periods of time between your template updates, check the performance data directly after a template update and compare it to the data from a few months later (assuming you don’t have CQ instances which aren’t used at all). You will see a decrease in performance; it may be small and nearly unmeasurable, but it is there. Some actions are slower.

Ok, back to our loadtest. If you run the loadtest again and again and again, a lot of actions are persisted. When you reproduce the code and the settings of the very first loadtest and run that loadtest again on a system which has already faced 100 loadtests, you will see a difference. The result of this 101st loadtest differs from the first one, although the code, the settings and the loadtest are essentially the same. Everything is the same except CQ and its memory (default.map and friends).

So you need a mechanism which allows you to undo all changes made by such a loadtest. Only then can you perfectly reproduce every loadtest and run it 10 times without any difference in the results. I’ll try to cover such methods (there are several, but not all are equally suitable) in some of the next posts.

And to get back to the title: Everything is content, even the history. So, contrary to my older post, where I said:

Older versions of a handle are not content.

They are indeed content, but only when it comes to slowing down the system :-)

Tip: Lock out the users

March 4, 2009

From time to time you need to perform maintenance work on your systems; on highly configurable systems like Communique these tasks can often be performed online while the authors are working. But some tasks require that there’s no activity on the system; when you want to reorganize your replication agents, it’s a very good feeling to know that nobody is trying to activate the super-duper important company news. So locking out the users is required.

A very easy method is available when only the superuser should have access. Because the superuser has no ACL, just add a “DENY all” ACL for all users. The easiest way to do this is to add this ACL to the “post” user. Then the last ACL for every user is a DENY on all handles. This effectively forbids logins and the display of any content for all users but the superuser.