Connecting dispatchers and publishers

Today I want to cover a question which comes up every now and then (my gut feeling says this question appeared at least once every quarter for the last 5 years …):

How should I connect my dispatchers with the publishs? 1:1, 1:n or m:n?

To give you an impression how these scenarios could look like I graphed the 1:1 and the n:m scenario.

publish-dispatcher-connections-1-1-final

The 1:1 setup, where each dispatcher is connected to exactly 1 publish instance; for the invalidation every publish is also connected only with its assigned dispatcher instance.

publish-dispatcher-connections-N-M-final

The n:m setup, where n dispatcher connce to m publish instances (for illustration here with n=3 and m=3); each dispatch is connected via loadbalancer to each publish instance, but each publish instance needs to invalidate all dispatcher caches.

I want to give you my personal opinion and answer to it. You might get other answers, both from Adobe consultants and other specialists outside of Adobe. They all are valuable insights into the question, how it’s done best in your case. Because it’s your case which matters.
My general answer to this question is: Use a 1:1 connection for these reasons:
  • it’s easy to debug
  • it’s easy to monitor
  • does not require any additional hardware or configuration
From an high-availability point of view this approach seems to have a huge drawback: When either the dispatcher or the publish instance fails, the other part is not available as well.
Before we discuss this, let me state some facts, which I consider as basic and foundation to all my arguments here:
  • The dispatcher and the web server (I can only speak for Apache HTTPD and its derivates, sorry IIS!) are incredibly stable. In the last 9 years I’ve setup and operated a good number of web environments and I’ve never seen a crashing web server nor a crashing dispatcher module. As long as noone stops the process, this beast is handling requests.
  • A webserver (and the dispatcher) is capable to deliver thousands of requests per second, if these files originate from the local disks and just need to be delivered. That’s at least 10 times the number any publish can handle.
  • If you look for the bottleneck in handling HTTP requests in your AEM architecture it’s always the publish application layer. Which is exactly the reason why there is a caching layer (the dispatcher) in front of it.
  • My assumption is, that a web server on modern hardware (and operating systems) is able to deliver static files with a bandwidth of more than 500 mbit per second (at a mixed file scenario). So in most cases before you reach the limit of your web servers, you reach the limit of your internet connection. Please note, that this number is just a rough guess (depending on many other factors).
Based on these assumptions, let’s consider these scenarios in a 1:1 setup:
  • When the publish instance fails, the dispatcher instance isn’t fully operational anymore, as it does not reach its renderer instance anymore; so it’s best to take it out of the load balancing pool.
    So does this have any effect on the performance capabilities of your architecture? Of course it has, it reduces your ability to deliver static files from the dispatcher cache. Which we could avoid if we had the dispatcher connected to other publishs as well. But as stated above, the delivery performance of static files isn’t a bottle neck at all, so when we take out 1 web server you don’t see any effect.
  • A webserve/dispatcher fails, and the connected publish instance is not reachable anymore, effectively reducing the power your bottleneck even more.
    Admitted, that’s true; but as stated above, I’ve rarely seen a crashed web server; so this case is mostly true in case of hardware problems or massive misconfigurations.
So, your have an measurable impact only in case that a web server hardware went down, in all other cases it’s not a problem for the performance.
This is a small drawbacks, but from my point of view the other benefits stated above outweigh it by far.
This is my standard answer, when there’s no more specific information available. It’s a good rule of thumb. But if you have more specific requirement, it might have sense to change the 1:1 rule to a different one.
For example:
  • You plan to have 20 publish instances. Then it doesn’t make sense to have 20 webserver/dispatchers as well.
  • You want to serve a huge amount of static data (e.g. 100 TB of static assets), so your n copies of the same file get’s expensive in terms of disk space.
If you choose a different approach than the 1:1 scenario described in this blog post, please keep these factors in mind:
  • How do you plan to invalidate the dispatcher caches? Which publish instance will invalidate which dispatcher cache?
  • How do you plan to do maintenance of the publish instances?
  • What’s the effort to add or remove a new publish instance? What’s need to be changed?
Before you plan to spend a lot of time and effort into building a complex dispatcher scenario, please think if a CDN isn’t a more appropriate solution to your problem…
Advertisements