First, the package already contains a status page (reachable via <host>/content/statuspage.html), which looks like this:
This information is computed from all the individual checks listed in the details table, according to this ruleset:
- If at least 1 check returns CRITICAL, the overall status is “CRITICAL”.
- If at least 1 check returns WARN and no check returns CRITICAL, the overall status is “WARN”.
- If all checks return OK, the overall status is “OK”.
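The ruleset above can be sketched as a small aggregation function; this is only an illustration of the logic, and the class and method names (`StatusAggregator`, `overall`) are my own, not part of the package:

```java
import java.util.List;

public class StatusAggregator {

    public enum Status { OK, WARN, CRITICAL }

    // Aggregation ruleset: any CRITICAL makes the overall status CRITICAL;
    // otherwise any WARN makes it WARN; if all checks are OK, it is OK.
    public static Status overall(List<Status> checks) {
        Status result = Status.OK;
        for (Status s : checks) {
            if (s == Status.CRITICAL) {
                return Status.CRITICAL;
            }
            if (s == Status.WARN) {
                result = Status.WARN;
            }
        }
        return result;
    }
}
```

Note that a single CRITICAL check short-circuits the loop, because nothing can override it.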
The overall status on the status page is easy for a monitoring system to parse.
The individual checks are listed by name, status and an optional message. This list should be used to determine which check failed and caused the overall status to deviate from OK.
The status values in detail:
- OK: obvious, isn’t it?
- WARN: the checked status is not “OK”, but also not “CRITICAL”. The system is still usable, but you need to observe it more closely, or perform some actions, so the situation won’t get worse.
- CRITICAL: The system should not be used and user experience will be impacted. Action is required.
Managing the loadbalancers
Any loadbalancer in front of CQ5 instances should also be aware of the status of the instances. But loadbalancers probe much more often (about every 30 seconds), and they don’t have many capabilities for parsing complex data. For this use case there is the “/bin/loadbalancer” servlet, which returns only “OK” with status code 200, or “WARN” with status code 500. WARN covers both the WARN and CRITICAL cases; in both it is assumed that the loadbalancer should not send requests to that instance.
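The decision a probe derives from this contract boils down to the HTTP status code alone. A minimal sketch of that decision, assuming the “/bin/loadbalancer” behaviour described above (the class and method names here are illustrative, not part of the package):

```java
public class LoadbalancerProbe {

    // "/bin/loadbalancer" answers HTTP 200 ("OK") when the instance may
    // receive traffic, and HTTP 500 ("WARN") when it should be taken out
    // of rotation (WARN here covers both the WARN and CRITICAL cases).
    public static boolean keepInRotation(int httpStatus) {
        return httpStatus == 200;
    }
}
```

Because the servlet collapses everything into these two codes, the loadbalancer never needs to read or parse the response body.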