Or in other words: How can I know that the instance is fully working?
A common task when you work with automation is a realiable detection when the AEM instance is up and running. Maybe you reconfigure the loadbalancer to send requests to this instance. Or you just start doing some other work.
The most naive approach is to request a AEM page and act on the HTTP status code. If the status is “200”, you consider the system up and running. If you get any other code, it’s not. Sounds easy, is easy. But not really accurate. Because there are times during startup, when the system returns a status code 200, but a blank page. Unfortunate.
So next approach: Check if all bundles are active. Check /system/console/bundles.json and parse it. Look for a statement like this:
status":"Bundle information: 447 bundles in total - all 447 bundles active.
Nice try, but does not work. All bundles being up does not guarantee, that all the services are up as well.
The third approach is more compplicated and requires coding, but delivers good results: Build a healthcheck which depends on a lot of other services (the ones you consider important). If this healthcheck is present and delivers ok, it means, that all services it depends on are active as well (the simple default semantic of the @Reference annotation guarantees that). This does not necessarily mean, that the startup is finished, but just that the services you considered relevant are up.
And finally there is a fourth approach, which has been built specifically for this case: The startup listeners. It’s a service interface you can implement, and you get notified when the system is up. That’s it. The API does not give any guarantee that if the system is up, that 5 minutes later it is still up. I am not 100% sure so the semantics of this approach if a service fails to start. Or if a service decides to stop (or starts throwing exceptions).
The healthcheck is my personal favorite. It can be used not only to give you information about a single event (“the system is up”), but it can take much more factors into account to decide if the system is up. And these factors can be constantly checked. When a service is no longer available, the healthcheck goes to ERROR (“red”), and it’s available again, the healthcheck reports OK again. The approach is more powerfull, provides better extensibility and is quite easy to understand. So I choose a healthcheck everytime when I need to know about the health state of AEM.