The magic of OSGi: track the come and go of services

Have you ever asked yourself how it can be that you just need to implement an interface, mark the implementation as a service, and, as if by magic, your code is called at the right spot? Without any registration.
For example, your first Sling servlet (oh sure, you wrote one!) probably looked like this:

@Component
@Service(Servlet.class)
@Property(name = "sling.servlet.paths", value = "/bin/myservlet")
public class MyServlet extends SlingSafeMethodsServlet {

and that's it. How does Sling (or OSGi or whoever) know that there is a new servlet in the container and call it when you visit $Host/bin/myservlet? And how can you build such a mechanism yourself?

Basically you just use the power of OSGi, which can notify you about any change in the status of a bundle or service.

If you want to keep track of all Sling servlets registered in the system you just need to write some more annotations:

@Component
@Reference(name = "servlets", policy = ReferencePolicy.DYNAMIC,
    cardinality = ReferenceCardinality.OPTIONAL_MULTIPLE,
    referenceInterface = Servlet.class)
public class ServletTracker {

  // thread-safe, as bind/unbind can be called concurrently
  private final List<Servlet> allServlets = new CopyOnWriteArrayList<Servlet>();

  protected void bindServlets(Servlet servlet, final Map<String, Object> properties) {
    allServlets.add(servlet);
  }

  protected void unbindServlets(Servlet servlet, final Map<String, Object> properties) {
    allServlets.remove(servlet);
  }

  public void doSomethingUseful() {
    for (Servlet servlet : allServlets) {
      // do something useful with them ...
    }
  }
}
(Of course you can track any other interface through which services are offered. But be aware that in many cases only a single instance of a service exists.)

The magic is mostly in the @Reference annotation, which defines an optional reference that accepts from zero to an unlimited number of services implementing the interface Servlet. By default, methods are called whose names are derived from the "name" attribute: "bindServlets" when a new servlet is registered, and "unbindServlets" when a servlet is unregistered. You can use these methods for whatever you want, for example storing the references locally and calling them whenever appropriate. And that's it.
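The runtime behavior behind this declaration can be sketched in plain Java. This is a toy model with invented names, not the real SCR implementation: the container keeps a registry, and whenever a matching service appears or disappears, it calls the bind/unbind methods of every interested component.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the whiteboard mechanics; all names are made up for illustration.
interface Tracker<S> {
    void bind(S service);
    void unbind(S service);
}

class TinyServiceRegistry<S> {
    private final List<Tracker<S>> trackers = new ArrayList<>();
    private final List<S> services = new ArrayList<>();

    void addTracker(Tracker<S> tracker) {
        trackers.add(tracker);
        for (S s : services) {
            tracker.bind(s); // a late tracker is told about already registered services
        }
    }

    void register(S service) {
        services.add(service);
        for (Tracker<S> t : trackers) {
            t.bind(service); // like SCR calling bindServlets()
        }
    }

    void unregister(S service) {
        services.remove(service);
        for (Tracker<S> t : trackers) {
            t.unbind(service); // like SCR calling unbindServlets()
        }
    }
}
```

The real SCR runtime does the same thing via reflection, driven by the service events of the OSGi framework.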

If you use this approach, your code is called whenever a service implementing a certain interface is activated or deactivated. With the SCR annotations all of this is possible without much trouble, and best of all: nearly everything happens by declaration.
If you want more control (or just prefer to code it yourself), you can use a ServiceTracker (a nice small example is the Apache Aries JMX Whiteboard) to track services manually.

And as recommended reading right now: Learn all the other cool stuff at the Felix SCR page.

And if you want to see more code using this approach, have a look at the SlingPostServlet, which is an excellent example of this pattern. Oh, and by the way: this pattern is called the OSGi whiteboard pattern.

OSGi: static and dynamic references

OSGi as a component model is one of the cores of AEM, as it allows you to dynamically register and consume services offered by other parts of the system. It's the central registry you can ask for all kinds of services.

If you have a few weeks of experience as a CQ developer, you probably already know the mechanics of accessing a product service. For example, one of the most often used statements in a service (or component) is:

@Reference
SlingRepository repo;

which gives you access to the SlingRepository service, through which you can reach the JCR. Technically speaking, you build a static reference: your service only becomes active when this reference can be resolved. So you can rely on the repository being available whenever your service is running. In many cases this constraint is not a problem, because it wouldn't make sense for your service to run without the repository anyway, and it also frees you from permanently checking "repo" for null :-)

Sometimes you don't want to wait for a reference to be resolved (maybe to break a dependency loop), or you can just deliver additional value if a certain (optional) service is available. In such cases you can make it a dynamic reference:

@Reference(policy = ReferencePolicy.DYNAMIC, cardinality = ReferenceCardinality.OPTIONAL_UNARY)
volatile SlingRepository repo;

Now there's no hard dependency on the SlingRepository service, so your service might become active before the SlingRepository service is available, and therefore you need to handle the case that "repo" is null.

Per se this feature might have little importance to you, but combining it with other aspects makes it really powerful. More on that in the next post…

AEM 6.0 and Apache Oak: What has changed?

One of the key technical features of AEM 6.0 is the use of Apache Oak as a much more scalable repository. It supports the full semantics of JCR 2.0, so all CQ 5.x applications should continue to work. And as an extension of this feature, there is of course MongoDB, which you can use together with Oak.

But, as with every major reimplementation, some things have changed. Things which worked well on Jackrabbit 2.x and CRX 2.x might behave differently. Or to put it in other words: Jackrabbit 2.x allowed you to do some things which are not mandated by the JCR 2.0 specification.

One of the most prominent examples of this is the visibility of changed nodes. In CRX 2.x, when you have an open JCR session A and some nodes are changed in a different session B, you see these changes immediately in session A. That's not mandated by the specification, but Jackrabbit supports it.

Oak introduces the concept of MVCC (multi-version concurrency control): each session only sees the view of the repository that was most recent when the session was created, and it is not updated on the fly with changes performed by other sessions. So it is a static view. If you want the most recent view of the repository, you need to call "session.refresh()" explicitly.
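This behaviour can be illustrated with a small, self-contained toy model (plain Java, not the JCR API; all names are invented): each session copies the repository head at login, and only refresh() moves its view forward.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Oak's MVCC session view; not the real Oak implementation.
class ToyRepository {
    private final Map<String, String> head = new HashMap<>();

    class ToySession {
        // the static view this session works on
        private Map<String, String> snapshot = new HashMap<>(head);

        String get(String path) {
            return snapshot.get(path);
        }

        void setAndSave(String path, String value) {
            head.put(path, value);          // persist the change
            snapshot = new HashMap<>(head); // saving also moves the own view forward
        }

        void refresh() {
            snapshot = new HashMap<>(head); // like session.refresh(false)
        }
    }

    ToySession login() {
        return new ToySession();
    }
}
```

A session A opened before another session B saves a change keeps returning the old value until A.refresh() is called.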

So, what’s the effect of this?
You may run into subtle inconsistencies, because you don't see changes performed by others in your session. In most cases only long-running sessions are really affected, because they often need to react on changes from the outside, e.g. to check whether a certain node has already been created by another session. So if you have followed the best practices established over the last 1-2 years, you should be fine, as long-running sessions have been discouraged anyway. I have also already shown how such a long-running session can affect performance when used in a service context.

Oak supports you with some more "features" to spot such problems more easily. First, it prints a warning to the log when a session has been open for more than 1 minute. You can check the log and review the use of these sessions. A session being open for more than 1 minute is normally a clear sign that something's wrong and that you should think about creating sessions with a smaller lifespan. On the other hand, there are also cases where a session open for a longer time is the right solution, so you need to evaluate each warning carefully.
As a second "feature", Oak is able to propagate changes between sessions if these changes are performed by a single thread (and only by a single thread).
But consider these features (especially the change propagation) as transient features which won't be supported forever.

This is one of the interesting changes in Apache Oak compared to Jackrabbit 2.x; you can find some more in the Jackrabbit/Oak wiki. It's really worth a look when you start your first AEM 6.0 project.

AEM 6.0: Admin sessions

With AEM 6.0 comes a small feature which should make you reconsider your usage of sessions, especially the use of admin sessions in your OSGi services.

The feature is: “ResourceResolverFactory.getAdministrativeResourceResolver” is going to be deprecated!

"How can a deprecation be a feature?", you might ask. Well, it is. Because it is being replaced by a mechanism which allows you to easily replace the sessions previously owned by the admin user (for the sake of laziness of the developer …) by sessions owned by regular users. Users which don't have the super-powers of admin, but have to follow the ACLs like any other regular user.

A nice description how it works can be found on the Apache Sling website.

But how do you use it?

First, define which user should be used by your service. Specify this in the form "symbolic-bundle-name:subservice-name=user" in the config of the ServiceUserMapper service.
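For example (assuming a bundle "com.myapp.core", a subservice "content-reader" and a system user "myapp-reader"; all names made up), the mapping in the ServiceUserMapper configuration could look like this:

```
# configuration for PID org.apache.sling.serviceusermapping.impl.ServiceUserMapperImpl
user.mapping=["com.myapp.core:content-reader=myapp-reader"]
```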
Then there are 2 extensions to existing services which leverage this setting:

ResourceResolverFactory.getServiceResourceResolver(authenticationInfo) returns a ResourceResolver created for the user defined in the ServiceUserMapper for the containing bundle (you can specify the subservice in the authenticationInfo if required).

And the SlingRepository service has an additional method loginService(subserviceName, workspace), which returns you a session using this user.
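Used together, this could look roughly like the following sketch inside a service that has a ResourceResolverFactory reference injected (the subservice name "content-reader" and the content path are assumptions; error handling is shortened):

```java
Map<String, Object> authInfo = Collections.<String, Object>singletonMap(
        ResourceResolverFactory.SUBSERVICE, "content-reader");
ResourceResolver resolver = null;
try {
    resolver = resolverFactory.getServiceResourceResolver(authInfo);
    Resource resource = resolver.getResource("/content/myapp");
    // work with the resource under the ACLs of the mapped service user
} catch (LoginException e) {
    // no mapping configured or service user missing
} finally {
    if (resolver != null) {
        resolver.close();
    }
}
```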

But then this leaves you with a really big task: which permissions should my service user have? read/create/modify/delete on the root node? Well, that's something you can delegate to the people doing the user management …

Update 1: Sune asked if you need to specify a password. Of course not :-) Such a requirement would render the complete approach pointless.

Meta: AEM 6.0

I am a bit behind the official announcement of AEM 6.0 (Adobe TV, docs ), and some of my colleagues have taken the lead and already started posting about the major new technical features, Apache Oak (including MongoDB) and Sightly. My colleague Jayna Kandathil offers a nice overview of the technical news.

I will focus on the smaller changes in the stack, and there's a vast number of them. So stay tuned; I hope to find some quiet moments to blog in the next 2 weeks.


Is “fresh content” a myth?

In the past I often had the requirement that content must be absolutely up to date when it's delivered to the end user. It always has to reflect the approved state on the servers, and it must never be outdated. When new content is pushed to the publish instances, it has to be picked up immediately, and every response should return this updated content from then on.
That's sometimes (and only sometimes) a valid requirement, but its implementation is something you can hardly control.

For example: when your editor presses the "publish" (or "activate") button in AEM, this starts an internal process which takes care of packaging the content and delivering it to the publish instances, where it is unpacked. And then you need to invalidate any frontend cache (dispatcher cache) sitting in front of the publish instances.
That's infrastructure you can control, and with a proper setup and monitoring you can achieve a reliable delay of only a few seconds between pressing the button and the new content being delivered from your web servers.

And then comes the internet. Huge caching layers (content delivery networks) absorb lots of requests to reduce internet traffic. Company proxies cache the favorite sites of their users to save money. You will find caches all over the place, built to reduce costs and improve speed, sometimes at the cost of delivering outdated content. And don't forget the browser, which also tries to reduce page load time by eliminating requests and leveraging a cache on the local disk.

They all do it for a single reason: you are normally more frustrated by a slow page than by reading a version of a news page which doesn't show the very latest update published 1 second ago (and of course you don't even know that the page was just updated …)

Caching is a tool you should always use to provide a good experience. And in many cases the experience is better if you deliver a fast, nearly up-to-date site (with news being 30 seconds old) than a slow site which also shows the breaking news that happened 1 second ago.
If this delay is reasonably small (in many cases 5 minutes is perfect), no one will ever notice it (besides the editor who pressed the button), but you can improve your caching and the experience of the end user.
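On the HTTP level such a tolerated delay is just a regular caching header. For example, in an Apache httpd configuration in front of the publish instances it could look like this (the path pattern and the 5-minute value are assumptions; pick what fits your content):

```
# allow downstream caches (CDN, proxies, browsers) to reuse pages for 5 minutes
<LocationMatch "^/content/">
  Header set Cache-Control "max-age=300"
</LocationMatch>
```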

JCR Observation throttle

With JCR observation, every JCR repository offers a really nice feature to react on changes in the repository. Because the mechanism is quite powerful and easy to use, it has found many adopters.

If you have worked with this feature and also imported large amounts of content, you might have encountered the problem that there is a delay between persisting a change in the repository and the time when the observation event for this change is fired. This delay mainly depends on the number of registered observation handlers and the time these handlers need to process an event (all JCR observation events are handled by a single thread). If it takes 100ms to run through all handlers and you persist 10k changes in 2 minutes, the queue will pile up, and it will take about 20 minutes until it is empty again.
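These numbers are easy to check with a back-of-envelope calculation (the values are the assumed ones from above):

```java
// One delivery thread works through the observation queue sequentially,
// so the pure processing time is simply events * time-per-event.
class ObservationDelay {
    static double processingMinutes(long events, long perEventMillis) {
        return events * perEventMillis / 1000.0 / 60.0;
    }
}
```

10,000 events at 100 ms each give about 16.7 minutes of pure processing; together with the 2 minutes the import itself takes, the queue is only empty again after roughly 20 minutes.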

This delay may harm your application and user experience, so it’s advisable to

  1. improve the processing speed of any JCR event handler
  2. keep the queue small.

Especially if you have jobs which are not time-critical but may cause a storm of events (e.g. data importers), you should be aware of this and add artificial pauses to your job, so the observation event queue does not grow too large.
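Such a throttled importer could be sketched like this (batch size and pause length are assumptions; the commented-out calls stand for your real import and save logic):

```java
import java.util.List;

// Sketch: pause after every batch so observation listeners can catch up.
class ThrottledImporter {
    static final int BATCH_SIZE = 100;
    static final long PAUSE_MILLIS = 1000;

    int importAll(List<String> items) {
        int imported = 0;
        for (String item : items) {
            // importItem(item); session.save();  <- the real work would go here
            imported++;
            if (imported % BATCH_SIZE == 0) {
                try {
                    Thread.sleep(PAUSE_MILLIS); // artificial pause to drain the event queue
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return imported;
    }
}
```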

A simple tool to help with this has been added to CQ/AEM 5.6.1; it's called the JCR observation throttle.

It allows you to wait until all pending events in the observation queue have been delivered. This helps you wait for a quiet moment and then start your data import. But be careful: the delay might be very long if other processes constantly interfere.