What’s new in Sling with AEM 6.3

AEM 6.3 is out. Besides the various product enhancements and new features it also includes updated versions of many open source libraries. So it’s your chance to have a closer at the version numbers and find out what changed. And again, quite a lot.

For your convenience I collected the version information of the sling bundles which are part of the initial released version (GA). Please note that with Servicepacks and Cumulative Hotfixes new bundle versions might be introduced.
If a version number is marked red, it has changed compared to the previous AEM versions. Bundles which are not present in the given version are marked with a dash (-).

For the versions 5.6 to 6.1 please see this older posting of mine.

Symbolic name of the bundle AEM 6.1 (GA) AEM 6.2 (GA) AEM 6.3 (GA)
org.apache.sling.adapter 2.1.4 2.1.6 (Changelog) 2.1.8 (Changelog)
org.apache.sling.api 2.9.0 2.11.0 (Changelog) 2.16.2 (Changelog)
org.apache.sling.atom.taglib 0.9.0.R988585 0.9.0.R988585 0.9.0.R988585
org.apache.sling.auth.core 1.3.6 1.3.14 (Changelog) 1.3.24 (Changelog)
org.apache.sling.bgservlets 0.0.1.R1582230 1.0.6 (Changelog)
org.apache.sling.bundleresource.impl 2.2.0 2.2.0 2.2.0
org.apache.sling.caconfig.api 1.1.0
org.apache.sling.caconfig.impl 1.2.0
org.apache.sling.caconfig.spi 1.2.0
org.apache.sling.commons.classloader 1.3.2 1.3.2 1.3.8 (Changelog)
org.apache.sling.commons.compiler 2.2.0 2.2.0 2.3.0 (Changelog)
org.apache.sling.commons.contentdetection 1.0.2 1.0.2
org.apache.sling.commons.fsclassloader 1.0.0 1.0.2 (Changelog) 1.0.4 (Changelog)
org.apache.sling.commons.html 1.0.0 1.0.0 1.0.0
org.apache.sling.commons.json 2.0.10 2.0.16 (Changelog) 2.0.20 (Changelog)
org.apache.sling.commons.log 4.0.2 4.0.6 (Changelog) 5.0.0 (Changelog)
org.apache.sling.commons.log.webconsole 1.0.0
org.apache.sling.commons.logservice 1.0.4 1.0.6 (Changelog) 1.0.6
org.apache.sling.commons.metrics 1.0.0 (Changelog) 1.2.0 (Changelog)
org.apache.sling.commons.mime 2.1.8 2.1.8 2.1.10 (Changelog)
org.apache.sling.commons.osgi 2.2.2 2.4.0 (Changelog) 2.4.0
org.apache.sling.commons.scheduler 2.4.6 2.4.14 (Changelog) 2.5.2 (Changelog)
org.apache.sling.commons.threads 3.2.0 3.2.6 (Changelog) 3.2.6
org.apache.sling.datasource 1.0.0 1.0.0 1.0.2 (Changelog)
org.apache.sling.discovery.api 1.0.2 1.0.2 1.0.4 (Changelog)
org.apache.sling.discovery.base 1.1.2 (Changelog) 1.1.6 (Changelog)
org.apache.sling.discovery.commons 1.0.10 (Changelog) 1.0.18 (Changelog)
org.apache.sling.discovery.impl 1.1.0
org.apache.sling.discovery.oak 1.2.6 (Changelog) 1.2.16 (Changelog)
org.apache.sling.discovery.support 1.0.0 1.0.0 1.0.0
org.apache.sling.distribution.api 0.1.0 0.3.0 0.3.0
org.apache.sling.distribution.core 0.1.1.r1678168 0.1.15.r1733486 0.2.6
org.apache.sling.engine 2.4.2 2.4.6 (Changelog) 2.6.6 (Changelog)
org.apache.sling.event 3.5.5.R1667281 4.0.0 (Changelog) 4.2.0 (Changelog)
org.apache.sling.event.dea 1.0.0 1.0.4 (Changelog) 1.1.0 (Changelog)
org.apache.sling.extensions.threaddump 0.2.2
org.apache.sling.extensions.webconsolesecurityprovider 1.1.4 1.1.6 (Changelog) 1.2.0 (Changelog)
org.apache.sling.featureflags 1.0.0 1.0.2 (Changelog) 1.2.0 (Changelog)
org.apache.sling.fragment.ws 1.0.2 1.0.2 1.0.2
org.apache.sling.fragment.xml 1.0.2 1.0.2 1.0.2
org.apache.sling.hapi 1.0.0 1.0.0
org.apache.sling.hc.core 1.2.0 1.2.2 (Changelog) 1.2.4 (Changelog)
org.apache.sling.hc.webconsole 1.1.2 1.1.2 1.1.2
org.apache.sling.i18n 2.4.0 2.4.4 (Changelog) 2.5.6 (Changelog)
org.apache.sling.installer.console 1.0.0 1.0.0 1.0.2 (Changelog)
org.apache.sling.installer.core 3.6.4 3.6.8 (Changelog) 3.8.6 (Changelog)
org.apache.sling.installer.factory.configuration 1.1.2 1.1.2 1.1.2
org.apache.sling.installer.factory.subsystems 1.0.0
org.apache.sling.installer.provider.file 1.1.0 1.1.0 1.1.0
org.apache.sling.installer.provider.jcr 3.1.16 3.1.18 (Changelog) 3.1.22 (Changelog)
org.apache.sling.javax.activation 0.1.0 0.1.0 0.1.0
org.apache.sling.jcr.api 2.2.0 2.3.0 (Changelog) 2.4.0 (Changelog)
org.apache.sling.jcr.base 2.2.2 2.3.2 (Changelog) 3.0.0 (Changelog)
org.apache.sling.jcr.compiler 2.1.0
org.apache.sling.jcr.contentloader 2.1.10 2.1.10 2.1.10
org.apache.sling.jcr.davex 1.2.2 1.3.4 (Changelog) 1.3.8 (Changelog)
org.apache.sling.jcr.jcr-wrapper 2.0.0 2.0.0 2.0.0
org.apache.sling.jcr.registration 1.0.2 1.0.2 1.0.2
org.apache.sling.jcr.repoinit 1.1.2
org.apache.sling.jcr.resource 2.5.0 2.7.4.B001 (Changelog) 2.9.2 (Changelog)
org.apache.sling.jcr.resourcesecurity 1.0.2 1.0.2 1.0.2
org.apache.sling.jcr.webdav 2.2.2 2.3.4 (Changelog) 2.3.8 (Changelog)
org.apache.sling.jmx.provider 1.0.2 1.0.2 1.0.2
org.apache.sling.launchpad.installer 1.2.0 1.2.2 (Changelog) 1.2.2
org.apache.sling.models.api 1.1.0 1.2.2 (Changelog) 1.3.2 (Changelog)
org.apache.sling.models.impl 1.1.0 1.2.2 (Changelog) 1.3.9.r1784960 (Changelog)
org.apache.sling.models.jacksonexporter 1.0.6
org.apache.sling.provisioning.model 1.8.0
org.apache.sling.repoinit.parser 1.1.0
org.apache.sling.resource.inventory 1.0.4 1.0.4 1.0.6 (Changelog)
org.apache.sling.resourceaccesssecurity 1.0.0 1.0.0 1.0.0
org.apache.sling.resourcecollection 1.0.0 1.0.0 1.0.0
org.apache.sling.resourcemerger 1.2.9.R1675563-B002 1.3.0 (Changelog) 1.3.0
org.apache.sling.resourceresolver 1.2.4 1.4.8 (Changelog) 1.5.22 (Changelog)
org.apache.sling.rewriter 1.0.4 1.1.2 (Changelog) 1.2.1.R1777332 (Changelog)
org.apache.sling.scripting.api 2.1.6 2.1.8 (Changelog) 2.1.12 (Changelog)
org.apache.sling.scripting.core 2.0.28 2.0.36 (Changelog) 2.0.44 (Changelog)
org.apache.sling.scripting.java 2.0.12 2.0.14 (Changelog) 2.1.2 (Changelog)
org.apache.sling.scripting.javascript 2.0.16 2.0.28 2.0.30
org.apache.sling.scripting.jsp 2.1.6 2.1.8 2.2.6
org.apache.sling.scripting.jsp.taglib 2.2.4 2.2.4 2.2.6
org.apache.sling.scripting.jst 2.0.6 2.0.6 2.0.6
org.apache.sling.scripting.sightly 1.0.2 1.0.18 1.0.32
org.apache.sling.scripting.sightly.compiler 1.0.8
org.apache.sling.scripting.sightly.compiler.java 1.0.8
org.apache.sling.scripting.sightly.js.provider 1.0.4 1.0.10 1.0.18
org.apache.sling.scripting.sightly.models.provider 1.0.0 1.0.6
org.apache.sling.security 1.0.10 1.0.18 1.1.2
org.apache.sling.serviceusermapper 1.2.0 1.2.2 1.2.4
org.apache.sling.servlets.compat 1.0.0.Revision1200172 1.0.0.Revision1200172 1.0.0.Revision1200172
org.apache.sling.servlets.get 2.1.10 2.1.14 2.1.22
org.apache.sling.servlets.post 2.3.6 2.3.8 2.3.15.r1780096
org.apache.sling.servlets.resolver 2.3.6 2.4.2 2.4.10
org.apache.sling.settings 1.3.6 1.3.8 1.3.8
org.apache.sling.startupfilter 0.0.1.Rev1526908 0.0.1.Rev1526908 0.0.1.Rev1764482
org.apache.sling.startupfilter.disabler 0.0.1.Rev1387008 0.0.1.Rev1387008 0.0.1.Rev1758544
org.apache.sling.tenant 1.0.2 1.0.2 1.1.0
org.apache.sling.tracer 0.0.2 1.0.2

AEM 6.3: set admin password on initial startup (Update)

With AEM 6.3 you have the chance to setup the admin password already on the initial start. By default the quickstart asks you for the password if you start it directly. That’s a great feature and shortens quite some deployment instructions, but it doesn’t work always.

For example, if you first unpack your AEM instance and then use the start script, you’ll never get asked for the new admin password. The same if you work in an application server setup. And if you do automatic installations, you don’t want to get asked at all.

But I found, that even in these scenarios you can set the admin password as part of the initial installation. There are 2 different ways:

  1. Set the system property “admin.password” to your desired password; and it will be used (for example add “-Dadmin.password=mypassword” to the JVM parameters).
  2. Or set the system property “admin.password.file” and pass as value the path to a file; when this file is accessible by the AEM instance and the contains the line “admin.password=myAdminPassword“, this value will be used as admin password.

Please note, that this only works on the initial startup. On all subsequent startups these system properties are ignored; and you should probably remove them or at least purge the file in case of (2).

Update: Ruben Reusser mentioned, that the Osgi Webconsole Admin password is not changed (which is used in case the repository is not running). So you still need to work on that.

What I check on code reviews

At several occassions I did code reviews on AEM projects  in the last months. I don’t do that exercise quite often, so I don’t have a real routine or checklists what to look at. But in the past I learned some lessons about how to write code for AEMs, so I hope I check relevant pieces. Feedback appreciated.

So let’s start with my top 10 items I look for:

  1. The use of “System.out.println()“, “System.err.println()” and “e.printStackTrace()” statements. Logging is cheap and easy, but obviously not easy enough, I still find these statements. They should be replaced, because these statements do not provide relevant metadata like time and class. And to be honest, people tend to look only at the error.log file, but not on stdout.log.
  2. Servlets bound to fixed paths. In most cases it should be replaced by binding either to a selector or to a resourcetype. The Sling documentation explains it quite well.
  3. The creation of dedicated sessions/ResourceResolvers (either admin sessions ot service user sessions) and if these sessions are properly closed. Although this should be common knowledge to AEM developers, there’s still code out there which doesn’t close resource resolvers or logs out sessions, causing memory leaks.
  4. The existence of long-running sessions. You shouldn’t write services, which open a session on activate and close them on deactive (see this blog post for the explanation). The only exception to this rule: JCR observation handlers.
  5. adaptTo() calls without proper null checks. adaptTo() is allowed to return null. There are cases where it can be neglected (in reality you’ll never have all occurrences of it checked), but in most cases it has to be checked to avoid NullPointerException.
  6. Log hygiene part 1: The excessive use of LOG.error(), when a LOG.info() issufficient. Or even worse: LOG.error/info() instead of LOG.debug().
  7. Log hygiene part 2: Log messages without meaningful description. A log message has to contain relevant information like “with what parameter does this exception happen? At what node? Which request?”. Consider that some parts are always and implicitly logged (e.g. the thread name, which contains in case of a request the requested path), but you need to provide every other information which can be useful to understand the problem when found in the logs.
  8. The mixed usage of JCR and Sling API. Choose either one, but then stick to it. You should not have a method, which takes both a Session and a ResourceResolver as parameter (or other object from these domains).
  9. Performing string operations on paths. I already blogged about it.
  10. JCR queries. Are they properly used or can they get replaced by a short tree traversal?

So when you get through all of these quite generic items, your code is already quite well. And if you don’t have a specific problems which I should investigate, I will likely stop here. Because then you already proved, that you understand AEM and how to operate it quite well, so I wouldn’t expect major issues anymore.

I am sure that the personal background influences a lot the intuitive approach on code review. Therefor I am interested in your checklists and how it differs from mine. Just leave a comment, drop me mail or tweet me 🙂

Let’s try to compile a list which we all can use to improve our code.

AEM coding best practice: No String operations on paths!

I recently needed to review project code in order to validate if it makes problems when upgrading from AEM 5.6 to AEM 6.x; so my focus wasn’t on the code in the first place, but on some other usual suspects (JCR queries etc). But having seen a few dozen classes I found a pattern, which I then found all over the code: the excessive use of String operations. With a special focus on string operations on repository paths.

For example something like this:

String[] segments = resource.getPath.split("/");
String settingsPath = "/" + StringUtils.join(segments,"/",0,2) + "/settings/jcr:content";
Resource settings = resourceResolver.get(settingsPath);
ValueMap vm = settings.adaptTo(ValueMap.class);
String language = vm.get("language");

(to read settings, which are stored in a dedicated page per site).

Typically it’s a weird mixture of String methods, the use of StringUtils classes plus some custom helpers, which do things with ResourceResolvers, Sessions and paths. Spread all over the codebase. Ah, and it lacks a lot of error checking (what if the settings page doesn’t contain the “language” property? adaptTo() can return “null”).

Sadly that problem not limited to this specific project code, I found it in many other projects as well.

Such a code structure is a clear sign for the lack of abstraction and guidance. There are no concepts available, which eliminate the need to operate on strings, but the developer is left with the only abstraction he has: The repository and CRXDE Lite’s view on it. He logs into the repository, looks at the structure and then decides how to mangle known pieces of information to get hold of the things he needs to access. If there’s noone which reviews the code and rejects such pieces, the pattern emerges and you can find it all over the codebase. Even if developers create utility classes for (normally every developer creates one on its own), it’s a bad approach, because these concepts are not designed (“just read the language from the settings page!”), but the design “just happens“; there is no focus on it, therefor quality is not enforced and error handling typically quite poor.

Another huge drawback of this approach: It’s very hard to change the content structure, because at many levels assumptions about the content structure are used, which are often not directly visible. For the example the constants “0” and “2” in the above code snippets determine the path prefix, but you cannot search for such definitions (even if they are defined as constant values).

If the proper abstraction would be provided, the above mentioned code could look like this:

String language = "en";
Settings settings = resource.adaptTo(Settings.class);
If (settings != null) {
  language = settings.getLanguage();
}

This approach is totally agnostic of paths and the fact that settings are stored inside a page. It hides this behind a well-defined abstraction. And if you ever need to change the content structure you know where you need to change your code: Only in the AdapterFactory.

So whenever you see code which uses String operations on paths: Rethink it. Twice. And then come up with some abstraction layer and refactor it. It will have a huge impact on the maintainability of your code base.

When is AEM fully started?

Or in other words: How can I know that the instance is fully working?

A common task when you work with automation is a realiable detection when the AEM instance is up and running. Maybe you reconfigure the loadbalancer to send requests to this instance. Or you just start doing some other work.

The most naive approach is to request a AEM page and act on the HTTP status code. If the status is “200”, you consider the system up and running. If you get any other code, it’s not. Sounds easy, is easy. But not really accurate. Because there are times during startup, when the system returns a status code 200, but a blank page. Unfortunate.

So next approach: Check if all bundles are active. Check /system/console/bundles.json and parse it. Look for a statement like this:

status":"Bundle information: 447 bundles in total - all 447 bundles active.

Nice try, but does not work. All bundles being up does not guarantee, that all the services are up as well.

The third approach is more compplicated and requires coding, but delivers good results: Build a healthcheck which depends on a lot of other services (the ones you consider important). If this healthcheck is present and delivers ok, it means, that all services it depends on are active as well (the simple default semantic of the @Reference annotation guarantees that). This does not necessarily mean, that the startup is finished, but just that the services you considered relevant are up.

And finally there is a fourth approach, which has been built specifically for this case: The startup listeners. It’s a service interface you can implement, and you get notified when the system is up. That’s it. The API does not give any guarantee that if the system is up, that 5 minutes later it is still up. I am not 100% sure so the semantics of this approach if a service fails to start. Or if a service decides to stop (or starts throwing exceptions).

The healthcheck is my personal favorite. It can be used not  only to give you information about a single event (“the system is up”), but it can take much more factors into account to decide if the system is up. And these factors can be constantly checked. When a service is no longer available, the healthcheck goes to ERROR (“red”), and it’s available again, the healthcheck reports OK again. The approach is more powerfull, provides better extensibility and is quite easy to understand. So I choose a healthcheck everytime when I need to know about the health state of AEM.

 

 

 

Application changes and incompatible features

In the lifecycle of every larger application there are many occasions where features evolve. In most cases the new version of a feature is compatible with earlier versions, but there are always cases where incompatible changes are required.

As example let me take the scenario the guys at KD WebConsult provided in their latest blog entry (which has inspired me to write this post, I have to admit). There is a component which introduces changes in the underlying repository structure, and the question is how to cope with that change in case of deployments.

I think that is a classical case of incompatible changes, which always result in additional effort; and that’s the reason why noone likes incompatible changes and tries to avoid them as much as possible. While in a pure AEM environment you should have the full control of all changes, it’s getting harder if you have system you depend on or systems depending on you. Then you run into the topic of interface lifecycle management. Making changes then gets hard and sometimes nearly impossible. You end up with supporting multiple versions of an interface at the same time. Duplicating code is then a way to cope with it. (The technical debt is not only on your side then, but also on the side of others no updating or able to update their use of the interfaces.)

So to come back to the KD Webconsult example I think that the cleanest solution is to build your component in a way, that it supports both the old and new the repository structure (their option 2). And if you are sure, that you don’t use the old structure anymore, you can safely remove the logic for it.

The thinking, that you can always avoid such situations and keep you code clean, is wrong. As soon as you are dealing with non-trivial setup (and AEM in an enterprise setup per se is a distributed application which comes along with other enterprisy requirements like high-availability) you have to make compromises. And taking technical debts for the time of a release or two is not necessarily a bad one if you can stick with standard processes (not changing deployment processes).

JCR Observation in clustered AEM instances

Clustering AEM got a bit different with the introduction of OAK. But with the enforcement of the MVCC model in Oak I also advise to revisit some patterns you might got used to. Because some code which worked with no apparent problem in AEM 5.x might cause problems now.

One thing I would check are the JCR Observation Listeners. Using JCR observation is a common way to react on changes in the repository and this is common pattern since CQ 5.0. So what’s the problem with that? The problem is that many JCR observation handlers are not written with clustering in mind.

Take the example that you need to react on changes in the repository and in turn modify something else. The usual approach is to have a service like this (omitting a lot of the boilerplate …)

public class MyListener implements EventListener {

 @Activate
 protected void activate() {
  ...
  ObservationManager om = session.getWorkspace().getObservationManager();
  om.addEventListener (this, 
   Event.NODE_ADDED,
   "/content/mysite",
   null,
   new String[]{"cq:Page"},
   true,
   true);
  ...
 }

 public onEvent (EventIterator events) {
  // iterate through the events and change something in the repository.
 }

}

This works very well in any non-clustered environment, because there is only a single event handler performing these changes. In clustered environments the situation is different, because now on each cluster node there is such a event handler active. And each one wants to perform the repository changes.
In that case you’ll see a lot of Oak exceptions (on all cluster nodes) which indicate that nodes have been modified externally (outside of the current session) and that a merge was not possible. This is because the changes happen in (quasi-) parallel, but not visible to the currently open sessions, thus causing these exceptions.

The only solution to this problem is to execute the EventListener only on a single node or to handle every event by exactly one event handler and not on all.

Handling every observation event on exactly handler is the elegant and scalable solution. The idea is to handle on every cluster node only the changes which happen on this cluster nodes („local events“). While the JCR API doesn’t have any notion of cluster and the Observation API does not give any information if a event is local or not, the Jackrabbit implementation (which Oak is using here) supports this through the JackrabbitObservationManager. As you can see in the following snippet, only the registration of the ObservationHandler changes, but not the handler itself.

public class MyScalableListener implements EventListener {

 @Activate
 protected void activate() {
  ...
  JackrabbitEventFilter ef = new JackrabbitEventFilter()
   .setAbsPath("/content/mysite")
   .setNodeTypes(new String[{"cq:Page"})
   .setEventTypes(Event.NODE_ADDED)
   .setIsDeep(true)
   .setNoExternal(true);
  JackrabbitObservationManager om = (JackrabbitObservationManager) session.getWorkspace().getObservationManager();
  om.addEventListener (this, ef);
  ...
 }

 public onEvent (EventIterator events) {
  // iterate through the events and change something in the repository.
 }
}

Through the Jackrabbit API extension you can register you EventListener to only handle local changes only and ignore any external ones, which are generated on another cluster nodes (using the setNoExternal(true) call). This is a scalable solution because the events handled at the location where they are generated, and no cluster nodes gets a bottleneck because of this.

So whenever you write an ObservationHandler and especially when you use a cluster, you should review your code and make sure, that you avoid concurrent access to the same resource. Of course there are many ways to have concurrent access even without clustering, but when you actually use clustering, the JCR observation handlers are the easiest piece of code to check and fix.