
What’s new in Sling with AEM 6.3

AEM 6.3 is out. Besides the various product enhancements and new features it also includes updated versions of many open source libraries. So it’s your chance to have a closer look at the version numbers and find out what has changed. And again, it’s quite a lot.

For your convenience I collected the version information of the Sling bundles which are part of the initial release (GA). Please note that Service Packs and Cumulative Hotfixes might introduce new bundle versions.
If a version number is marked red, it has changed compared to the previous AEM version. Bundles which are not present in a given version are marked with a dash (-).

For the versions 5.6 to 6.1 please see this older posting of mine.

| Symbolic name of the bundle | AEM 6.1 (GA) | AEM 6.2 (GA) | AEM 6.3 (GA) |
|---|---|---|---|
| org.apache.sling.adapter | 2.1.4 | 2.1.6 (Changelog) | 2.1.8 (Changelog) |
| org.apache.sling.api | 2.9.0 | 2.11.0 (Changelog) | 2.16.2 (Changelog) |
| org.apache.sling.atom.taglib | 0.9.0.R988585 | 0.9.0.R988585 | 0.9.0.R988585 |
| org.apache.sling.auth.core | 1.3.6 | 1.3.14 (Changelog) | 1.3.24 (Changelog) |
| org.apache.sling.bgservlets | - | 0.0.1.R1582230 | 1.0.6 (Changelog) |
| org.apache.sling.bundleresource.impl | 2.2.0 | 2.2.0 | 2.2.0 |
| org.apache.sling.caconfig.api | - | - | 1.1.0 |
| org.apache.sling.caconfig.impl | - | - | 1.2.0 |
| org.apache.sling.caconfig.spi | - | - | 1.2.0 |
| org.apache.sling.commons.classloader | 1.3.2 | 1.3.2 | 1.3.8 (Changelog) |
| org.apache.sling.commons.compiler | 2.2.0 | 2.2.0 | 2.3.0 (Changelog) |
| org.apache.sling.commons.contentdetection | - | 1.0.2 | 1.0.2 |
| org.apache.sling.commons.fsclassloader | 1.0.0 | 1.0.2 (Changelog) | 1.0.4 (Changelog) |
| org.apache.sling.commons.html | 1.0.0 | 1.0.0 | 1.0.0 |
| org.apache.sling.commons.json | 2.0.10 | 2.0.16 (Changelog) | 2.0.20 (Changelog) |
| org.apache.sling.commons.log | 4.0.2 | 4.0.6 (Changelog) | 5.0.0 (Changelog) |
| org.apache.sling.commons.log.webconsole | - | - | 1.0.0 |
| org.apache.sling.commons.logservice | 1.0.4 | 1.0.6 (Changelog) | 1.0.6 |
| org.apache.sling.commons.metrics | - | 1.0.0 (Changelog) | 1.2.0 (Changelog) |
| org.apache.sling.commons.mime | 2.1.8 | 2.1.8 | 2.1.10 (Changelog) |
| org.apache.sling.commons.osgi | 2.2.2 | 2.4.0 (Changelog) | 2.4.0 |
| org.apache.sling.commons.scheduler | 2.4.6 | 2.4.14 (Changelog) | 2.5.2 (Changelog) |
| org.apache.sling.commons.threads | 3.2.0 | 3.2.6 (Changelog) | 3.2.6 |
| org.apache.sling.datasource | 1.0.0 | 1.0.0 | 1.0.2 (Changelog) |
| org.apache.sling.discovery.api | 1.0.2 | 1.0.2 | 1.0.4 (Changelog) |
| org.apache.sling.discovery.base | - | 1.1.2 (Changelog) | 1.1.6 (Changelog) |
| org.apache.sling.discovery.commons | - | 1.0.10 (Changelog) | 1.0.18 (Changelog) |
| org.apache.sling.discovery.impl | 1.1.0 | - | - |
| org.apache.sling.discovery.oak | - | 1.2.6 (Changelog) | 1.2.16 (Changelog) |
| org.apache.sling.discovery.support | 1.0.0 | 1.0.0 | 1.0.0 |
| org.apache.sling.distribution.api | 0.1.0 | 0.3.0 | 0.3.0 |
| org.apache.sling.distribution.core | 0.1.1.r1678168 | 0.1.15.r1733486 | 0.2.6 |
| org.apache.sling.engine | 2.4.2 | 2.4.6 (Changelog) | 2.6.6 (Changelog) |
| org.apache.sling.event | 3.5.5.R1667281 | 4.0.0 (Changelog) | 4.2.0 (Changelog) |
| org.apache.sling.event.dea | 1.0.0 | 1.0.4 (Changelog) | 1.1.0 (Changelog) |
| org.apache.sling.extensions.threaddump | 0.2.2 | - | - |
| org.apache.sling.extensions.webconsolesecurityprovider | 1.1.4 | 1.1.6 (Changelog) | 1.2.0 (Changelog) |
| org.apache.sling.featureflags | 1.0.0 | 1.0.2 (Changelog) | 1.2.0 (Changelog) |
| org.apache.sling.fragment.ws | 1.0.2 | 1.0.2 | 1.0.2 |
| org.apache.sling.fragment.xml | 1.0.2 | 1.0.2 | 1.0.2 |
| org.apache.sling.hapi | - | 1.0.0 | 1.0.0 |
| org.apache.sling.hc.core | 1.2.0 | 1.2.2 (Changelog) | 1.2.4 (Changelog) |
| org.apache.sling.hc.webconsole | 1.1.2 | 1.1.2 | 1.1.2 |
| org.apache.sling.i18n | 2.4.0 | 2.4.4 (Changelog) | 2.5.6 (Changelog) |
| org.apache.sling.installer.console | 1.0.0 | 1.0.0 | 1.0.2 (Changelog) |
| org.apache.sling.installer.core | 3.6.4 | 3.6.8 (Changelog) | 3.8.6 (Changelog) |
| org.apache.sling.installer.factory.configuration | 1.1.2 | 1.1.2 | 1.1.2 |
| org.apache.sling.installer.factory.subsystems | - | - | 1.0.0 |
| org.apache.sling.installer.provider.file | 1.1.0 | 1.1.0 | 1.1.0 |
| org.apache.sling.installer.provider.jcr | 3.1.16 | 3.1.18 (Changelog) | 3.1.22 (Changelog) |
| org.apache.sling.javax.activation | 0.1.0 | 0.1.0 | 0.1.0 |
| org.apache.sling.jcr.api | 2.2.0 | 2.3.0 (Changelog) | 2.4.0 (Changelog) |
| org.apache.sling.jcr.base | 2.2.2 | 2.3.2 (Changelog) | 3.0.0 (Changelog) |
| org.apache.sling.jcr.compiler | 2.1.0 | - | - |
| org.apache.sling.jcr.contentloader | 2.1.10 | 2.1.10 | 2.1.10 |
| org.apache.sling.jcr.davex | 1.2.2 | 1.3.4 (Changelog) | 1.3.8 (Changelog) |
| org.apache.sling.jcr.jcr-wrapper | 2.0.0 | 2.0.0 | 2.0.0 |
| org.apache.sling.jcr.registration | 1.0.2 | 1.0.2 | 1.0.2 |
| org.apache.sling.jcr.repoinit | - | - | 1.1.2 |
| org.apache.sling.jcr.resource | 2.5.0 | 2.7.4.B001 (Changelog) | 2.9.2 (Changelog) |
| org.apache.sling.jcr.resourcesecurity | 1.0.2 | 1.0.2 | 1.0.2 |
| org.apache.sling.jcr.webdav | 2.2.2 | 2.3.4 (Changelog) | 2.3.8 (Changelog) |
| org.apache.sling.jmx.provider | 1.0.2 | 1.0.2 | 1.0.2 |
| org.apache.sling.launchpad.installer | 1.2.0 | 1.2.2 (Changelog) | 1.2.2 |
| org.apache.sling.models.api | 1.1.0 | 1.2.2 (Changelog) | 1.3.2 (Changelog) |
| org.apache.sling.models.impl | 1.1.0 | 1.2.2 (Changelog) | 1.3.9.r1784960 (Changelog) |
| org.apache.sling.models.jacksonexporter | - | - | 1.0.6 |
| org.apache.sling.provisioning.model | - | - | 1.8.0 |
| org.apache.sling.repoinit.parser | - | - | 1.1.0 |
| org.apache.sling.resource.inventory | 1.0.4 | 1.0.4 | 1.0.6 (Changelog) |
| org.apache.sling.resourceaccesssecurity | 1.0.0 | 1.0.0 | 1.0.0 |
| org.apache.sling.resourcecollection | 1.0.0 | 1.0.0 | 1.0.0 |
| org.apache.sling.resourcemerger | 1.2.9.R1675563-B002 | 1.3.0 (Changelog) | 1.3.0 |
| org.apache.sling.resourceresolver | 1.2.4 | 1.4.8 (Changelog) | 1.5.22 (Changelog) |
| org.apache.sling.rewriter | 1.0.4 | 1.1.2 (Changelog) | 1.2.1.R1777332 (Changelog) |
| org.apache.sling.scripting.api | 2.1.6 | 2.1.8 (Changelog) | 2.1.12 (Changelog) |
| org.apache.sling.scripting.core | 2.0.28 | 2.0.36 (Changelog) | 2.0.44 (Changelog) |
| org.apache.sling.scripting.java | 2.0.12 | 2.0.14 (Changelog) | 2.1.2 (Changelog) |
| org.apache.sling.scripting.javascript | 2.0.16 | 2.0.28 | 2.0.30 |
| org.apache.sling.scripting.jsp | 2.1.6 | 2.1.8 | 2.2.6 |
| org.apache.sling.scripting.jsp.taglib | 2.2.4 | 2.2.4 | 2.2.6 |
| org.apache.sling.scripting.jst | 2.0.6 | 2.0.6 | 2.0.6 |
| org.apache.sling.scripting.sightly | 1.0.2 | 1.0.18 | 1.0.32 |
| org.apache.sling.scripting.sightly.compiler | - | - | 1.0.8 |
| org.apache.sling.scripting.sightly.compiler.java | - | - | 1.0.8 |
| org.apache.sling.scripting.sightly.js.provider | 1.0.4 | 1.0.10 | 1.0.18 |
| org.apache.sling.scripting.sightly.models.provider | - | 1.0.0 | 1.0.6 |
| org.apache.sling.security | 1.0.10 | 1.0.18 | 1.1.2 |
| org.apache.sling.serviceusermapper | 1.2.0 | 1.2.2 | 1.2.4 |
| org.apache.sling.servlets.compat | 1.0.0.Revision1200172 | 1.0.0.Revision1200172 | 1.0.0.Revision1200172 |
| org.apache.sling.servlets.get | 2.1.10 | 2.1.14 | 2.1.22 |
| org.apache.sling.servlets.post | 2.3.6 | 2.3.8 | 2.3.15.r1780096 |
| org.apache.sling.servlets.resolver | 2.3.6 | 2.4.2 | 2.4.10 |
| org.apache.sling.settings | 1.3.6 | 1.3.8 | 1.3.8 |
| org.apache.sling.startupfilter | 0.0.1.Rev1526908 | 0.0.1.Rev1526908 | 0.0.1.Rev1764482 |
| org.apache.sling.startupfilter.disabler | 0.0.1.Rev1387008 | 0.0.1.Rev1387008 | 0.0.1.Rev1758544 |
| org.apache.sling.tenant | 1.0.2 | 1.0.2 | 1.1.0 |
| org.apache.sling.tracer | - | 0.0.2 | 1.0.2 |

AEM 6.3: set admin password on initial startup (Update)

With AEM 6.3 you have the chance to set the admin password already on the initial start. By default the quickstart asks you for the password if you start it directly. That’s a great feature and shortens quite a few deployment instructions, but it doesn’t always work.

For example, if you first unpack your AEM instance and then use the start script, you’ll never get asked for the new admin password. The same applies if you work in an application server setup. And if you do automated installations, you don’t want to get asked at all.

But I found that even in these scenarios you can set the admin password as part of the initial installation. There are 2 different ways:

  1. Set the system property “admin.password” to your desired password, and it will be used (for example, add “-Dadmin.password=mypassword” to the JVM parameters).
  2. Or set the system property “admin.password.file” and pass the path to a file as its value; if this file is readable by the AEM instance and contains the line “admin.password=myAdminPassword”, this value will be used as the admin password. (See the example below the list.)
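
For illustration, both variants on the command line (a sketch only; the quickstart jar name, the file path and the passwords are placeholders):

```
# Variant 1: password passed as a system property
java -Dadmin.password=mypassword -jar cq-quickstart-6.3.0.jar

# Variant 2: password read from a file
echo "admin.password=myAdminPassword" > /path/to/passwordfile
java -Dadmin.password.file=/path/to/passwordfile -jar cq-quickstart-6.3.0.jar
```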

Please note that this only works on the initial startup. On all subsequent startups these system properties are ignored; you should remove them, or at least purge the file in the case of (2).

Update: Ruben Reusser mentioned that the OSGi web console admin password (which is used in case the repository is not running) is not changed. So you still need to take care of that.

What I check on code reviews

On several occasions I did code reviews on AEM projects in the last months. I don’t do that exercise very often, so I don’t have a real routine or checklist of what to look at. But in the past I learned some lessons about how to write code for AEM, so I hope I check the relevant pieces. Feedback appreciated.

So let’s start with my top 10 items I look for:

  1. The use of “System.out.println()”, “System.err.println()” and “e.printStackTrace()” statements. Logging is cheap and easy, but obviously not easy enough, since I still find these statements. They should be replaced, because they do not provide relevant metadata like time and class. And to be honest, people tend to look only at the error.log file, but not at stdout.log. (See the first sketch below the list.)
  2. Servlets bound to fixed paths. In most cases they should instead be bound to a resource type, optionally combined with a selector. The Sling documentation explains it quite well. (See the second sketch below the list.)
  3. The creation of dedicated sessions/ResourceResolvers (either admin sessions or service user sessions) and whether these sessions are properly closed. Although this should be common knowledge to AEM developers, there is still code out there which doesn’t close resource resolvers or log out sessions, causing memory leaks. (The first sketch below shows the pattern.)
  4. The existence of long-running sessions. You shouldn’t write services which open a session on activate and close it on deactivate (see this blog post for the explanation). The only exception to this rule: JCR observation handlers.
  5. adaptTo() calls without proper null checks. adaptTo() is allowed to return null. There are cases where the check can be neglected (in reality you’ll never have all occurrences of it checked), but in most cases it has to be done to avoid NullPointerExceptions.
  6. Log hygiene part 1: the excessive use of LOG.error() when a LOG.info() is sufficient. Or even worse: LOG.error/info() instead of LOG.debug().
  7. Log hygiene part 2: log messages without a meaningful description. A log message has to contain relevant information like “with what parameter does this exception happen? At what node? Which request?”. Some parts are always logged implicitly (e.g. the thread name, which in case of a request contains the requested path), but you need to provide every other piece of information which can be useful to understand the problem when found in the logs.
  8. The mixed usage of the JCR and Sling APIs. Choose either one, but then stick to it. You should not have a method which takes both a Session and a ResourceResolver as parameters (or other objects from these domains).
  9. Performing string operations on paths. I already blogged about it.
  10. JCR queries. Are they properly used, or can they be replaced by a short tree traversal? (See the third sketch below the list.)
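
To make items 1, 3, 5, 6 and 7 concrete, here is a first minimal sketch. The class name, the subservice name “title-reader” and the property are made up for illustration; the subservice needs a matching service user mapping, and OSGi DS annotations are assumed:

```java
import java.util.Collections;
import java.util.Map;

import org.apache.sling.api.resource.LoginException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.apache.sling.api.resource.ValueMap;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Component(service = TitleReader.class)
public class TitleReader {

    // item 1: a proper logger instead of System.out.println()
    private static final Logger LOG = LoggerFactory.getLogger(TitleReader.class);

    @Reference
    private ResourceResolverFactory resolverFactory;

    public String readTitle(String path) {
        Map<String, Object> authInfo = Collections.<String, Object>singletonMap(
                ResourceResolverFactory.SUBSERVICE, "title-reader");

        // item 3: try-with-resources guarantees that the resolver is closed
        try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(authInfo)) {
            Resource resource = resolver.getResource(path);
            if (resource == null) {
                // items 6+7: an appropriate log level, with the relevant context (the path)
                LOG.info("resource not found at {}", path);
                return null;
            }
            // item 5: adaptTo() may return null and has to be checked
            ValueMap properties = resource.adaptTo(ValueMap.class);
            if (properties == null) {
                LOG.warn("cannot adapt {} to a ValueMap", path);
                return null;
            }
            return properties.get("jcr:title", String.class);
        } catch (LoginException e) {
            // a real error, so LOG.error() is justified here
            LOG.error("cannot obtain a service resource resolver for subservice 'title-reader'", e);
            return null;
        }
    }
}
```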
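
And a second sketch for item 2, registering a servlet by resource type plus selector instead of a fixed path (the resource type is a made-up example):

```java
import java.io.IOException;

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

// bound to a resource type and a selector, not to a fixed path;
// reachable e.g. as /content/mysite/somepage.summary.json
@Component(service = Servlet.class, property = {
        "sling.servlet.resourceTypes=myproject/components/page",
        "sling.servlet.selectors=summary",
        "sling.servlet.extensions=json",
        "sling.servlet.methods=GET"
})
public class PageSummaryServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws IOException {
        response.setContentType("application/json");
        // the servlet already knows the resource it was called for
        response.getWriter().write("{\"path\":\"" + request.getResource().getPath() + "\"}");
    }
}
```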
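
Finally, a third sketch for item 10: when you only need to look at the children of one known node, a simple traversal is cheaper and more predictable than a JCR query (the property and the helper are invented for illustration):

```java
import org.apache.sling.api.resource.Resource;

public class TemplateLookup {

    /** Returns the first child page built on the given template, or null. */
    public static Resource findChildWithTemplate(Resource parent, String templatePath) {
        for (Resource child : parent.getChildren()) {
            Resource content = child.getChild("jcr:content");
            if (content != null
                    && templatePath.equals(content.getValueMap().get("cq:template", String.class))) {
                return child;
            }
        }
        return null;
    }
}
```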

So when you get through all of these quite generic items, your code is already in pretty good shape. And if you don’t have a specific problem which I should investigate, I will likely stop here. Because then you have already proved that you understand AEM and how to operate it quite well, so I wouldn’t expect major issues anymore.

I am sure that one’s personal background influences the intuitive approach to code review a lot. Therefore I am interested in your checklists and how they differ from mine. Just leave a comment, drop me a mail or tweet me 🙂

Let’s try to compile a list which we all can use to improve our code.

Application changes and incompatible features

In the lifecycle of every larger application there are many occasions where features evolve. In most cases the new version of a feature is compatible with earlier versions, but there are always cases where incompatible changes are required.

As an example let me take the scenario the guys at KD WebConsult provided in their latest blog entry (which, I have to admit, inspired me to write this post). There is a component which introduces changes in the underlying repository structure, and the question is how to cope with that change in case of deployments.

I think this is a classical case of incompatible changes, which always result in additional effort; that’s the reason why no one likes incompatible changes and everyone tries to avoid them as much as possible. While in a pure AEM environment you should have full control over all changes, it gets harder if you have systems you depend on, or systems depending on you. Then you run into the topic of interface lifecycle management. Making changes gets hard and sometimes nearly impossible. You end up supporting multiple versions of an interface at the same time; duplicating code is one way to cope with that. (The technical debt is then not only on your side, but also on the side of others not updating, or not being able to update, their use of the interfaces.)

So to come back to the KD WebConsult example, I think that the cleanest solution is to build your component in a way that it supports both the old and the new repository structure (their option 2); a small sketch of this approach follows below. And once you are sure that the old structure is not used anymore, you can safely remove the logic for it.
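
A minimal sketch of such a dual-structure component (the node and property names are invented for illustration):

```java
import org.apache.sling.api.resource.Resource;

public class ColorStyle {

    /**
     * Reads the configured color, supporting both repository structures:
     * the new one stores it on a "settings" child node, the old one
     * directly on the component node.
     */
    public static String getColor(Resource component) {
        Resource settings = component.getChild("settings");
        if (settings != null) {
            return settings.getValueMap().get("color", String.class);
        }
        // fallback for content still using the old structure;
        // remove this branch once all content has been migrated
        return component.getValueMap().get("color", String.class);
    }
}
```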

The thinking that you can always avoid such situations and keep your code clean is wrong. As soon as you are dealing with a non-trivial setup (and AEM in an enterprise setup is per se a distributed application which comes along with other enterprisy requirements like high availability), you have to make compromises. And taking on technical debt for the time of a release or two is not necessarily a bad deal if you can stick with your standard processes (not changing deployment processes).

Automation done right — a reliable process

(the subtitle could be: Why I don’t call simple curl scripts „automation“).

Jakub Wadalowski mentioned in his talk at CircuitConf 2016 that „Jörg doesn’t like curl“. Which is not totally true. I love curl as a command line HTTP client (to avoid netcat).

But I don’t like the way people use curl in scripts and then call these scripts „automation“; I already blogged about this topic.

Good AEM automation requires much more than just package installation. True automation has to work in a way that you don’t need to perform any manual intervention, supervision or control. The automation has to work in a reliable and repeatable fashion.
But we also know that there will always be errors and problems which prevent a successful result. A package installation can work, but the bundles might not come up. In such cases the system has to actively report problems back. And this reporting has to be accurate in a way that you can rely on it: if no message occurs, everything is fine, and there are no manual checks anymore!
This really requires a process you can trust. You need to trust your automated process so much that you can start it and then go home.

So to come back to the „Jörg doesn’t like curl“ statement of Jakub: when you just have the curl calls without error checking (determining the status code of a curl call requires extra flags like „--fail“ or „-w "%{http_code}"“, which such scripts typically omit) and without proper error handling and reporting, it will never be reliable. It might save you a few clicks, but in the end you still have to perform a lot of manual steps. And getting away from these manual steps with shell scripting alone requires a good amount of scripting.
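
To make the „check and report“ part concrete, here is a minimal sketch of such a check. The URL and the credentials are placeholders, and /system/console/bundles.json is the Felix web console bundle list; the point is that every call verifies the HTTP status and any failure is reported loudly instead of being silently ignored:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DeploymentCheck {

    public static void main(String[] args) throws IOException {
        // hypothetical instance URL and credentials
        URL url = new URL("http://localhost:4502/system/console/bundles.json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        // the crucial part: actually check the status code
        int status = conn.getResponseCode();
        if (status != 200) {
            // report actively and fail the process, so that a scheduler,
            // CI job or monitoring system notices without a human looking
            System.err.println("DEPLOYMENT CHECK FAILED: bundle list returned HTTP " + status);
            System.exit(1);
        }
        // further checks (e.g. that all bundles are active) would follow here
    }
}
```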

And to be honest: I’ve rarely seen „good shell scripts“.

Storage sizing and SAN

Part of every project I’ve done in the last years was the task of creating a hardware sizing; many times it was part of the project setup and a very important piece which got fed into the hardware provisioning process.

(Hardware sizing is often required during presales stages, where it is mostly used to compare different vendors in terms of investment into hardware. In such situations the process to create a sizing is very similar, but the results are often communicated quite differently …)

Depending on the organization this hardware provisioning process can be quite lengthy, and after the parameters have been set they are very hard to change. That is especially true with large organisations which want an on-premise deployment in their own datacenter, because then it means starting a purchase process followed by provisioning, installation, and so on. Especially in the FSI area it is not uncommon to have 6 months from the signed order of the budget manager to the handover of an installed server. And then you get exactly what you asked for. So everyone tries to make sure that the input data is as good as possible and that all variables have been considered. It reminds me a lot of the waterfall model in the software development business, and it comes with the same problems.

Even if your initial sizing was 100% accurate (which it never is), 6 months are a long time in which important project parameters can change. So there is a not-so-small chance that at the time the servers are handed over to you, you already know that the hardware is not sufficient anymore, based on the information you have right now.
But changes are hardly possible, because for cost efficiency you did not order the hardware which offered the most flexibility in terms of future growth, but a model with some constraints in terms of extensibility. And now, 6 months after project start and just 4 months ahead of the go-live date, you cannot restart the purchasing process anymore!

Or a bit worse: you have already been live for 6 months and now you start to run short of disk space, because your growth is much higher than anticipated. But the drive bays of your servers are already full and you have already installed the largest disks available.

For the topic of disk space the solution can be quite easy: don’t use local disks! Even if local SSDs are very performant in terms of IOPS, try to avoid the discussion and go for a SAN (storage area network), which should be available anyway in an enterprise datacenter. (Of course you can also choose any other technology which decouples the storage from the server in a strong way and performs well.) For AEM and TarMK a good SAN is sufficient to deliver decent performance (even if a local SSD improves this again).

I know that this statement can be problematic, as there are cases where IOPS are more important than the flexibility to easily add storage. Then your only choice is to take the SSDs and make sure that you still have the option to add more of them.

The benefit of a SAN is that you decouple the servers from their storage, and you can upgrade or extend them independently of each other. Adding more hard drives to a SAN is always possible, so there is hardly a limit in terms of available disk space per server. Attaching more disk space to a server is then a matter of minutes and can be done incrementally. This also allows you to attach disk space on demand instead of attaching the full amount at provisioning time (only to consume it fully 2 years later).
And if you have servers and storage split up, it is also much easier to replace a server by another model with more capacity (RAM- or CPU-wise), because you don’t need to move all the data, but rather just connect the storage to a different server.

So using a SAN does not free you from delivering a good sizing, but it can soften the impact of an insufficient sizing (mostly based on insufficient data), which is often the case at project kickoff.

Managing repository growth

On almost every project there is this moment when (all of a sudden) the disk space assigned to the AEM instances becomes full; in most (!) cases such a situation is detected early enough, so there is time to react, which often means just adding more disk capacity.

But after that the questions arise: Why is our repository so large? Why does it consume more than the initially estimated disk space? What content exactly is causing this problem? What went wrong so we actually got into that situation?

From my point of view there are 3 different views on this situation, each of which can be part of the answer:

  • Disk space is not managed well, meaning that unnecessary stuff consumes a lot of space outside of the repository.
  • The maintenance jobs are not executed.
  • The estimation for content and content growth was not realistic.

And in most cases you need to check all these 3 views to answer the question “Why are we running out of space?”

Disk space is not managed well
This is an operations problem in the first place, because non-repository data is consuming disk space which had been planned for the repository. Often seen:

  • Heapdumps and log files are not rotated or removed.
  • Manually created backup files were once copied to a folder next to the original, but in the meantime they are not useful anymore, because they are totally out of date.
  • The regular backup process creates temporary files which are not cleaned up; or the backup process itself consumes temporary disk space, which lets the disk consumption spike.

So this can only be handled by careful operations and timely purging of old data.

The maintenance jobs are not executed
Maintenance jobs are an essential part of the ongoing work to remove unnecessary fat from your AEM instance, be it on the content level or on the repository level. This includes:

  • workflow purge
  • audit log purge
  • repository compaction (if you use TarMK; see the example below the list)
  • datastore GC (if you use a datastore)
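
As an example for the compaction item: on TarMK, offline revision cleanup is typically run with the oak-run tool while the instance is stopped. A sketch, assuming AEM 6.3 on Oak 1.6 (the oak-run version must match the Oak version of your instance, and the path is the default quickstart one):

```
# remove unreferenced checkpoints first, then compact
java -jar oak-run-1.6.1.jar checkpoints crx-quickstart/repository/segmentstore
java -jar oak-run-1.6.1.jar checkpoints crx-quickstart/repository/segmentstore rm-unreferenced
java -jar oak-run-1.6.1.jar compact crx-quickstart/repository/segmentstore
```

AEM 6.3 also offers online revision cleanup, so the offline variant is needed less often there.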

You should always keep an eye on these; the Maintenance Dashboard is a great help here. But do not rely on it blindly!

The estimation for content and content growth was not realistic
That’s a common problem; you have to deliver an initial hardware sizing, which also includes the amount of disk space used by the repository. You do your best to include all relevant parameters, and you add some buffer on top. But that is an estimation at the beginning of the project, when you don’t know all the requirements and their impact on disk consumption in detail. Yet that’s the number you committed to, and changing it afterwards is always problematic.

Or AEM is used differently than initially anticipated, and all the assumptions on which you based your initial hardware sizing are no longer true. Or you just forgot to add the versioning of the assets to your calculation. Or…
There are a lot of cases where in retrospect the initial sizing of the required disk space was just incorrect. In that case you have only one choice: redo the calculation right now! Take your new insights and create a new hardware sizing. And then implement it and add more hardware.

And the only way I see to avoid such situations is: do not make estimations for the next 3 years! Be agile and review your current numbers every 3 months; during this review you can also determine the needs for the next months and plan accordingly. Of course this assumes that you are flexible in terms of disk sizing, so for any non-trivial setup the use of a SAN as storage technology (plus a volume manager) is my preferred choice!
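
To give a (made-up) example of such a quarterly review: if the initial sizing assumed 50 GB of repository growth per year, but the first review after 3 months already shows 40 GB of growth, the implied run rate is 160 GB per year. Extending the volume and revising the sizing at that point is a planned activity; noticing the same deviation after 2 years is an emergency.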

Of course this does not free you from working on the cleanup of the systems and running all required maintenance jobs; but it makes the review of the used disk space a regular task, so you should see deviations from your anticipated plan much earlier.