1000 nodes per folder and Oak orderable nodes

Every now and then the question comes up how many child nodes JCR supports. While the technically correct answer is „there is no limit“, in practice there are some limitations.

In CRX 2.x nodes are always ordered: even unordered nodes are treated as if they were ordered, which makes the difference nearly non-existent. [Thanks Justin for making this clear!] This means that the order needs to be maintained on all operations, including adding and removing sibling nodes. The more child nodes a node has, the more time it takes to maintain this list.

So, what about this „1000 child nodes“ limit? First of all, this number is arbitrary :-) But when you use CRXDE Lite, browsing a node with lots of child nodes gets really slow, mostly because of the time the JavaScript takes to render it. And of course the performance of add and remove operations also degrades linearly. Besides, there are hardly any cases where you would need more than 1000 child nodes.

But for the aspect of reading nodes there is no impact on performance. So it is not a problem to have 6000 nodes in /libs/wcm/core/i18n/en, because you only read the nodes, but you don’t change them.

Nevertheless this „limit“ can be cumbersome, especially if you don’t need the feature of ordered child nodes. Also, the fact that there is this limit means that when adding nodes you already feel the impact (at a lower level) with fewer nodes.

With Apache Oak this has changed. With Oak, child nodes are not ordered unless their parent has a node type which supports ordering.

To illustrate the difference between sling:Folder and sling:OrderedFolder, I did a small test. I wrote a small benchmark to create 5000 nodes, then add more nodes, do random reads and delete them afterwards. For every operation a single node is created or deleted, followed by a save(). (Sourcecode)

Operation             | sling:Folder | sling:OrderedFolder
Create 5000 nodes     | 6124 ms      | 17129 ms
Random read 500 nodes | 2 ms         | 9 ms
Add 500 nodes         | 112 ms       | 564 ms

This small benchmark (executed on a 2014 MacBook Pro with SSD, AEM 6.0, TarMK, Oak 1.0.0) shows:

  • Adding lots of child nodes to a node is much faster when you use a non-ordering node type.
  • Random reads are faster as well; obviously Oak can use more efficient data structures than a list if it doesn’t need to maintain the ordering.

A factor of roughly 3 to 5 is obviously quite significant. Of course the benefit is smaller if you have fewer child nodes.
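To make the data-structure argument a bit more tangible, here is a toy model in plain Java. It is only a sketch and says nothing about how CRX or Oak actually store child nodes; the class names are made up for this illustration. It merely shows that a list of siblings has to be scanned on every add, while an unordered set of names does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: NOT how CRX or Oak store child nodes internally. */
final class ChildStorageSketch {

    /** Ordered children: names live in a list, so every lookup scans it. */
    static final class OrderedChildren {
        private final List<String> names = new ArrayList<>();
        private long comparisons = 0;

        boolean add(String name) {
            if (indexOf(name) >= 0) {
                return false; // sibling names must be unique
            }
            names.add(name);
            return true;
        }

        boolean contains(String name) {
            return indexOf(name) >= 0;
        }

        long comparisons() {
            return comparisons;
        }

        List<String> inOrder() {
            return new ArrayList<>(names);
        }

        private int indexOf(String name) {
            for (int i = 0; i < names.size(); i++) {
                comparisons++;
                if (names.get(i).equals(name)) {
                    return i;
                }
            }
            return -1;
        }
    }

    /** Unordered children: a hash set answers "does this name exist?" without a scan. */
    static final class UnorderedChildren {
        private final Set<String> names = new HashSet<>();

        boolean add(String name) {
            return names.add(name);
        }

        boolean contains(String name) {
            return names.contains(name);
        }
    }

    public static void main(String[] args) {
        OrderedChildren ordered = new OrderedChildren();
        UnorderedChildren unordered = new UnorderedChildren();
        for (int i = 0; i < 5000; i++) {
            ordered.add("node" + i);
            unordered.add("node" + i);
        }
        System.out.println("name comparisons for 5000 ordered adds: " + ordered.comparisons());
    }
}
```

In this model the number of comparisons grows quadratically with the number of siblings, which matches the observation that the cost per add grows with the size of the list.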

The problems of multi-tenancy: the development model

In large enterprises AEM projects tend to attract many different interested parties, which all love to make use of the features of AEM. They want to get on board the platform as fast as they can. And this can be a real problem when it comes to such a multi-tenancy AEM platform.

In the previous post I wrote about the governance problems of such projects and all the politics involved in them. These problems persist in the daily business of developing and operating such platforms.

Many of these tenants already have their development partners and agencies, which they are used to working with. These partners have experience in that specific area and know the business. So it’s quite likely that the tenants continue to work with their partners in this specific project as well. And that’s where the technical problems start.

Because at that point you’ll realize that you have multiple teams which rarely collaborate, or in the worst case not at all. Teams which might have different skill levels, operate in different development models and use different tooling. And each of these teams gets its own prioritization and has its own schedule, and in most cases the amount of communication between these teams is quite low.

So now the platform owner (or the development manager on their behalf) needs to set up a development model which allows these multiple teams to feed all their results into a single platform. A model which doesn’t slow down your development agility and does not negatively impact the platform’s stability and performance. And this is quite hard.

Some of these challenges are (note: most of them are not specific to AEM at all!):

  • How can you ensure communication and collaboration between all development parties? That’s often a part which is left out (or forgotten) during time and budget estimation, and the amount of time spent on it reflects this fact. But that’s the most important piece here.
  • On the other hand, how do you make sure that the overhead of communication and coordination is as low as possible? In most cases this means that each party gets its own version control system, its own Maven module and its own build jobs. This allows a better separation of concerns during development and build time, but just postpones the problem. Because …
  • How do you avoid the case that multiple parties use the same names, which have to be unique? For example the same path below /apps or the same client library name? It’s hard to detect this at development time when you don’t have checks which cover multiple repositories and Maven modules.
  • Somewhat related: How do you handle dependencies on the same library but with different versions? Although OSGi supports this at runtime, AEM isn’t really prepared for a situation where you have a library in both version 1 and version 2. So you need to centrally manage the list of 3rd-party libraries (including version numbers) which the teams can use.
  • A huge challenge is testing. When you have managed to deploy all delivered artifacts to a single instance (and combining these artifacts into deployable content packages often imposes its own set of problems), how do you test and where do you report issues? Who is triaging the issues and assigning them to the individual teams for fixing? This can very easily cause a culture of blaming and denying, which makes the actual bug fixing part very hard.
  • The same goes for production problems. No tenant, and therefore no development team, wants to get blamed for bringing down the platform because of some issue, so each problem can get very political, and teams start to argue why they are not responsible.
  • And many more…
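The cross-module naming check mentioned in the list above could look roughly like the following sketch. It assumes that each team can export the list of names it claims (paths below /apps, client library names); how that list is produced is left open, and the tenant names and paths in the example are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of a check that spans the modules of several teams. */
final class NameCollisionCheck {

    /**
     * @param claimsByModule module name -> names this module claims
     * @return every name claimed by more than one module, with the modules claiming it
     */
    static Map<String, Set<String>> findCollisions(Map<String, List<String>> claimsByModule) {
        Map<String, Set<String>> modulesByName = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : claimsByModule.entrySet()) {
            for (String name : entry.getValue()) {
                modulesByName.computeIfAbsent(name, k -> new LinkedHashSet<>()).add(entry.getKey());
            }
        }
        // Keep only the names that more than one module wants to own.
        modulesByName.values().removeIf(modules -> modules.size() < 2);
        return modulesByName;
    }

    public static void main(String[] args) {
        // Hypothetical tenants and paths, for illustration only.
        Map<String, List<String>> claims = new LinkedHashMap<>();
        claims.put("tenant-a", Arrays.asList("/apps/brand-a", "/apps/shared/components/teaser"));
        claims.put("tenant-b", Arrays.asList("/apps/brand-b", "/apps/shared/components/teaser"));
        System.out.println(findCollisions(claims));
    }
}
```

Such a check only helps if it runs in a central build that sees the artifacts of all teams; inside a single team’s repository it cannot find anything.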

These are real world problems, which hurt productivity

That sounds pretty much like the Star Trek universe, where there is all the openness and there are no selfish people anymore (well, ignore the Ferengi for the moment …). But you don’t need to wait until the 23rd century to have such a project, so I’d like to share some ideas which can make it work already today.

  • The platform owner should communicate openly with all tenants and involved development teams, and encourage them to adhere to a common development model.
  • The platform owner should provide clear rules on how each team is supposed to work and how they create and share their artifacts, and also clear rules for coding and naming.
  • The platform owner should be in charge of a small team which supports all tenants and all development teams and helps to align requirements and the integration of the different codebases. This team is also responsible for all the 3rd-party library management and should have write access to the code repositories of all development teams.
    * Build and deployment is centralized as well.
  • Issue triaging is a cross-team effort.

This is all possible in a setup where the platform owner is not only a function responsible for running the platform, but is also allowed to exercise control over the deployment artifacts of the individual parties.

A side note: There is an architectural style called „microservices“, which seems to be gaining traction at the moment. It claims to address the „many teams working on a single platform“ problem as well. But the whole idea is based on splitting a monolithic application into individual self-contained services, which does not really apply to this multi-tenancy problem, where every tenant wants to customize some aspects of the common system for itself. If you apply this approach to the multi-tenancy problem here, you end up with a multi-platform architecture, where each tenant has its own version of the platform.

What is new in Sling with AEM 6.1?

AEM 6.1 is out. Congratulations to my colleagues in the engineering departments for their hard work in the last year.

Every release of AEM goes together with changes in Sling, mostly bugfixes and smaller enhancements. Normally these changes are not mentioned directly in the release notes of AEM; in most cases you have to look them up on your own.

For the AEM 6.1 release I want to create a small series of blog posts which point out the major changes in the packaged Sling bundles, only considering the changes available in 6.1 and not yet in 6.0 (not including hotfixes and feature packs). I will try to cover some major changes and improvements you can use in your projects.

So let’s start with the complete list of Sling bundles (sorted alphabetically) and their versions in both 6.0 and 6.1; and for completeness I also added the versions of AEM 5.6.1. In case a bundle isn’t available in a specific version, I inserted a “-“.

Symbolic Name of the Bundle AEM 5.6.1 AEM 6.0 AEM 6.1
org.apache.sling.adapter 2.1.0 2.1.0 2.1.4
org.apache.sling.api 2.4.3.R1488084 2.7.0 2.9.0
org.apache.sling.atom.taglib 0.9.0.R988585 0.9.0.R988585 0.9.0.R988585
org.apache.sling.auth.core 1.1.2 1.1.7.R1584705 1.3.6
org.apache.sling.bgservlets 0.0.1.Rev1231138 0.0.1.R1582230 0.0.1.R1582230
org.apache.sling.bundleresource.impl 2.1.2 2.2.0 2.2.0
org.apache.sling.commons.classloader 1.3.0 1.3.2 1.3.2
org.apache.sling.commons.compiler 2.1.0 2.1.0 2.2.0
org.apache.sling.commons.fsclassloader 1.0.0
org.apache.sling.commons.html 1.0.0 1.0.0 1.0.0
org.apache.sling.commons.json 2.0.6 2.0.6 2.0.10
org.apache.sling.commons.log 3.0.0 4.0.0 4.0.2
org.apache.sling.commons.logservice 1.0.2 1.0.2 1.0.4
org.apache.sling.commons.mime 2.1.4 2.1.4 2.1.8
org.apache.sling.commons.osgi 2.2.0 2.2.0 2.2.2
org.apache.sling.commons.scheduler 2.3.4 2.4.2 2.4.6
org.apache.sling.commons.threads 3.1.0 3.2.0 3.2.0
org.apache.sling.datasource 1.0.0
org.apache.sling.discovery.api 0.1.0.R1484784 1.0.0 1.0.2
org.apache.sling.discovery.impl 0.1.0.R1486590 1.0.8 1.1.0
org.apache.sling.discovery.support 0.1.0.R1484784 1.0.0 1.0.0
org.apache.sling.distribution.api 0.1.0
org.apache.sling.distribution.core 0.1.1.r1678168
org.apache.sling.engine 2.2.8 2.3.3.R1588174 2.4.2
org.apache.sling.event 3.1.5.R1485539 3.3.10 3.5.5.R1667281
org.apache.sling.event.dea 1.0.0
org.apache.sling.extensions.threaddump 0.2.2 0.2.2 0.2.2
org.apache.sling.extensions.webconsolesecurityprovider 1.0.0 1.0.0 1.1.4
org.apache.sling.featureflags 1.0.0 1.0.0
org.apache.sling.fragment.ws 1.0.2 1.0.2 1.0.2
org.apache.sling.fragment.xml 1.0.2 1.0.2 1.0.2
org.apache.sling.hc.core 1.1.0 1.2.0
org.apache.sling.hc.webconsole 1.1.0 1.1.2
org.apache.sling.i18n 2.2.4 2.2.8 2.4.0
org.apache.sling.installer.api 1.0.0
org.apache.sling.installer.console 1.0.0 1.0.0 1.0.0
org.apache.sling.installer.core 3.4.6 3.5.0 3.6.4
org.apache.sling.installer.factory.configuration 1.0.10 1.0.12 1.1.2
org.apache.sling.installer.factory.subsystems 1.0.0
org.apache.sling.installer.provider.file 1.0.2 1.0.2 1.1.0
org.apache.sling.installer.provider.jcr 3.1.6 3.1.6 3.1.16
org.apache.sling.javax.activation 0.1.0 0.1.0 0.1.0
org.apache.sling.jcr.api 2.1.0 2.2.0 2.2.0
org.apache.sling.jcr.base 2.1.2 2.2.2 2.2.2
org.apache.sling.jcr.classloader 3.1.12 3.2.0
org.apache.sling.jcr.compiler 2.1.0 2.1.0 2.1.0
org.apache.sling.jcr.contentloader 2.1.6 2.1.6 2.1.10
org.apache.sling.jcr.davex 1.2.0 1.2.0 1.2.2
org.apache.sling.jcr.jcr-wrapper 2.0.0 2.0.0 2.0.0
org.apache.sling.jcr.registration 0.0.1.R1345943 1.0.0 1.0.2
org.apache.sling.jcr.resource 2.2.9.R1483758 2.3.7.R1591843 2.5.0
org.apache.sling.jcr.resourcesecurity 0.0.1.R1562502 1.0.2
org.apache.sling.jcr.webdav 2.2.0 2.2.2 2.2.2
org.apache.sling.jmx.provider 1.0.2 1.0.2
org.apache.sling.launchpad.installer 1.2.0 1.2.0 1.2.0
org.apache.sling.models.api 1.0.0 1.1.0
org.apache.sling.models.impl 1.0.2 1.1.0
org.apache.sling.resource.inventory 1.0.2 1.0.4
org.apache.sling.resourceaccesssecurity 0.0.1.R1579485 1.0.0
org.apache.sling.resourcecollection 0.0.1.R1479861 1.0.0 1.0.0
org.apache.sling.resourcemerger 1.1.2 1.2.9.R1675563-B002
org.apache.sling.resourceresolver 1.0.6 1.1.0 1.2.4
org.apache.sling.rewriter 1.0.4 1.0.4 1.0.4
org.apache.sling.scripting.api 2.1.4 2.1.6 2.1.6
org.apache.sling.scripting.core 2.0.24 2.0.26 2.0.28
org.apache.sling.scripting.java 2.0.6 2.0.6 2.0.12
org.apache.sling.scripting.javascript 2.0.12 2.0.13.R1566989 2.0.16
org.apache.sling.scripting.jsp 2.0.28 2.0.28 2.1.6
org.apache.sling.scripting.jsp.taglib 2.1.8 2.2.0 2.2.4
org.apache.sling.scripting.jst 2.0.6 2.0.6 2.0.6
org.apache.sling.scripting.sightly 1.0.2
org.apache.sling.scripting.sightly.js.provider 1.0.4
org.apache.sling.security 1.0.4 1.0.6 1.0.10
org.apache.sling.serviceusermapper 1.0.0 1.2.0
org.apache.sling.servlets.compat 1.0.0.Revision1200172 1.0.0.Revision1200172 1.0.0.Revision1200172
org.apache.sling.servlets.get 2.1.4 2.1.8 2.1.10
org.apache.sling.servlets.post 2.3.1.R1485589 2.3.4 2.3.6
org.apache.sling.servlets.resolver 2.2.4 2.3.2 2.3.6
org.apache.sling.settings 1.2.2 1.3.0 1.3.6
org.apache.sling.startupfilter 0.0.1.Rev1387008 0.0.1.Rev1526908 0.0.1.Rev1526908
org.apache.sling.startupfilter.disabler 0.0.1.Rev1387008 0.0.1.Rev1387008 0.0.1.Rev1387008
org.apache.sling.tenant 1.0.0 1.0.0 1.0.2

The problems of multi-tenancy: governance

A recurring topic in AEM projects is multi-tenancy. Wikipedia describes multitenancy as „[…] software architecture in which a single instance of a software […] serves multiple tenants“. In the AEM projects I’ve done I encountered this pattern most when a company wants to host several brands and/or subsidiaries as independent tenants within a single AEM platform (that means: connected authoring and publishing instances). In this blog post I only cover the aspect of multi tenancy in a single company. Hosting tenants for multiple independent companies is a different story and likely even more complex.

At first sight multi-tenancy seems to be only a technical problem (separation of content/templates/components, privileges, etc.), but from what I learned, there is a much bigger problem, which you should solve first. And that’s the aspect of organization and governance.

Multitenancy is hard when different tenants (being brand organizations or subsidiaries) need to integrate into the single platform. Each tenant has its own requirements (depending on its special needs), its own timelines, and its own budget. You have larger tenants and smaller tenants on your AEM platform. But this does not necessarily reflect the power of these tenants inside the company. It may even be the opposite: a smaller or less powerful organization or brand can have such demands that it becomes the largest tenant on your AEM platform.

That means that there will be conflicts when it comes to defining scope, timeline and budget. The tenant which contributes more budget wants to have more influence on these 3 aspects than another tenant which spends a significantly smaller amount. But the smaller tenant might have needs which can overrule this, for example a tradeshow where some new features on the brand pages are absolutely required, while the other tenant (yet more powerful within the organization) has requirements which only become important in a more distant future. How are these requirements prioritized?

These questions (and conflicts) are not new; they have existed for decades, if not centuries. But they have a huge impact on the platform owner. The platform owner wants to satisfy the needs of all the tenants, but is often faced with contradicting requirements. While on the technical side these can often be solved (more or less, just by throwing people and time at the problem), there are still things which are first and foremost organizational issues, and which can only be solved on an organizational or political level. Then you have topics like:

  • How can you coordinate different timelines of different tenants, so you can satisfy all their needs?
  • Tenants want to have their own development teams or agencies. How can they work together and feed their results into a single platform without breaking it? Who’s responsible when the platform breaks down?
  • How do you do funding when tenants contribute development work to the platform and others benefit from this work as well? By invoicing the tenants which benefit from other tenants’ development work?
  • What’s the role of the platform owner? Does the platform have its own budget or is it solely funded by the tenants? Is the platform owner able to reject feature requests from tenants and say “no”?
  • How should the platform owner react to contradicting requirements? Is splitting the single platform into multiple ones (with different codebases) something which is desirable?

There are a lot of questions like these, and they are very specific to the company and the platform. They can all be solved, but the company and the organization itself has to solve them, not the platform development team(s). Otherwise the organizational foo will trickle down even to the developers (and as we all know: this kind of human being doesn’t really like that :-))

My ideal multi-tenancy project looks like this: A strong platform owner with some budget of its own. The tenants have pretty much the same size, and they fund the largest part of the platform, each contributing the same amount. A steering committee (with participants from all tenants) decides on all the organizational topics, and another one does the same on the technical level if required. Requirements are consolidated on a project level and then implemented by a team which reports to the platform owner.

Yeah, I have to admit, I haven’t found that customer project yet :-) But in such a project you, as a member of the development team, no longer really feel the multi-tenancy aspect on an organizational level; you only have to deal with it on a technical level. Which is very nice.

AEM Basics: Runmodes

Today I want to discuss a feature which is very basic and widely used: “runmodes”. You might have already encountered it when you deployed an authoring instance and a publishing instance. Basically both can be deployed from the very same installation package, but just because of a magic string at the right place during installation the behaviour changes dramatically: one instance becomes an authoring instance, the other becomes a publishing instance. It’s because of the runmode you configured.

You can think of runmodes as labels or roles you attach to instances, and “author” and “publish” are just special ones. At runtime you can check for these labels and react accordingly (the SlingSettingsService is your friend here). A more sophisticated usecase is the OSGi configuration. Depending on the location where a config is placed, it is active or not, based on the runmodes of the instance (see the AEM docs on this topic).

But runmodes are not limited to “author” or “publish”; you can attach as many runmodes to an instance as you like. For example you can create labels indicating your development environments (for example “integration” or “preproduction”), and you can have special configuration for these environments. This makes it a lot easier if you want your application to behave differently on these environments than on production.

Best of all: When you use runmodes to differentiate your environments from each other, you can easily have all configurations for all environments in a single content package, and deploy this package to all environments, no matter if it’s the production or the integration environment. If the runmodes don’t match, a configuration simply does not become active.
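The matching rule can be sketched in a few lines of plain Java. This is a simplified model and not the code Sling itself uses: it assumes the common convention of config folders named like config.author or config.publish.integration, where a folder applies only if every runmode in its name is set on the instance. On a real instance the set of runmodes would come from SlingSettingsService.getRunModes(); here it is just a hand-made set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Simplified model of runmode-based config folder matching. */
final class RunmodeConfigSketch {

    /**
     * Decides whether a config folder such as "config.author.integration"
     * applies to an instance with the given runmodes.
     */
    static boolean isActive(String folderName, Set<String> instanceRunmodes) {
        String[] parts = folderName.split("\\.");
        if (!"config".equals(parts[0])) {
            return false; // not a config folder at all
        }
        // Every runmode named in the folder must be set on the instance.
        for (int i = 1; i < parts.length; i++) {
            if (!instanceRunmodes.contains(parts[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical instance: an author in the integration environment.
        Set<String> runmodes = new HashSet<>(Arrays.asList("author", "integration"));
        for (String folder : Arrays.asList("config", "config.author",
                "config.author.integration", "config.publish", "config.author.production")) {
            System.out.println(folder + " -> " + isActive(folder, runmodes));
        }
    }
}
```

A plain "config" folder names no runmode at all, so in this model it is active on every instance.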


AEM coding best practice: Servlets

You might say, servlets are old technology. So old, that every Java web developer should know everything about them.

Yes, servlets have existed since the 1990s (to be exact: 1997), and the basics haven’t really changed. So what’s so special about servlets that I decided to write a dedicated blog post on them and title it „AEM coding best practice“?

Well, there’s nothing special in terms of coding. Everything that has been recommended since 1997 can still be considered valid. But there’s a subtle difference between servlet development for AEM and the development of servlets for other types of applications: AEM (or: Sling) is resource oriented.

This aspect makes it hard for developers who normally bind servlets to hardcoded paths (either via annotations or via the web.xml bindings). Binding servlets to a path is still possible in Sling, but it is actually an anti-pattern, because such a servlet is not bound to a really existing resource, and therefore a number of goodies of Sling are not applicable.

Instead I recommend binding servlets to resource types. The first and probably most obvious reason is that you do not need to hardcode any path within your code (or config); instead you just place a resource with that resource type at the path where you’d like it to be, and the servlet can then be called via this path. The second benefit is that you can apply access control on the JCR nodes backing the respective resource. If you don’t have read access on that resource, you cannot call the servlet. This is a great way to restrict access to certain functions to a number of users without implementing access control in your own code, just by using the out-of-the-box features of the JCR repository.

This „bind to a resource type“ approach should remind you of the way resources and their components are wired. A resource has the property „resource type“, which denotes the component used to render this resource. With a servlet you specify the resource type your servlet wants to handle. So it’s basically the same, and instead of JSPs or Sightly scripts you can also use servlets to implement components. You can also easily implement the handling of selectors or different extensions in servlets.
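The two benefits can be illustrated with a toy dispatcher in plain Java. It is deliberately not Sling’s resolution code and uses no Sling API; all class names, resource types and paths are invented. It only models the idea that the request path leads to a resource, the resource’s type selects the servlet, and a caller who cannot read the resource never reaches the servlet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Toy model of resource-type based dispatch; not Sling's actual resolution code. */
final class ResourceTypeDispatchSketch {

    private final Map<String, String> resourceTypeByPath = new HashMap<>();
    private final Map<String, Set<String>> readersByPath = new HashMap<>();
    private final Map<String, Function<String, String>> servletByResourceType = new HashMap<>();

    /** The servlet only knows its resource type, never a path. */
    void registerServlet(String resourceType, Function<String, String> servlet) {
        servletByResourceType.put(resourceType, servlet);
    }

    /** Content decides where the servlet is reachable and who may read it. */
    void createResource(String path, String resourceType, String... readers) {
        resourceTypeByPath.put(path, resourceType);
        readersByPath.put(path, new HashSet<>(Arrays.asList(readers)));
    }

    String handle(String user, String path) {
        String resourceType = resourceTypeByPath.get(path);
        // No resource, or no read access to it: the caller just sees "not found".
        if (resourceType == null || !readersByPath.get(path).contains(user)) {
            return "404";
        }
        Function<String, String> servlet = servletByResourceType.get(resourceType);
        return servlet == null ? "404" : servlet.apply(path);
    }

    public static void main(String[] args) {
        ResourceTypeDispatchSketch sling = new ResourceTypeDispatchSketch();
        sling.registerServlet("myapp/components/export", path -> "export of " + path);
        sling.createResource("/content/site/export", "myapp/components/export", "alice");
        System.out.println(sling.handle("alice", "/content/site/export"));
        System.out.println(sling.handle("bob", "/content/site/export"));
    }
}
```

Making the servlet available at a second path only requires another createResource call in this model; the servlet registration stays untouched.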

I do not recommend dropping JSPs and Sightly altogether and switching to servlets unless your frontend developers speak Java fluently, now and for the next years. Sightly has been developed just for this specific purpose: Frontend stuff should be handled by frontend developers and must not require Java development knowhow. Use Sightly whenever possible.

And finally a bookmark for everyone working with Sling: The Sling servlets and scripts documentation.

AEM scaling patterns: Avoid shared sessions

The biggest change in AEM 6.0 compared to its prior versions is the use of Apache Oak as repository implementation instead of Apache Jackrabbit version 2.x. Although both implement the JCR 2.0 API (Oak not completely yet, but the „important“ parts are there), there are a number of differences between them.

In the area of scalability the most notable change is the use of MVCC (multi-version concurrency control, a proven approach taken from the relational database world) in Oak. It decouples sessions from the global repository state and is the basis for the scalability of the repository. But it comes at a price: sessions should be used only by a single thread. It is only a „should“, because Oak detects when multiple threads access a single session and then serializes the access to it.

(For the record: The same recommendation already applied to Apache Jackrabbit 2.x, but the impact was never that high, mostly because it wasn’t as scalable as Oak is now.)

This isn’t a real limitation, but it requires careful design of any application. In the context of AEM it normally isn’t a problem at all, because each incoming HTTP request uses a dedicated session of its own. While this is true for requests, there is often functionality which doesn’t follow this pattern.

I put an example of this common pattern on GitHub, including a recommended implementation and a discouraged implementation. The problem in the discouraged example lies in the fact that the repository session (in the example hidden behind the resource resolver abstraction) is opened once at the startup of the service by the thread which does the activation of all services. But then resources are handed out to every other thread calling the getConfiguration() method. If every request makes this call, they all get synchronized here, thus limiting the scalability.

In the recommended example this problem is mitigated: each call to getConfiguration() opens a new session, reads the required resource and then closes the session. Here the session and its data are held completely inside a single thread, and there’s no need for synchronization anymore.
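The difference between the two variants can be sketched without any repository at all. The following plain Java model is not the code from the GitHub example; ToySession is a hypothetical stand-in for a JCR session or resource resolver, and the path is made up. It only records whether a session is read by a thread other than the one that opened it, which is the situation Oak has to serialize.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of shared vs. per-call sessions; ToySession is a made-up stand-in. */
final class SessionPerCallSketch {

    static final class ToySession implements AutoCloseable {
        static final AtomicInteger open = new AtomicInteger();
        static final AtomicInteger foreignThreadReads = new AtomicInteger();

        private final Thread owner = Thread.currentThread();

        ToySession() {
            open.incrementAndGet();
        }

        String read(String path) {
            if (Thread.currentThread() != owner) {
                // This is the case where Oak would block and log a warning.
                foreignThreadReads.incrementAndGet();
            }
            return "value of " + path;
        }

        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    /** Discouraged: one session opened at activation, then used by every caller. */
    static final class SharedSessionService {
        private final ToySession session = new ToySession();

        String getConfiguration() {
            return session.read("/etc/myapp/config");
        }
    }

    /** Recommended: each call opens, uses and closes its own session. */
    static final class PerCallSessionService {
        String getConfiguration() {
            try (ToySession session = new ToySession()) {
                return session.read("/etc/myapp/config");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedSessionService shared = new SharedSessionService();
        PerCallSessionService perCall = new PerCallSessionService();
        Thread request = new Thread(() -> {
            shared.getConfiguration();
            perCall.getConfiguration();
        });
        request.start();
        request.join();
        System.out.println("reads from a foreign thread: " + ToySession.foreignThreadReads.get());
    }
}
```

The price of the recommended variant is the cost of opening a session per call, so it pays off to keep the work done inside each call small.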

That’s the theory part, but how can you easily detect whether you have this problem as well? The easiest way is to set the logging for the class org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate to DEBUG. Every time Oak detects that a session is used by multiple threads, it prints a stack trace to the log. If this happens on write access, it uses the WARN level; in case of reads, the DEBUG level.

23.02.2015 09:21:56.916 *WARN* [0:0:0:0:0:0:0:1 [1424679716845] GET /content/geometrixx/en/services.html HTTP/1.0] org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate Attempt to perform hasProperty while another thread is concurrently reading from session-494. Blocking until the other thread is finished using this session. Please review your code to avoid concurrent use of a session.
java.lang.Exception: Stack trace of concurrent access to session-494
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:276)
at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:113)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl.hasProperty(NodeImpl.java:812)
at org.apache.sling.jcr.resource.JcrPropertyMap.read(JcrPropertyMap.java:350)

If you want to have a scalable AEM application, you should carefully watch out for these log messages and optimize the use of shared sessions.