Prevent workflow launchers from starting a workflow

Workflow launchers are the standard way to trigger workflows based on changes in the content respository. The most prominent workflow which is triggered that way is the “Asset Update Workflow”, which does all the heavy lifting regarding asset processing. And it’s important to note that this workflow is executed on all changes to an asset itself, its renditions or on metadata.

But often this is not required. If you add more or custom meta data to an asset or even do it in a batch mode, you don’t want to this workflow to run at all; these metatadate changes are not relevant to assets themselves, but just to the way they should be handled in the specific context of your application.

The typical way to make the workflow not to start is to disable the workflow launcher (setting the “enabled” flag to “false”). But this is a global setting which affects all possible invocations, that means also the regular ingestion; and in that case the workflow has to run. So you need a way to specifically disable the workflow to start.

Fortunately there are a few ways how to achieve that, if you have the code under control, which performs the changes, and after which you don’t want the workflow to start again. This is key, because there is a feature available in the workflow launcher (sidenote: I just found that it has been documented; so it often makes sense to check documentation if there have been updates).

You can configure on the workflow launcher an exclusion property in the format “event-user-data:randomString”; this ignores all changes made by a JCR session which has a user-property “randomString” set.

How can you set that property? That’s quite easy:

Session session = ...;
session.getWorkspace().getObservationManager().setUserData("randomString");
// do you work with the session
session.save();

And by default the “Asset Update Workflow” is configured with “event-user-data:changedByWorkflowProcess”, so if your batch asset-operation sets the user-data to this string “changedByWorkflowProcess”, the “Asset Update Workflow” is not triggered anymore, without disabling the workflow launcher for it.

That’s it. And if you ever wanted to channel data from a saving session to the process which handles the observation events for it (the workflow launchers are just a very convenient way around the JCR Observation API): Just use event.getUserData().

AEM transaction size or “do a save every 1000 nodes”

An old rule of thumb, even on earlier versions of CQ5, is “when you do large repository operations, do a session.save() every 1000 nodes”. The justification for this typically, that this is the default of the Package Manager, and therefor it’s a kind of recommended approach. And to be honest, I don’t know the real reason for it, even though I work in the Day/Adobe ecosystem for quite some time.

But nevertheless, with Oak the situation has changed a bit. Limits are much more explicit, and this rule of “every 1000 nodes do a save” can be considered still as true statement. But let me give you some background on it, why this exists at all. And then let’s find out, if this rule is still safe to use.

In the JCR specification there is the concept of transient space. This transient space holds all activities performed on a session until an implicit or explicit save() of the session. So the transient space holds all temporary data of a transaction, and the save() is comparable to the final commit of a transaction.

This transient space is typically hold inside the java heap, so dealing with it is fast.

But by definition this transaction is not bound in terms of size. So technically you should be able to create sessions, which modify all nodes and every property of a repository of 2 TB size.  This transient space does not fit into heap of a standard size (say: 12GB) any more. In order to support this behavior nevertheless, Oak starts to move this transient space entirely into the storage (TarMK, Mongo) if the transient space is getting too large (in the DocumentNodeStore language this is called a “persistent branch”, see the documentation of the DocumentNodeStore for some details on branches); then the size of the transaction is only limited by the amount of free storage on the persistance, but no longer by the size of the Java heap.

The limit is called update.limit and by default this 10k (up to and including Oak 1.4/AEM 6.2, 100k starting with Oak 1.6/AEM 6.3, see OAK-3036. But of course you can change this value using “-Doak.update.limit=40000”.

This value describes the amount of changes a transient space for a single session can hold before it is moved into the persistence. A change is a any change to a property or a node (adding/removing/modifying/reordering/…).

OK, that’s the theory, but what does this mean for you?

First, if the transient space is swapped to the persistence, the final session.save() will take much longer compared to a transient space in memory. Because to do the save, the transient space needs to be read from the persistence first (which typically includes at least disk I/O, in cases of MongoDB network I/O, which is even slower).

And second, when you add or change nodes, you typical deal with properties as well. So if you are on AEM 6.2 or older, you should check that you don’t do too much changes within a session, so you don’t hit this “10’000 changes” limit and get the performance penalty. If you have a reasonable content structure, the above mentioned rule of thumb of “do a save every 1000 nodes” goes into the very right direction.

But that’s often not good enough, because the updates of the synchronous Oak indexes count towards the 10’000 changes as well. You might know, that the synchronous indexes mirror the JCR tree, thus adding 1000 JCR nodes will also add 1000 oak nodes for the nodetype index. And that’s not the only synchronous index…

Thus increasing the update.limit to a higher number makes pretty much sense just to be on the safe side. But there is a drawback when you have such large limits: It’s the size of the transient space. Imagine you upload 1000 assets (1 MB each) into your repository in a single session, and you have the update.limit set to 100’000. The number of changes will not reach the update.limit, that’s unlikey. But your transient space will consume 1 GB of heap at least! Is your system designed and setup to handle this? Do you have enough free JVM heap?

Let’s conclude: The rule of thumb “do a save every 1000 nodes” might be a bit too optimistic on AEM 6.2 and older (with default values), but ok on AEM 6.3. But always keep the amount of transient space in mind. It can overflow your heap and debugging out-of-memory situations is not nice.

If you are interested in the inner working of Oak, look at this great piece of documentation. It covers a lot of lowlevel concepts, which are useful to know when you deal with the repository more often.

JCR Observation in clustered AEM instances

Clustering AEM got a bit different with the introduction of OAK. But with the enforcement of the MVCC model in Oak I also advise to revisit some patterns you might got used to. Because some code which worked with no apparent problem in AEM 5.x might cause problems now.

One thing I would check are the JCR Observation Listeners. Using JCR observation is a common way to react on changes in the repository and this is common pattern since CQ 5.0. So what’s the problem with that? The problem is that many JCR observation handlers are not written with clustering in mind.

Take the example that you need to react on changes in the repository and in turn modify something else. The usual approach is to have a service like this (omitting a lot of the boilerplate …)

public class MyListener implements EventListener {

 @Activate
 protected void activate() {
  ...
  ObservationManager om = session.getWorkspace().getObservationManager();
  om.addEventListener (this, 
   Event.NODE_ADDED,
   "/content/mysite",
   null,
   new String[]{"cq:Page"},
   true,
   true);
  ...
 }

 public onEvent (EventIterator events) {
  // iterate through the events and change something in the repository.
 }

}

This works very well in any non-clustered environment, because there is only a single event handler performing these changes. In clustered environments the situation is different, because now on each cluster node there is such a event handler active. And each one wants to perform the repository changes.
In that case you’ll see a lot of Oak exceptions (on all cluster nodes) which indicate that nodes have been modified externally (outside of the current session) and that a merge was not possible. This is because the changes happen in (quasi-) parallel, but not visible to the currently open sessions, thus causing these exceptions.

The only solution to this problem is to execute the EventListener only on a single node or to handle every event by exactly one event handler and not on all.

Handling every observation event on exactly handler is the elegant and scalable solution. The idea is to handle on every cluster node only the changes which happen on this cluster nodes („local events“). While the JCR API doesn’t have any notion of cluster and the Observation API does not give any information if a event is local or not, the Jackrabbit implementation (which Oak is using here) supports this through the JackrabbitObservationManager. As you can see in the following snippet, only the registration of the ObservationHandler changes, but not the handler itself.

public class MyScalableListener implements EventListener {

 @Activate
 protected void activate() {
  ...
  JackrabbitEventFilter ef = new JackrabbitEventFilter()
   .setAbsPath("/content/mysite")
   .setNodeTypes(new String[{"cq:Page"})
   .setEventTypes(Event.NODE_ADDED)
   .setIsDeep(true)
   .setNoExternal(true);
  JackrabbitObservationManager om = (JackrabbitObservationManager) session.getWorkspace().getObservationManager();
  om.addEventListener (this, ef);
  ...
 }

 public onEvent (EventIterator events) {
  // iterate through the events and change something in the repository.
 }
}

Through the Jackrabbit API extension you can register you EventListener to only handle local changes only and ignore any external ones, which are generated on another cluster nodes (using the setNoExternal(true) call). This is a scalable solution because the events handled at the location where they are generated, and no cluster nodes gets a bottleneck because of this.

So whenever you write an ObservationHandler and especially when you use a cluster, you should review your code and make sure, that you avoid concurrent access to the same resource. Of course there are many ways to have concurrent access even without clustering, but when you actually use clustering, the JCR observation handlers are the easiest piece of code to check and fix.