Detecting JCR session leaks

A problem I encounter every now and then are leaking JCR sessions; that means that JCR sessions are opened, but never closed, but just abandoned. Like Files, JCR sessions need to be closed, otherwise their memory is not freed and they cannot be garbage collected by the JVM. Depending on the number of sessions you leave in that state this can lead to serious memory problems, ultimately leading to a crash of the JVM because of an OutOfMemory situation.

(And just to be on the safe side: In AEM ootb all ResourceResolvers use a JCR session internally; that means whatever I just said about JCR sessions applies the same way to Sling ResourceResolvers.)

I dealt with this topic already a few times (and always recommended to close the JCR sessions), but today I want to focus how you can easily find out if you are affected by this problem.

We use the fact that for every open session an mbean is registered. Whenever you see such a statement in your log:

14.08.2018 00:00:05.107 *INFO* [oak-repository-executor-1] com.adobe.granite.repository Service [80622, [org.apache.jackrabbit.oak.api.jmx.SessionMBean]] ServiceEvent REGISTERED

That’s says that an mbean service is registered for a JCR session; thus a JCR session has been opened. And of course there’s a corresponding message for unregistering:

14.08.2018 12:02:54.379 *INFO* [Apache Sling Resource Resolver Finalizer Thread] com.adobe.granite.repository Service [239851, [org.apache.jackrabbit.oak.api.jmx.SessionMBean]] ServiceEvent UNREGISTERING

So it’s very easy to find out if you don’t have a memory leak because of leaking JCR sessions: The number of log statements for registration of these mbeans must match the number of log statements for unregistration.

In many cases you probably don’t have exact matches. But that’s not a big problem if you consider:

  • On AEM startup a lot of sessions are opened and JCR observation listeners are registered to them. That means that a logfile with AEM starts and stops (and the number of starts do not match the number of stops) it’s very likely that these numbers do not match. Not a problem.
  • The registration (and also the unregistration) of these mbeans often happens in batches; if this happen during logfile rotation, you might have an imbalance, too. Again, not per se a problem.

It’s getting a problem, if the number of sessions opened is always bigger than the number of sessions closed over the course of a few days.

$ grep 'org.apache.jackrabbit.oak.api.jmx.SessionMBean' error.log | grep "ServiceEvent REGISTERED" | wc -l
212123
$ grep 'org.apache.jackrabbit.oak.api.jmx.SessionMBean' error.log | grep "ServiceEvent UNREGISTERING" | wc -l
1610
$

Here I just have the log data of a single day, and it’s very obvious, that there is a problem, as around 220k sessions are opened but never closed. On a single day!

To estimate the effect of this, we need to consider that for every of these log statements these objects are retained:

  • A JCR session (plus objects it reaches, and depending on the activities happening in this session it might also include any pending change, which will never going to be persisted)
  • A Mbean (referencing this session)

So if we assume that 1kb of memory is associated with every leaking session (and that’s probably an very optimistic assumption), this would mean that the system above would loose around 220M of heap memory every day. This system probably requires a restart every few days.

How can we find out what is causing this memory leak? Here it helps, that Oak stores the stack trace when opening sesions as part of the session object. Since around Oak 1.4 it’s only done if the number of open sessions exceeds 1000; you can tune this value with the system property “oak.sessionStats.initStackTraceThreshold”; set it to the appropriate value. This is a great help to find out where the session is opened.

And then go to /system/console/jmx, check for the “SessionStatistics” mbeans (typically quite at the bottom of the list) and select on the most recent ones (they have the openening date already in the name)

session information in the mbean view

session information in the mbean view

And then you can find in the “initStackTrace” the trace where this session has been opened:

Stacktrace of an open JCR session

Stacktrace of an open JCR session

With the information at hand where the session has been opened it should be obvious for you to find the right spot where to close the session.
If you spot a place where a session is opened in AEM product code but never closed, please check that with Adobe support. But be aware, that during system startup sessions are opened and will stay open while the system is running. That’s not a problem at all, and please do not report them!

It’s only a problem if you have a at least a few hundreds session open with the very same stack trace, that’s a good indication of such a “leaking session” problem.

A good followup reading on AEM HelpX pages with some details how you can fix it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.