1000 nodes per folder and Oak orderable nodes

Every now and then the question comes up of how many child nodes JCR supports. While the technically correct answer is “there is no limit”, in practice there are some limitations.

In CRX 2.x child nodes are always ordered: even the children of a parent with an unordered node type are treated internally as if they were ordered, which makes the difference almost non-existent. [Thanks Justin for making this clear!] This means that the order needs to be maintained on every operation, including adding and removing sibling nodes. The more child nodes a node has, the more time it takes to maintain this list.
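To make the cost argument concrete, here is a toy model in plain Java. It is not CRX or Oak code: the class names and the assumption that every add or remove scans the sibling list are mine, chosen only to show how the work grows with the number of siblings when an ordered list has to be maintained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChildListModel {

    /** Toy model of an ordered parent: children live in an explicit list. */
    static class OrderedChildren {
        private final List<String> names = new ArrayList<>();
        long comparisons = 0; // name comparisons performed so far

        // Linear scan over the sibling list (assumed cost model).
        private int indexOf(String name) {
            for (int i = 0; i < names.size(); i++) {
                comparisons++;
                if (names.get(i).equals(name)) {
                    return i;
                }
            }
            return -1;
        }

        /** Appends a child; the duplicate check scans all existing siblings. */
        boolean add(String name) {
            if (indexOf(name) >= 0) {
                return false;
            }
            names.add(name);
            return true;
        }

        /** Removes a child; scans for it, then shifts all later entries. */
        boolean remove(String name) {
            int i = indexOf(name);
            if (i < 0) {
                return false;
            }
            names.remove(i);
            return true;
        }
    }

    /** Toy model of an unordered parent: a hash set, no order to maintain. */
    static class UnorderedChildren {
        private final Set<String> names = new HashSet<>();

        boolean add(String name) { return names.add(name); }

        boolean remove(String name) { return names.remove(name); }
    }

    public static void main(String[] args) {
        OrderedChildren ordered = new OrderedChildren();
        UnorderedChildren unordered = new UnorderedChildren();
        for (int i = 0; i < 5000; i++) {
            ordered.add("node" + i);
            unordered.add("node" + i);
        }
        // 0 + 1 + ... + 4999 comparisons for the ordered model
        System.out.println("ordered comparisons: " + ordered.comparisons);
    }
}
```

In this model the ordered variant needs 12,497,500 name comparisons to add 5000 children one by one, while the hash-based variant does a roughly constant amount of work per add. The real repository implementations differ in detail, but the shape of the cost is the point.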

So what about this “1000 child nodes” limit? First of all, the number is arbitrary 🙂 But when you use CRXDE Lite, browsing a node with lots of child nodes gets really slow, mostly because of the time the JavaScript takes to render it. Of course, the performance of add and remove operations also degrades linearly. And there are hardly any cases where you need more than 1000 child nodes.

Reading nodes, however, is not affected. So it is not a problem to have 6000 nodes in /libs/wcm/core/i18n/en, because you only read these nodes and do not change them.

Nevertheless this “limit” can be cumbersome, especially if you don’t need ordered child nodes. And the fact that there is such a limit means that adding nodes already has a cost (at a lower level) with fewer child nodes as well.

With Apache Oak this has changed. In Oak, child nodes are not ordered unless their parent has a node type that supports orderable child nodes.

To illustrate the difference between sling:Folder and sling:OrderedFolder, I did a small test: I wrote a small benchmark that creates 5000 nodes, then adds more nodes, does random reads and deletes the nodes afterwards. For every operation a single node is created or deleted, followed by a save(). (Source code)

Operation               sling:Folder   sling:OrderedFolder
Create 5000 nodes       6124 ms        17129 ms
Random read 500 nodes   2 ms           9 ms
Add 500 nodes           112 ms         564 ms

This small benchmark (executed on a 2014 MacBook Pro with SSD, AEM 6.0, TarMK, Oak 1.0.0) shows:

  • Adding lots of child nodes to a node is much faster when you use a non-ordering node type.
  • Random reads are faster as well; presumably Oak can use more efficient data structures than a list if it doesn’t need to maintain the ordering.

A factor of roughly 3 to 5 is obviously quite significant. Of course the benefit is smaller if you have fewer child nodes.
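The ratios implied by the benchmark table can be checked with a few lines of Java. This is only arithmetic on the published timings; the class and method names are made up for this sketch.

```java
import java.util.Locale;

public class BenchmarkRatios {

    /** Ratio of the sling:OrderedFolder time to the sling:Folder time. */
    static double ratio(long orderedMs, long unorderedMs) {
        return (double) orderedMs / unorderedMs;
    }

    public static void main(String[] args) {
        // Timings in milliseconds, copied from the benchmark table.
        System.out.printf(Locale.ROOT, "Create 5000 nodes:     %.1fx%n", ratio(17129, 6124)); // 2.8x
        System.out.printf(Locale.ROOT, "Random read 500 nodes: %.1fx%n", ratio(9, 2));        // 4.5x
        System.out.printf(Locale.ROOT, "Add 500 nodes:         %.1fx%n", ratio(564, 112));    // 5.0x
    }
}
```

The read timings are only a few milliseconds, so that particular ratio should be taken with a grain of salt.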


4 thoughts on “1000 nodes per folder and Oak orderable nodes”

  1. Justin Edelson (@justinedelson)

    Great post Joerg. One thing I do want to point out is that CRX2 did in fact support unordered child nodes from the perspective of the JCR spec, meaning that Node.orderBefore() would throw an exception for children of, say, sling:Folder.

    The difference between CRX2 and Oak is entirely with respect to the *performance* implications of unordered child nodes. With CRX2, you essentially paid the penalty for ordered child nodes even for an unordered parent. In other words, adding 5000 child nodes to a sling:OrderedFolder would take essentially the same amount of time as adding 5000 child nodes to a sling:Folder. This is (as you demonstrate here) not the case with Oak.

    One other note which is very important is that the nt:unstructured node type (which is probably the most popular node type) has ordered child nodes. This is true in both CRX2 and Oak as it is defined in the JCR spec. Oak introduced a new node type oak:unstructured specifically to handle the case where you want nt:unstructured without ordered child nodes.

    1. varsh

      I would like some more information on this subject, comparing CRX 2.2 and CRX 3.
      In both CRX 2.2 and CRX 3 (Oak), rep:AuthorizableFolder is the node type of the parent nodes for users and groups:

      /home/users & /home/groups

      As neither nt:unstructured (CRX 2.2) nor oak:unstructured (CRX 3, Oak) is involved as the parent of rep:AuthorizableFolder, how does this affect how far user nodes scale in CRX 2.2 and CRX 3?

      Overall I am trying to understand how Oak’s goal of millions of nodes applies to the /home/users node, or whether it only applies to custom applications, and only if we create an oak:unstructured node.

      Does this mean that with both TarPM (CRX 2.2) and Oak TarMK (CRX 3) we can have more than 100K end users, possibly scaling to millions, within a single publish instance, with all users stored under /home/users/ in different buckets based on the OOTB user allocation (a/b/c/d folders etc.)?

      Can we also maintain user preferences? Overall I see the end user also being created as user-generated content, restricted to the user and their entities (user model).

      I am checking whether maintaining this number of users and their preferences would leave room for AEM WCMS in a portal-like requirement, also with a view to avoiding multi-publish synchronization and keeping a single consistent view of the user data.

      1) Should we consider using MongoDB for this kind of use case? The AEM user UI might run into issues with such a large user base. Also, according to the AEM docs MongoDB is recommended only for communities; is a portal with a large user base like the one above a valid use case for MongoMK in AEM?
      or
      2) Should we maintain the users and preferences outside AEM in a separate RDBMS? But then the AEM user UI will not work for JCR authorization.

      1. Jörg Post author

        Hi,

        I do not advise managing such large user bases on a single level in the repository. You should split early and bucketize. Also, the OOTB user management tools are no longer applicable when you want to deal with millions of users, because AEM is not an identity management system in the first place.

        The question of whether to use MongoDB as the backend for the repository is unrelated to this one, but managing such large user bases might lead to a decision to use MongoDB.

        kind regards,
        Jörg
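The “split early and bucketize” advice from the reply above could look roughly like the following sketch. It does not reproduce how AEM lays out /home/users out of the box; the two-level scheme based on the first characters of the user id, the '_' padding and the method name are assumptions for illustration only.

```java
import java.util.Locale;

public class UserBuckets {

    /**
     * Builds a bucketed path such as /home/users/a/l/alice.
     * Hypothetical scheme: one folder level per leading character of the id.
     */
    static String bucketPath(String root, String userId, int levels) {
        String key = userId.toLowerCase(Locale.ROOT);
        StringBuilder path = new StringBuilder(root);
        for (int i = 0; i < levels; i++) {
            // Pad short ids so that every user ends up at the same depth.
            char c = i < key.length() ? key.charAt(i) : '_';
            // Replace characters that may not be safe in a node name.
            if (!Character.isLetterOrDigit(c)) {
                c = '_';
            }
            path.append('/').append(c);
        }
        return path.append('/').append(userId).toString();
    }

    public static void main(String[] args) {
        System.out.println(bucketPath("/home/users", "alice", 2)); // /home/users/a/l/alice
        System.out.println(bucketPath("/home/users", "Bob", 2));   // /home/users/b/o/Bob
        System.out.println(bucketPath("/home/users", "x", 2));     // /home/users/x/_/x
    }
}
```

Bucketing by leading characters distributes users unevenly when many ids share a prefix; deriving the bucket characters from a hash of the id is a common alternative when an even spread matters more than readable paths.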
