If you are a developer and need to work with databases, you often rely on the features your framework offers to get your work done easily. Working directly with JDBC and SQL is not really comfortable, and writing “SELECT something FROM table” with lots of constraints can be tedious …
The SQL language offers only the “select” statement to retrieve data from the database. JCR offers multiple ways to get access to a node, and each of these methods serves a different purpose:
- session.getNode(path) is used when you know the exact path of a node. That’s comparable to “select * from table where path = '/content/geometrixx/en'” in SQL, which is a direct lookup of a well-known node/row.
- node.getNodes() returns all child nodes below the node. This method has no equivalent in the SQL world, because in JCR nodes are not only distinct and independent, but can also have a hierarchical relation.
- The JCR search is the equivalent of the SQL query; it can return a set of nodes. Yes, ANSI SQL 92 is much more powerful, but let’s ignore that for this article, okay?
In ANSI SQL, approaches 1 and 3 (the direct path lookup and the search) are both realized by a SELECT query, while the node.getNodes() approach has no direct equivalent. Of course it can also be realized by a SELECT statement (likely resolving a 1:n relation), but that depends heavily on the structure of your data.
In Jackrabbit (the reference implementation of the JCR standard and also the basis for CRX) all of these methods are implemented differently.
session.getNode(path): It starts at the root node and drills down the tree. So to look up /content/geometrixx/en the algorithm starts at the root, then looks up the node with the name “content”, then looks for a child node named “geometrixx” and then for a child node named “en”. This approach is quite efficient because each bundle (you can consider it the implementation equivalent of a JCR node) references both its parent and all of its child nodes. On every lookup the ACLs on that node are enforced. So even when a node “en” exists, if the session user does not have read access to it, it is not returned and the algorithm stops.
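To make the drill-down a bit more tangible, here is a minimal sketch in plain Java. It is a toy model and not Jackrabbit's actual code: ToyNode, PathLookupSketch and the "readers" set are names made up for illustration, and real ACL evaluation is far more involved than a set membership check.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for a bundle: name, parent, children, and who may read it. */
class ToyNode {
    final String name;
    final ToyNode parent;
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    final Set<String> readers; // assumption: ACLs reduced to a set of user names

    ToyNode(String name, ToyNode parent, Set<String> readers) {
        this.name = name;
        this.parent = parent;
        this.readers = readers;
    }

    ToyNode addChild(String childName, Set<String> childReaders) {
        ToyNode child = new ToyNode(childName, this, childReaders);
        children.put(childName, child);
        return child;
    }
}

class PathLookupSketch {
    /** Resolves an absolute path segment by segment, checking read access at each step. */
    static Optional<ToyNode> resolve(ToyNode root, String path, String user) {
        ToyNode current = root;
        if (!current.readers.contains(user)) {
            return Optional.empty();
        }
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            ToyNode next = current.children.get(segment);
            // Stop as soon as a segment is missing or not readable for this user.
            if (next == null || !next.readers.contains(user)) {
                return Optional.empty();
            }
            current = next;
        }
        return Optional.of(current);
    }

    /** Builds /content/geometrixx/en, where "en" is readable by "admin" only. */
    static ToyNode sampleTree() {
        ToyNode root = new ToyNode("", null, Set.of("admin", "author"));
        ToyNode content = root.addChild("content", Set.of("admin", "author"));
        ToyNode geometrixx = content.addChild("geometrixx", Set.of("admin", "author"));
        geometrixx.addChild("en", Set.of("admin"));
        return root;
    }

    public static void main(String[] args) {
        ToyNode root = sampleTree();
        System.out.println(resolve(root, "/content/geometrixx/en", "admin").isPresent());  // true
        System.out.println(resolve(root, "/content/geometrixx/en", "author").isPresent()); // false
    }
}
```

In this toy model the cost of a lookup grows with the depth of the path, not with the total number of nodes in the tree.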
node.getNodes() is even more efficient, because it just has to look up the bundles in the child node list and filter them by ACLs.
If you use the JCR search, the Lucene index is used to do the search lookup and the bundle information is used to construct the path information. The performance of this search depends on many factors, like (obviously) the number of results returned from Lucene itself and the complexity of the other constraints.
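The two-step nature of a search (index lookup first, then path construction from the bundles) can be sketched in plain Java as well. Again, this is only a toy under stated assumptions: SearchSketch, Item and the term map are hypothetical names, and a simple HashMap stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchSketch {
    /** Hypothetical stand-in for a bundle: a name plus a reference to the parent. */
    static final class Item {
        final String name;
        final Item parent;

        Item(String name, Item parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Toy stand-in for the index: maps a term to the items tagged with it. */
    private final Map<String, List<Item>> index = new HashMap<>();

    Item add(String name, Item parent, String... terms) {
        Item item = new Item(name, parent);
        for (String term : terms) {
            index.computeIfAbsent(term, t -> new ArrayList<>()).add(item);
        }
        return item;
    }

    /** Rebuilds the absolute path by following parent references up to the root. */
    static String pathOf(Item item) {
        if (item.parent == null) {
            return "/";
        }
        StringBuilder path = new StringBuilder();
        for (Item i = item; i.parent != null; i = i.parent) {
            path.insert(0, "/" + i.name);
        }
        return path.toString();
    }

    /** Step 1: index lookup. Step 2: one path reconstruction per hit. */
    List<String> search(String term) {
        List<String> paths = new ArrayList<>();
        for (Item hit : index.getOrDefault(term, List.of())) {
            paths.add(pathOf(hit));
        }
        return paths;
    }

    /** Builds /content/geometrixx with the children "en" and "fr". */
    static SearchSketch sample() {
        SearchSketch repo = new SearchSketch();
        Item root = repo.add("", null);
        Item content = repo.add("content", root);
        Item geometrixx = repo.add("geometrixx", content, "site");
        repo.add("en", geometrixx, "page", "english");
        repo.add("fr", geometrixx, "page", "french");
        return repo;
    }
}
```

Even in this simplified form, every hit costs an extra walk up the tree, which is one reason why the number of results matters for search performance.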
So, as a CQ developer, you should be aware that there are many ways to get hold of your data, and that each of them has different performance characteristics.
In the next post I will illustrate this with a small example.