Category Archives: content structure

Sling Context-Aware Configuration (part 4) — using inheritance

Till now this series of blog posts has only worked with configuration which has been explicitly set. But when working in hierarchical systems (like the AEM content tree) it’s often useful when you can inherit things from a parent container/concept. That means, that a specific setting on a node has effect on a complete subtree below that node instead of just the single node.

And of course Inheritance works with CA-Config as well. For easier demonstration I updated the demo content at https://github.com/joerghoh/cqdump/tree/master/ca-config-example in a way, that we can edit the configuration at 3 different locations:

  • /contnet/ca-config/configuration.html
  • /content/ca-config/en/configuration.html
  • /content/ca-config/fr/configuration.html

We always use the WCM.io configuration editor I introduced in the last post.

For demo purposes I also created a new configuration “de.joerghoh.cqdump.caconfig.inheritance.InheritanceDemoConfig” (code). Be aware that the configuration editor only works when the Pojo classes for the configuration are properly annotated.

Ok, let’s get started.

Open the configuration at /contnet/ca-config/configuration.html, hit the “add” button at the top and select the “Configuration to demo the inheritance” configuration. Then change the value of the “aProperty” configuration, check the “enable property inheritance” box and press “Save”.

caconfig-inheritance-globallevel

You have created a configuration which should be inherited down to the other levels. Specifically we should be able to see it at /content/ca-config/en/configuration.html. So let’s check it.

caconfig-inheritance-locallevel

On a repository level only the property “sling:configPropertyInherit” has been added to /conf/ca-config/content/sling:configs/de.joerghoh.cqdump.caconfig.inheritance.InheritanceDemoConfig.

caconfig-inheritance-crxdeview

It is that simple to allow configuration inheritance. And there are many usecases where this can be useful.

If you look back, we can build this functionality also the InheritanceValueMap. But right now we only scratched the surface of the CA-Config functionality. But before we dig deeper into it, I will discuss some aspects of Sling CA-Config for production usecases.

Sling Context-Aware configuration (part 2)

Update (April 30th, 2018): I changed the name of the configuration item to align with with future development. Next to this change I also added a few bugfixes in the git repo.

In the last 2 blog (1,2) posts I talked about why a more content-oriented and content-centric way to store configuration is useful. And I also described 2 scenarios in which CA-Config can actually be used. Now I want to demonstrate how Sling Context-Aware Configuration can actually be used.

Let’s take the example from last blog post: For the approval workflow we want to configure the approval group externally via Sling CA-Config, so it can be configured per subtree/page. I use AEM 6.3 here, but it should work with 6.2 as well.

In the next steps I will guide you through the steps required to implement these steps. For reference the complete project is available at https://github.com/joerghoh/cqdump/tree/master/ca-config-example.

  • Create a structure in /conf where you would like to store your configuration. In my example the content lives in /content/ca-config with the subites “en” and “fr”, and for /conf I choose /conf/ca-config/content/.
  • Create a property “sling:configRef” with the value “/conf/ca-config/content” at /content/ca-config/jcr:content/. This links your content to the configuration.

In the CA-Config lingua you created a configuration bucket at /conf/ca-config/content, which is referenced by /content/ca-config.

caconfig-configRefs

And that’s it. Now every lookup through Sling CA-Config will lookup there.

Now let’s create some code which is using such a configuration. I created a DynamicParticipant WorkflowStep named “CaConfigParticipantStepChooser“, which assigns the workflow to a group defined by CA-Config. As you can see there a POJO class “CaConfigParticipantStepChooserConfig” is used to store the config read from CA-Config.

caconfig-wfstep

When you run this code, it will try to lookup the config property “approverGroup” (which is derived from the field name in CaConfigurationParticipantStepChooserConfig), but it returns the default value “admin”, as we haven’t provided any configuration value yet.

Now let’s assume that this workflow step, if it is executed on content below /content/ca-config, should assign the workflow to the group “content-authors“. Create a folder “/conf/ca-config/content/sling:configs” and add a node with the name “de.joerghoh.cqdump.caconfig.workflow.CaConfigParticipantStepChooserConfig“; add the property “approverGroup” and set it to the value “content-authors“.

caconfig-configsetting

To validate this setting there is a small tool in the OSGI webconsole.

  • Goto localhost:4502/system/console/slingcaconfig (in the navigation it’s “Sling” => “Context-Aware Configuration”)
  • enter “/content/ca-config” as content path
  • Enter “de.joerghoh.cqdump.caconfig.workflow.CaConfigParticipantStepChooserConfig” as “Other config name”.
  • And press “OK”.

 

It will display that that for this configuration name a property “approverGroup” with the value “content-authors” can be resolved. Which is exactly what we want.

When you test the workflow now, the workflow will be assigned correctly to the “content-authors” group.

To demonstrate the flexibility of context-aware configuration, let’s create a new folder /conf/ca-config/content/fr/sling:configs, add a node “de.joerghoh.cqdump.caconfig.workflow.CaConfigParticipantStepChooserConfig” and add a property “approverGroup” with the value “contributors“. Additionally add the property”sling:configRef” with the value “/conf/ca-config/content/fr” to the /content/ca-config/fr page. When you start the workflow now on /content/ca-config/fr the workflow will be assigned to the contributors group. And of course this also works for every page below /content/ca-config/fr.

(An important remark here: The paths are choosen arbitrarily. Especially in /conf you are very flexible, and the structure there does not necessarily need to match the structure below /content. I just choose the same structure for a better understanding of this example, and for convenience reason I recommend you to do the same.)

Congratulations, you are now able to use Sling Context-Aware Configuration! You can now freely package /content to move it to another stage, and the per-stage approver groups do not need to be re-adjusted.

But to be honest, this is a very simple way, and it has some drawbacks:

  • You need to use CRXDE to maintain the properties in /conf.
  • The need to create sling:configRef nodes for all buckets.

Let’s try to address these topics another time, in the next blog post.

 

 

 

Sling Context-Aware configuration

In the previous blog posting I outlined why there’s a need to store some configuration outside of OSGI and I named Sling Context-Aware Configuration (CA-Config) as a way to implement it.

CA-Config is a relative new feature of AEM (introduced with AEM 6.2) which stores the configuration inside the repository; but besides that it offers some features, which are handy in a lot of usecases:

  • You don’t deal with nodes and resources anymore, but only with POJOs. This kind of implementation completly hides where the configuration is stored, but the location of the config is stored as property on the node.
  • CA-Config implements an approach based on the repository hierarchy. If a configuration value is not found at the specified config location, the hierarchy is walked towards the root and further configurations are consulted, until a configuration is found or the root node is hit (in that case a default value for the configuration should be used).
  • the lookup process is very configurable, although the default one is sufficient in most cases.

A good example for this are content-related settings in a multi-tenant system. Imagine a multi-country and multi-language use-case (where for some countries multiple languages have to be supported); and in each country you should have different user groups, which act as approver in the the content activation process. In the past this was often done by storing it as part of the pages itself, but from a permission point of view it was possible for the authors to change it themselves (which typically only causes trouble). But with CA-Config this can be stored outside of the /content tree in /conf, but closely aligned to the content, and being able to match the weirdest requirements from a structural point of view; that means it is easily possible then have in one country 3 different approver groups for sections of the page, while in another just having 1 group covering all.

Another large usecase are sites, which are based on MSM livecopies. There is always content which is site-specific (e.g. contact adresses, names, logos, etc). And if this content is baked into a”regular” content page which is rolled out by MSM, the blueprint content will always overwrite the content which is specific to the livecopy. With CA-config it is easy possible to maintain these site-specific configurations outside of the regular content tree.

Before CA-Config all of this was implemented for very simple usecases using the InheritanceValueMap, which also implemented such a “walk-up-the-tree-if-the-property-is-not-found” algorithm. But it only looked up the direct parents and did not consider more structured property data as it is possible with CA-Config.

And there were a lot of custom implementations, which created “per site” configuration pages, typically next to the “homepage” page, and restricted access to it via ACLs or dispatcher/webserver/mapping rules. And of course these continue to work, but the access to these configuration data could be managed via the CA-Config as well (depending on the way how the data is stored either by just specifiying these as config path or by a custom lookup logic which can be configured for the CA-Config). And on top of that, these per-site configurations were per-site and not supporting the above mentioned case of having 3 different approver on different parts of the site.

So CA-Config is very flexible and efficient way to retrieve configuration data, which is stored in the repository. But it’s just for retrieving, there’s no prescribed way how the configuration data gets there.
While AEM comes with a configuration editor, it does not support writing the settings for CA-Config (at the moment it’s only designed for editable templates and workflows). WCM.io has a content package for download which allows you to edit these configurations in the authoring UI.

In the next post I will outline how easy it is to actually use CA-Config.

Creating the content architecture with AEM

In the last post I tried to describe the difference between the information architecture and content architecture; and from an architectural point of the view the content architecture is quite important, because based on that your application design will emerge. But how can you get to a stable and well-thought content structure?

Well, there’s no bullet-proof approach for it. When you design the content architecture for an AEM-based application it’s best to have some experience with the hierarchical approach offered by the repository approach. I will try to outline a process which might help you to get you there.
It’s not a definite guideline and I will never guarantee that it will work for you, as it is just based on my experience with the projects I did. But I hope that it will give some input and can act as a kind of checklist for you. My colleague Alex Klimetschek did a presentation at the adaptTo() conference 2012 about it.

The tree

But before we start, I want to remind you of the fact, that everything you do has to fit into the JCR tree. This tree is typically a big help, because we often think in trees (think of decision trees, divide-and-conquer algorithms, etc), also the URL is organized in a tree-ish way. Many people in IT are familiar with the hierarchical way filesystems are organized, so it’s both an comfortable and easy-to-explain approach.

Of course there are cases, where it makes things hard to model; but you are hit that problem, you should try to choose a different approach. Building any n:m relation in the AEM content tree is counter-intuitive, hard to implement and typically not really performant.

Start with the navigation

Coming from the information architecture you typically have some idea, how the navigation in the site should look like. In the typical AEM-based site, the navigation is based on the content tree; that means that traversing the first 2-3 levels of your site tree will create the navigation (tree). If you map it the other way around, you can get from the navigation to the site tree as well.

This definition definitivly has impact on your site, as now the navigation is tied to your content structure; changing one without the other is hard. So make your decision carefully.

Consider content-reuse

As the next step consider the parts of the website, which have to be identical, e.g. header and footer. You should organize your content in a way, that these central parts are maintained once for the whole site. And that any change on them can be inherited down the content tree. When you choose this approach, it’s also very easy to implement a feature, which allows you to change that content at every level, and inherit the changed content down the tree, effectively breaking the inheritance at this point.

If you are this level, also consider the fact of dispatcher invalidation. Whenever you change such a “centralized” content, it should be easily possible to purge the dispatcher cache; in the best case the activation of the changed content will trigger the invalidation of all affected pages (not more!), assuming that you have your /statefilelevel parameter set correctly.

Consider access control

As third step let’s consider the already existing structure under the aspect of access control, which you will need on the authoring environment.
On smaller sites this topic isn’t that important, because you have only a single content team, which maintains all the page. But especially in larger organizations you have multiple teams, and each team is responsible for dedicated parts of the site.

When you design your content structure, overlay the content structure with these authoring teams, and make sure, that you can avoid any situation, where a principal has write access to a page, but not to any of the child pages. While this is not always possible, try to follow this guidelines regarding access control:

  • When looking from the root node in the tree to node on a lower level, always add more privileges, but do not remove them.
  • Every author for that site should have read access to the whole site.

If you have a very complicated ACL setup (and you’ve already failed to make it simpler), consider to change your content structure at this point, and give the ACL setup a higher focus than for example the navigation.

My advice at this point: Try to make your ACL setup very easy; the more complex it gets the more time you will spend in debugging your group and permission setup to find out, what’s going on in a certain situation; also the harder it will be to explain it to your authors.

Multi-Site with MSM

As you went now through these 3 steps, you are through with it and already have some idea how your final content structure needs to look like. There is another layer of complexity if you need to maintain multiple sites using the multi-site-manager (MSM). The MSM allows you to inherit content and content structure to another site, which is typically located in a parallel sub-tree of the overall content tree. Choosing the MSM will keep your content structures consistent, which also means, that you need to plan and setup your content master (in MSM terms it is called the blueprint) in a way, that the resulting structure is well-suited for all copies of it (in MSM: live copies).

And on top of the MSM you can add more specifics, features and requirements, which also influence the content structure of your site. But let’s finish here for the moment.

When you are done with all these exercises, you already have a solid basis and considered a lot of relevant aspects. Nevertheless you should still ask others for a second opinion. Scrutiny pays really off here, because you are likely to live with this structure for a while.

Ways to access your content with JCR (part 1)

If you are a developer and need to work with databases, you often relay on the features your framework offers you to get your work done easily. Working directly with JDBC and SQL is not really comfortable, writing “SELECT something FROM table” with lots of constraints can be tedious …

The SQL language offers only the “select” statement to retrieve data from the database. JCR offers multiple ways to actually get access to a node:

Each of these methods serve for different purposes.

  • session.getNode(path) is used, when you know exactly the path of a node. That’s comparable to a “select * from table where path = “/content/geometrixx/en” in SQL, which is a direct lookup of a well-known node/row.
  • node.getNodes() returns all child nodes below the node. This method has no equivalent in the SQL world, because in JCR there are not only distinct and independent nodes, but nodes might have a hierarchical relation.
  • The JCR search is the equivalent of the SQL query, it can return a set of nodes. Yes, ANSI SQL 92 is much more powerful, but let’s ignore that for this article, okay?

In ANSI SQL, approach 1 and 3 are both realized by a SELECT query, while the node.getNodes() approach has no direct equivalent. Of course it can also realized by a SELECT statement (likely resolving a 1:n relation), but it highly depends on the structure of your data.

In Jackrabbit (the reference implementation of the JCR standard and also the basis for CRX) all of these methods are implemented differently.

session.getPath(): It starts at the root node and drills down the tree. So to lookup /content/geometrixx/en the algorithm starts at the root, then looks up the node with the name “content”, then looks for a child node named “geometrixx” and then for a child node named “en”. This approach is quite efficient because each bundle (you can consider it as the implementation equivalent of a JCR node) references both its parent and all the child nodes. On every lookup the ACLs on that node are enforced. So even when a node “en” exists, but the session user does not have read access on it, it is not returned, and the algorithm stops.

node.getNodes is even more efficient, because it just has to lookup the bundles of the child node list and filter it by ACLs.

If you use the JCR search, the Lucene index is used to do the search lookup and the  bundle information is used to construct the path information. The performance of this search depends on many factors, like (obviously) the amount of results returned from Lucene itself and the complexity of the other constraints.

So, as a CQ developer, you should be aware of that there are many ways to get hold of your data. And all have different performance behaviors.

In the next posting I will explain this case on a small example.

User administration on multi-client-installations

Developing an application for a multi-client-installation isn’t only a technical or engineering quest, but also reveals some question, which affect administration and organisationial processes.

To ease administration, the user accounts in CQ are often organized in a hiearchy, so that users which are placed higher in the hierarchy, can administrate user which are lower in the hierarchy tree below them. Using this mechanism a administrator can easily delegate the administration of certain users to other users, which can also do adminstrative works for “their” users.

The problem arises when a user has to have rights in 2 applications within the same CQ instance and every application should have its own “application administrator” (a child node to the superuser user). Then this kind of administration is no longer possible, because it is impossible to model a hierarchy where neither application administrator user A has a parent or child relation to application administration user B nor A and B are placed in the hierarch higher than any user C.

I assume that creating accounts for different application but the same person isn’t feasible. That would be the solution which the easiest one from an engineering point of view, but this does contradict the ongoing move not to create for each application and each user a new user/password pair (single sign on).

This problem imposes the burden of user administration (e.g assigning users to groups, resetting passwords) to the superuser, because the superuser is the user, which is always (either by transition or directly) parent to any user. (A non-CQ-based solution would be to handle user related changes like password set/reset and group assignment outside of CQ and synchronize these data then into CQ, e.g. by using a directory system based on LDAP.)

ACLs, access to templates and workflows should be assigned only using groups and roles, because these can be created per application. So if an application currently is based on a user hierarchy and individual user rights it’s hard to add a new application using the same user.

So one must make sure, that all assignments are only based on groups and roles, which are created per application. Assigning individual rights to a single user isn’t the way to go.

Is CRX 1.4.2 production ready?

The contentbus technology was the standard storage backend till the CQ 4.x-series Although file-based storage wasn’t the great deal even in the late 1990s (mysql was already invented, postgres existed plus at least half a dozen enterprise database systems), Day choose to store the content objects in individual files, hidden by a abstraction layer. Of course it took some time of tuning and making experiences, but the contentbus proved to be a reliable storage which had the big point, that with an editor on the filesystem you can solve nearly all problems (we used more than once sed to fix our default.map)

But some points were still open:

  • Online backup isn’t possible. The documentation simply states: “Shutdown the CQ, copy your files, and startup again”. Although you can speed up the copy, if you replace with it with a snapshot on filesystem layer, but this need to restart doesn’t make it enterprise-ready. (Databases offer online-backup since at least a decade).
  • Contentbus requires a lot of filesystem I/O (mainly the system calls open and close). Having a lot of these operations slows down the processing. A small number of larger files would reduce this administrative overhead in the filesystem.
  • Memory usage of contentbus artifacts: Some artifacts like the default.map and zombie.map have in-memory data-structures, which grow as the underlying files grow (or vice-versa). The more content you have the more memory is used. Even if only a small part of this content is in active use. This doesn’t scale well.
  • The contentbus offers cluster support, but only with 2 nodes; with more nodes the overall performance will even degrade! According to the cluster documentation for CRX 1.4, Day tested CRX in a clustered setup with 6 nodes. If the performance loss is acceptable (that means,  6 nodes offer more performance than 5 nodes), this would be a real good solution to scale your authoring systems.

So we decided that’s time to evaluate if CRX would be at least as good as the contentbus. The TAR persistence manager adresses mainly the backup issue, we hope that we get some performance improvements as well.

So currently I’m doing a test setup of CQ 4.2.0 and CRX 1.4.2, for which Day offered (just in time :-)) a knowledge base article.