AEM anti-pattern: The hardcoded content structure

One the first things I usually do when we start an AEM project is to get a clear vision of the content and its structure. We normally draw a number of graphs, discuss a number of use cases, and in the end we come up with a content structure, which satisfies the requirements. Then we implement this structure as a hierarchy of nodes and that’s it.

In many cases developers start to use this structure without too much thinking. They assume, that the node structure is always like this. They even start to hardcode paths and language names or mimic this structure. Sometimes that’s not a problem. But it is getting hard, when you are building a multi-language or multi-tenant site and you start simple with only 1 language and 1 tenant; then you might end up with these languages or tenants being hardcoded, as “there was no time to make it right”. Imagine when you start with the second language or the second site and someone hardcoded a language or a site name/path.

So, what can you do to avoid hardcoded paths? Some information is always stored at certain areas. For example you can store basic contact information on the root node, which you can reuse on the whole site. So how do you identify the correct root node if you have multiple sites? Or how do you identify the language of the site?

The easiest way is to mark these site root pages (I prefer pages here over nodes, as they can be created using the authoring UI and are much more easier authorable) with a certain property and value. The easiest way is then if you have a special template with its dedicated resource type. Then you can identify these root pages using 2 approaches:

  • When you need to find them all, use a JCR query and look for all pages with this specific resource type.
  • When you need to find the siteroot page for a given page (or resource), just iterate up the hierarchy until you find a page with this resource type.

This mechanism allows you to be very flexible in terms of the content hierarchy. You no longer depend on pages being on a certain level or having special names. It’s all dynamic and you don’t have any dependency on the content structure. This page doesn’t even have to be the root-page of the public facing site, but is just a configuration page used for administration and configuration purposes. The real root-page can be a child or grand-child of it. You have lot’s of choices then.

But wait, there is a single limitation: Every site must have a sitters page using this special template/resourcetype. But that isn’t a hard restriction, isn’t it?

And remember: Never do string operations on a content path to determine something, neither the language nor the site name. Never.