Category Archives: architecture

Creating the content architecture with AEM

In the last post I tried to describe the difference between information architecture and content architecture. From an architectural point of view the content architecture is quite important, because your application design will emerge from it. But how can you get to a stable and well-thought-out content structure?

Well, there’s no bullet-proof approach for it. When you design the content architecture for an AEM-based application, it’s best to have some experience with the hierarchical approach offered by the repository. I will try to outline a process which might help you get there.
It’s not a definitive guideline and I will never guarantee that it will work for you, as it is just based on my experience with the projects I did. But I hope that it will give you some input and can act as a kind of checklist. My colleague Alex Klimetschek did a presentation at the adaptTo() conference 2012 about it.

The tree

But before we start, I want to remind you that everything you do has to fit into the JCR tree. This tree is typically a big help, because we often think in trees (think of decision trees, divide-and-conquer algorithms, etc.), and the URL is organized in a tree-ish way as well. Many people in IT are familiar with the hierarchical way filesystems are organized, so it’s both a comfortable and an easy-to-explain approach.

Of course there are cases where the tree makes things hard to model; if you hit that problem, you should try to choose a different approach. Building any n:m relation in the AEM content tree is counter-intuitive, hard to implement and typically not really performant.

Start with the navigation

Coming from the information architecture you typically have some idea what the navigation of the site should look like. In the typical AEM-based site, the navigation is based on the content tree; that means that traversing the first 2-3 levels of your site tree will create the navigation (tree). If you map it the other way around, you can get from the navigation to the site tree as well.
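To make the mapping between site tree and navigation concrete, here is a minimal sketch in Python. It is not AEM code: the Page class, the page names and the build_navigation function are made up for illustration, and the tree is just an in-memory structure standing in for the pages below the site root.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a page node in the content tree."""
    name: str
    title: str
    children: list = field(default_factory=list)


def build_navigation(page, max_depth):
    """Return nested (title, sub-navigation) pairs for the levels below `page`."""
    if max_depth == 0:
        return []
    return [
        (child.title, build_navigation(child, max_depth - 1))
        for child in page.children
    ]


# Example tree: only the first two levels below the site root feed the navigation.
site = Page("mysite", "My Site", [
    Page("products", "Products", [
        Page("widgets", "Widgets", [Page("widget-a", "Widget A")]),
    ]),
    Page("about", "About us"),
])

navigation = build_navigation(site, max_depth=2)
```

With max_depth=2 the page "Widget A" on the third level does not show up in the navigation, although it is part of the content tree.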

This decision definitely has an impact on your site, as the navigation is now tied to your content structure; changing one without the other is hard. So make your decision carefully.

Consider content-reuse

As the next step consider the parts of the website which have to be identical on every page, e.g. header and footer. You should organize your content in a way that these central parts are maintained once for the whole site, and that any change to them is inherited down the content tree. When you choose this approach, it’s also very easy to implement a feature which allows you to change that content at any level and inherit the changed content down the tree from there, effectively breaking the inheritance at this point.
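The inheritance idea can be sketched as a lookup that walks up the tree until it finds a value. This is a simplified model in Python and not the AEM API; the paths, the "footer" property and the resolve_inherited function are assumptions made for illustration.

```python
import posixpath

# Hypothetical content: path -> properties maintained on that page.
content = {
    "/content/site": {"footer": "Global footer"},
    "/content/site/en": {},
    "/content/site/en/products": {"footer": "Products footer"},
    "/content/site/en/products/widget": {},
    "/content/site/en/about": {},
}


def resolve_inherited(content, path, name):
    """Walk from `path` up towards the root and return the nearest value of `name`."""
    while path not in ("", "/"):
        value = content.get(path, {}).get(name)
        if value is not None:
            return value
        path = posixpath.dirname(path)
    return None


about_footer = resolve_inherited(content, "/content/site/en/about", "footer")
widget_footer = resolve_inherited(content, "/content/site/en/products/widget", "footer")
```

The "about" page gets the footer maintained once at the site root, while everything below "products" gets the overridden footer, because the lookup stops at the first level which defines the value.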

If you are at this level, also consider dispatcher invalidation. Whenever you change such “centralized” content, it should be easily possible to purge the dispatcher cache; in the best case the activation of the changed content will trigger the invalidation of all affected pages (not more!), assuming that you have your /statfileslevel parameter set correctly.
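A rough model of how the stat file level limits the invalidation, written in Python. It is a simplification based on the documented behaviour (the .stat files from the docroot down to the configured level are touched, and a cached file is checked against the .stat file closest to it); the function names and paths are made up, and it ignores everything else the dispatcher does, so please verify it against your own dispatcher setup.

```python
def _folders(path):
    """Folder segments of a page or file path (the last segment is the page itself)."""
    return [segment for segment in path.split("/") if segment][:-1]


def stat_dirs_touched(activated_path, statfileslevel):
    """Folders whose .stat file is touched, from the docroot ("/") downwards."""
    folders = _folders(activated_path)
    depth = min(len(folders), statfileslevel)
    return ["/" + "/".join(folders[:i]) for i in range(depth + 1)]


def governing_stat_dir(cached_path, statfileslevel):
    """Folder whose .stat file decides whether a cached file is stale (assumed model)."""
    folders = _folders(cached_path)
    depth = min(len(folders), statfileslevel)
    return "/" + "/".join(folders[:depth])


def is_invalidated(cached_path, activated_path, statfileslevel):
    """True if activating `activated_path` makes `cached_path` stale in this model."""
    touched = stat_dirs_touched(activated_path, statfileslevel)
    return governing_stat_dir(cached_path, statfileslevel) in touched


activated = "/content/site/en/products/widget"
touched = stat_dirs_touched(activated, 3)
same_branch = is_invalidated("/content/site/en/about.html", activated, 3)
other_branch = is_invalidated("/content/site/fr/about.html", activated, 3)
level_zero = is_invalidated("/content/site/fr/about.html", activated, 0)
```

In this model a level of 3 keeps the invalidation inside the "en" branch, while a level of 0 invalidates the complete cache on every activation. Where you maintain the centralized content therefore decides how many pages are purged when it changes.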

Consider access control

As the third step let’s look at the existing structure from the perspective of access control, which you will need on the authoring environment.
On smaller sites this topic isn’t that important, because you have only a single content team which maintains all the pages. But especially in larger organizations you have multiple teams, and each team is responsible for dedicated parts of the site.

When you design your content structure, overlay it with these authoring teams, and make sure that you avoid any situation where a principal has write access to a page, but not to some of its child pages. While this is not always possible, try to follow these guidelines regarding access control:

  • When looking from the root node of the tree to a node on a lower level, always add more privileges, but do not remove them.
  • Every author for that site should have read access to the whole site.
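
The first guideline can be checked mechanically. The following Python sketch uses a deliberately simplified ACL model; the paths, principals and the privilege names "read" and "write" are made up for illustration and are not the JCR privilege names, and group membership is ignored completely.

```python
# Hypothetical ACL model: path -> list of (principal, "allow" | "deny", privilege).
acl = {
    "/content/site": [("authors", "allow", "read")],
    "/content/site/en": [("team-en", "allow", "write")],
    "/content/site/en/legal": [("team-en", "deny", "write")],
}


def is_ancestor(ancestor, path):
    """True if `ancestor` is a proper ancestor of `path` in the tree."""
    return path.startswith(ancestor.rstrip("/") + "/")


def find_removed_privileges(acl):
    """Return (path, principal, privilege) for every deny that revokes an allow set higher up."""
    violations = []
    for path, entries in acl.items():
        for principal, effect, privilege in entries:
            if effect != "deny":
                continue
            for upper_path, upper_entries in acl.items():
                granted_above = (principal, "allow", privilege) in upper_entries
                if is_ancestor(upper_path, path) and granted_above:
                    violations.append((path, principal, privilege))
                    break
    return violations


violations = find_removed_privileges(acl)
```

Here the deny on the "legal" page is reported, because it creates exactly the situation described above: "team-en" can write to the "en" page, but not to one of its child pages.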

If you have a very complicated ACL setup (and you’ve already failed to make it simpler), consider changing your content structure at this point, and give the ACL setup a higher priority than, for example, the navigation.

My advice at this point: Try to keep your ACL setup very simple; the more complex it gets, the more time you will spend debugging your group and permission setup to find out what’s going on in a certain situation, and the harder it will be to explain it to your authors.

Multi-Site with MSM

Once you have gone through these 3 steps, you already have some idea what your final content structure needs to look like. There is another layer of complexity if you need to maintain multiple sites using the multi-site-manager (MSM). The MSM allows you to inherit content and content structure to another site, which is typically located in a parallel sub-tree of the overall content tree. Choosing the MSM will keep your content structures consistent, which also means that you need to plan and set up your content master (in MSM terms it is called the blueprint) in a way that the resulting structure is well-suited for all copies of it (in MSM: live copies).

And on top of the MSM you can add more specifics, features and requirements, which also influence the content structure of your site. But let’s finish here for the moment.

When you are done with all these exercises, you already have a solid basis and have considered a lot of relevant aspects. Nevertheless you should still ask others for a second opinion. Scrutiny really pays off here, because you are likely to live with this structure for a while.

Information architecture & content architecture

Recently I had a discussion in the AEM forums about how to reuse content. During this discussion I was reminded again of the importance of how you structure content in your repository.

For this the term “information architecture” is often used, but from my point of view that’s not 100% correct. Information architecture handles the various aspects of how your website itself is structured (in terms of navigation, layout but also content). Its most important aspect is the efficient navigation and consumption of the content on the website by end users (see the Wikipedia article on it). But it doesn’t care about aspects like content reuse (“where do I maintain the footer navigation”), relations between websites (“how can I reduce work to maintain similar sites”), translations or access control for the editors of these systems.

Therefore I want to introduce the term “content architecture”, which deals with questions like that. The information architecture has a lot of influence, but it’s solely focused on the resulting website; the content architecture focuses on the way such sites can be created and maintained efficiently.

In the AEM world the difference can be made visible very easily: You can see the information architecture on the website, while you can see the content architecture within CRXDE Lite. Omitting any details: The information architecture is the webpage, the content architecture the repository tree.
If you have some experience with AEM you know that the structure of the website typically matches some subtree below /content. But in the repository tree you don’t find a “header” node at the top of the subtree below the “jcr:content” node of every page, and the same is true for the footer. This piece of the rendered website is taken from elsewhere and not maintained as part of every page, although the information architecture mandates that every page has a header and a footer.

Besides that the repository also holds a lot of other supporting “content”, which is important for an information architecture but not directly mandated by it. You have certain configuration which controls the rendering of a page; for example it might control which contact email address is displayed in the page footer. From an information architecture point of view it’s not important where it is stored; but from a content architecture point of view it is very important, because you might have the chance to control it at a single location, which then takes effect for all pages. Or at multiple locations, which results in changing it for individual pages. Or in a per-subtree configuration, where all pages below a certain page are affected. Depending on the requirement this will result in different content architectures.

Your information architecture will influence your content architecture (in some areas it may even be a 1:1 relation), but the content architecture goes way beyond it, and deals with other “*bilities” like “manageability”, “evolvability” (how future-proof is the content if there are changes to the information architecture?) or “customizability” (how flexible in terms of individualization per page/subsite is my content architecture?).

You can see that it’s important to be aware of the content architecture, because it will have a huge influence on your application. Your application typically has a lot of built-in assumptions about the way content is structured. For example: “The child nodes below the content root node form the first-level navigation of the site”. Or “the homepage of the site uses a template called ‘homepage’” (which is btw also not covered by any information architecture, but an essential part of the content architecture).

In the JCR world there is the second rule of David’s model: “Drive the content hierarchy, don’t let it happen”. That’s the rule I quote most often, and even though it’s 10 years old, it’s still very true, because it focuses on the aspect of managing the content tree (= content architecture) and on the fact that you should decide carefully, considering the consequences.

And rest assured: It’s easier to change your application than to change the content tree! (At least if it’s designed properly. If it isn’t, … It’s even hard to change them both.)

AEM and docker – a question of state

The containerization of the IT world continues. What started with virtualization in the early 2000s has, with Docker, reached a state where it’s again a hype topic.

Therefore it’s natural that people also started to play with AEM in docker (https://adapt.to/2016/en/schedule/running-aem-in-docker.html, https://www.linkedin.com/pulse/running-aem-docker-satendra-singh and many more).

Of course I was challenged with the requirement to run AEM in docker too. Customers and partners are asking how to run AEM in docker, and if I can provide dockerfiles etc. I am hesitating to do it, because for me docker and AEM are not a really good fit (right now with AEM 6.3 in 2017).

Some background first: Docker containers should be stateless. Only if the application within the container does not hold any persistent state can you shut it down (which means deleting all the files created by the application in the container itself), start it up, replace it by a different container holding a new version of the application etc. The whole idea is to make the persistent state somebody else’s problem (typically a database). Deployments should be as easy as starting new docker instances (from pre-tested and validated docker images) and shutting down the old ones. No more working and testing in production.

So, how does that collide with AEM? AEM is not only an application, but the application is closely tied to a repository, which holds state. Typically the application is stored within the repository, next to the “user data” (= content). This means that you cannot just replace an AEM instance inside docker by a new instance without losing this content (or resetting it to a state which is shipped with the docker image). Losing content is of course not acceptable.

So the typical docker rollout approach of new application versions (bringing new instances live based on a new docker image and shutting down the old ones) does not work with AEM; the content sitting in the repository is the problem.

People then came up with the idea that the repository can be stored outside of the docker image, so it isn’t lost on restart/replacement of the image. Docker calls this “host directory as data volume” (https://docs.docker.com/engine/tutorials/dockervolumes/#locate-a-volume).

Storing the repo as data volume on the host filesystem

That idea sounds neat and of course it works. But then we have a different problem. When you start a new docker image and you mount this data volume containing the repository state, your AEM still runs the “old” version of your application. Starting the repository from a different docker image doesn’t bring any benefit then.

Docker image version 2 still starts application version 1.0

When you want to update your AEM application inside the repository, you would still need to perform an installation of your application into a running repository. That means working in a production environment, and that’s not why you want to use docker.
With docker we just wanted to start the new images and to stop the old ones.
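
The problem can be reduced to a toy model. The following Python sketch only simulates the behaviour described above and does not call docker or AEM; the start_container function and the dictionary which stands in for the data volume are made up for illustration.

```python
def start_container(image_app_version, volume):
    """Start a simulated AEM container with the repository on a mounted volume."""
    if "app_version" not in volume:
        # Empty volume: the repository is initialised with what the image ships.
        volume["app_version"] = image_app_version
        volume["content"] = []
    # AEM runs the application stored in the repository, not the one in the image.
    return volume["app_version"]


host_volume = {}  # lives on the host and survives the replacement of containers

running_v1 = start_container("1.0", host_volume)
host_volume["content"].append("/content/site/en")  # authors create content

# Replace the container with one started from the new image, same volume mounted.
running_v2 = start_container("2.0", host_volume)
```

The content survives the replacement, but the second container still runs application version 1.0, because the application lives in the repository on the volume and not in the image.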

Therefore I do not recommend using docker with AEM; there is rarely any value in it, and it makes the setup more complicated without any real benefit.

The only exceptions I would accept are really short-lived instances, where hosting the repository inside the docker system isn’t a problem and purging the repo on shutdown is even a feature. Typically these are short-lived development instances (e.g. triggered by a continuous integration pipeline, where you automatically create dedicated docker instances for feature branches). But that’s it.

And as a sidenote: This does not only affect TarMK-based AEM instances. If you have mongo-based instances, the application is also stored within the (Mongo-) repo. Just running AEM in a new docker image doesn’t update the application magically.

To repeat myself: This considers the current state. I know that AEM engineering is perfectly aware of this fact, and I am sure that they are trying to address it. Let’s wait for the future 🙂