How to use Runmodes correctly

Runmodes are an essential concept within AEM; they form the main and only way to assign roles to AEM instances; the primary usecase is to distinguish between the author and publish role, and another common usecase is also to split between PROD, Staging and Development environments. Technically it’s just a set of strings which are assigned to an instance, and which are used by the Sling framework at a few occassions, the most prominent being the Sling JCR Installer (which handles the /apps/myapp/config,/apps/myapp/config.author, etc. directories).

But I see other usecases; usecases where the runmodes are fetched and compared against hardcoded strings. A typical example for it:

boolean isAuthor() {
return slingSettingsService.getRunmodes().contains("author");
}

From a technical point of view this is fully correct, and works as expected. The problem arises when some code is based on the result of this method:

if (isAuthor()) {
// do something
}

Because now the execution of this code is hardcoded to the author environment; which can get problematic, if this code must not be executed on the DEV authoring instances (e.g. because it sends email notifications). It is not a problem to change this to:

if (isAuthor() && !isDevelopmentEnvironment()) {
// do something
}

But now it is hardcoded again 😦

The better way is to rely on the OSGI framework soley. Just make your OSGI components require some configuration and define the configuration for the runmodes required.

@Component(configurationPolicy=ConfigurationPolicy.REQUIRED)
public class myServiceImpl implements myService {
//...

This case requires NO CODING at all, instead you can just use the functionality provided by Sling. And this component does not even activate if the configuration is not present!

Long story short: Whenever you see a reference to SlingSettingsService.getRunmodes(), it’s very likely used wrongly. And we can generalize it to “If you add a reference to the SlingSettingsService, you are doing something wrong”.

There are only a very few cases where the information provided by this service is actually useful for non-framework purposes. But I bet you are not writing a framework 🙂

You think that I have written on that topic already? Yes, I did, actually 2 times already (here and here). But it seems that only repetition helps to get this message through. I still find this pattern in too many codebases.

A “no custom code challenge” for AEM?

My colleague Jan Exner initiated a “no custom code challenge” for the Analytics area earlier this year; and in the followup of this the people of 33sticks posted a good summary why it would be much better if you could avoid any custom code in the analytics world.

I am wondering if this holds true for AEM systems as well. On the one hand side customization is required. For example you need to style the components according to the requirements and styleguides. But on the other hand siede, excessive customization (overlays and adaptions/changes to ootb functionality) leads to maintenance and upgrade issues. But maybe we should not use the term “customization” anymore in the AEM world, but choose a more appropriate one, maybe “application development on AEM”, because that’s what we do in reality quite often.

And the application development part is the one which makes software expensive. It requires design, architecture, implementors, tests, automated tests, deployments. It requires management and comes with risk. The more application development we have, the higher the risk and the costs.

If you were able to avoid any application development in an AEM project, and just live with the core components components and brand them accordingly, that would be great. We would only focus on style and branding of the components, no need to Java developers and code deployments. Just pure frontend, and a clever use of the out-of-the-box tools AEM offers you.

I am truly convinced that you can build a standard marketing site (multi-site, multi-language, integrated translation etc) with this approach. It requires dicussion with the business and more important, you as a developer or architect need to urge yourself not write any code.

Of course, it’s probably getting a very basic site, but it can serve 2 purposes:

  • We identify what should really be part of AEM (which is something we can and should add asap)
  • We challenge ourselves to think in much simple structures and less customizations. I always wonder how easy the statement “then let’s overlay it” comes out of the mouth of an AEM consultant in a discussion, and I am no exception to this.

Yes, can we join Jan’s initiative. With AEM it’s definitely harder to achieve this than with other solutions of the Adobe Experience Cloud, but it’s doable. And honestly, we should accept such challenges more often. Even if we eventually fail.

But the learning is immense.

My advice to junior AEM developers

Recently I came across the AEM Developer Series posted by Anirudh Sharma at https://aem.redquark.org/2018/10/day-00-aem-developer-series.html; it’s a great resource and gives you as a developer a good introduction what you are likely to do when you want to start a career in AEM development.

Nevertheless, this tutorial focusses purely on development tasks. This is a good approach when you are a junior developer. If you are familiar with Java and assigned to a project together with more senior and experienced developers and a good architect, it’s a good start. Because in such a situation, you are told what to do. Others do the relevant decisions like

  • Is this a new component? Or a variation of an existing component?
  • Should we create a new template or not?
  • Can we reuse or re-purpose an out-of-the-box feature of AEM? Or shoukd we create that on our own?
  • How do structure the content?

And many questions more. And that is good, because you as a junior developer can learn a lot from others.

But it has one downside: You hardly known the product “AEM”, but are only interested in extension points and APIs you can use. A good example is Anirudh’s series I mentioned above: It just focusses on how to develop stuff, on APIs which exist for years. Yes, that’s natural for a development course 🙂

But you as a developer will never see what’s a already there!  You are likely to ignore all the new feature which have been added since AEM 6.0; sincet that time I see a shift in product development from providing a framework towards more ready-to-use features.

For example the “projects” feature: There is much APIs, but most of that stuff is actually creating the right JCR structures. I see it rarely used. For many developers (people which are in the ecosystem for 10 years alike as for people just started; and ) the major and sometimes only concepts they use are pages, components and assets. Regarding Content Fragments and Experience Fragments the situation seems a bit better, maybe these have been communicated and marketed better. But whenever a new requirement is raised, the immedate reaction of an experienced developers often looks like this: How can we make this happen with pages, assets, components and dialogs? Instead of asking yourself “Is there something in product which we can reuse or customize?” This question should come up much more often.  And yes, I am guilty as well.

Using new features, understanding their capabilities and their weaknesses should be as common to any more experienced AEM developer as knowing that you should close a ResourceResolver (sorry, couldn’t help myself :-))

So my recommendation to all of you who think of AEM 6.5 just as much more stable and performant AEM 5.6.1 with deprecated ClassicUI: It is, but also much more. Walk through the release notes and documentation, check for the new features, work with the tutorials and watch the videos. There are a lot of hidden gems which are good to know, and in the right situation it can be the solution to your development problem. Or at least help you to reduce effort.
Just relying on the JCR API, Sling Resources, Servlets and the Edit mode might be absolutely future proof, but why do you use AEM then?

So for any AEM junior developer: Next to your technical enablement: Try to understand what’s in the product. Work with authors, test the user interface, check the documentation; and maybe attend the user training. And be curious!

Understanding the “Oak Repository Statistics” MBean

In the last releases AEM has been greatly enhanced to provide information which are suitable for health detection. Especially Oak provides a huge amout of MBeans which can be monitored. But sometimes they are a bit hard to understand. Based on some ongoing activities I digged through the “Oak Repository statistics” MBean and found it quite useful, even for some basic understanding and analysis.

I did this analysis and the screenshot on AEM 6.5, but this MBean is present at least since AEM 6.1 (probably even 6.0) and its content hasn’t changed much.

When you access this MBean the top of the page looks like this (this instance has just started):

Oak Repository Statistics MBean

There are a number of values collected, and presented for a number of times:

  • per second: The raw value in each second for the last minute.
  • per minute: The aggregated value on a minute basis for the last hour
  • per hour: The aggregated value on a hourly basis for the last day
  • per day: the aggregated value on a daily basis

The aggregation differs based on the type of the metric:

  • Gauge: This is a simple value, which is not further processed. When values of types must be aggregated, an average is calculated.
  • Counter: This a number, which can be accumulated. When values of type counter must be aggregated, they are summed up.
Attribute NameTypeDescription
SessionCountGaugeThe number of JCR sessions, which are currently open
SessionLoginGaugeThe number of sessions opened within that time
SessionReadCountCounterThe number of read operations in the JCR (over all sessions)
SessionReadDurationGaugeThe total time spent in read operations (nanoseconds)
SessionReadAverageGaugethe average duration for read operations (SessionReadDuration divided by the number of reads)
SessionWriteCountCounterThe number of write operations in the JCR (over all sessions). Be aware that session.refresh() is also counted as write operation!
SessionWriteDurationGaugeTotal time spent writing to sessions in nanoseconds
SessionWriteAverageGaugethe average duration for write operations (SessionWriteDuration divided by the number of writes)
QueryCountCounterThe number of JCR queries executed
QueryDurationGaugeThe total time spent in JCR query operations (miliseconds)
QueryAverageGaugethe average duration for queries (QueryDuration divided by the number of writes)
ObservationEventCountCounterThe number of observation events delivered to all listeners
ObservationEventDurationGaugeThe total time spent processing events by all observation listeners in nanoseconds
ObservationEventAverageGaugethe average duration spent processing observation events (ObservationEventDuration divided by the number of events)
ObservationQueueMaxLengthGaugeThe maximum length of the JCR Observation Queue; in newer Oak versions this queue does no longer exist, and then the value is -1

This measurement is done to limit the amount of data which needs to be stored. And this data is stored within the JVM inside the Oak bundles; that means that any restart of the JVM or restart of the Oak bundles will reset these values. If you want to persist these values you need to read them via JMX and store them.

Ok, what can you do with all this data? Well, it can help you to answer many questions. For example you can find out very easy, if you have a session leak.Because then numbers ion the SessionCount attribute always increase over time. It’s also interesting to find out what is happening within your system when it’s completely idle. Are there repository writes which you are not expecting? Queries every few seconds?

If you are investigating performance issues, or if you want to avoid them, you should have a look into this MBean.

Update: Fixed the only link in this post. Thanks Oswald for reporting!

Adobe Summit 2019 — my session summary

This year I was not able to attend the Adobe Summit in Las Vegas, but as many recordings are online available, I want to give you a short list of sessions I would have liked to attend.

Adobe Experience Manager Rock Star: The top tips are here” by Kaushal Mall and Darin Kuntze. Of course the most important session for the AEM community 🙂 Thanks to all presenters and congratulations to Marc!

Content strategy and architecture — The infrastructure behind the interface” by Elise Hahn. Elise highlights the importance of content architecture as part of the content strategy of an organization. Not a technical topic in the first place, but she points out that the content strategy always impacts the work of the engineering teams (even if you are not aware of it). Highly recommend content, also for technical folks.

Adobe Experience Manager Sites: Top Innovations” by Haresh Kumar & Cedric Hüsler. Cedric presents the top features of the new AEM 6.5 version. As usual a very informative and funny session, but not the usual “top 10 new features”. This year less seems to be more 🙂

Meet Dexter: The World-Clas Experience Manager Implementation of Adobe.com” by Audumber Ramesh, Lukas Ryf & Chris Millar. Chris shows how they removed the engineering group from the “we need a new component” discussions and let the design agencies directly talk to authors. As followup Chris also posted how they built their dialogs.

And of course there are lot of more gems in these sessions, but I did have not enough time to watch them all (yet). You can find all sessions at https://summit.adobe.com/na/summit-online/ (use the search function at the bottom of the page).

“We have an urgent performance issue” (part 2)

As a reaction of the last post I got the question by Oswaldo about specific recommendations on performance. Actually, there are a lot. But that’s material for another blog post 🙂 or skip to the bottom of this post.

Instead I want to give you a recommendation on how to handle situations when you did not have time nor capacity you can spend on thinking about performance and response times. But as an experienced technical leader you know that at some point this question will arise for sure. You might get a few hours to spend on that question, but how do you spend it most efficiently?

Clearly not for performance optimization! Because it’s not enough time to analyze and improve substantial parts of the application. And tomorrows changes might render these improvements useless…

Instead I would recommend you to spend this time in communication and building rapport with people who can help you in case such a performance problem arises. Get in contact with the operations people which are operating your system and application. Understand how they work and what tools they use. Understand how they can help you in case of performance issues, what information they can provide to you. Ask for an account on their monitoring system, just to demonstrate interest in their work and problems. And potentially give them some tips what they can additionally do to improve the quality of the information (for example asking if they can also provide the raw data and not only the visualization based on aggregated data). Or show them some hints how they improve their work with your application.

The biggest value in that activity is the fact, that in case the dreaded performance issue is noticed on an exec level, you already know who to talk to. You know a bit how the others are working and how you can help them. As a tech lead it’s then much easier to ask for logfiles, traffic patterns, CPU usage graphs and I/O latencies, threaddumps etc. You know upfront what information it operations already collects by default. You might have direct access to a monitoring system to get more information. You can even get a warning from the ops people in advance that some real big escalation is imminent. For me this is the best you can get if you have just a few hours to spend.

You might ask why is that important. Because it reduces the TTAD (time to actionable data) dramatically in case of such performance issues. You know who to get on the phone and into calls to start investigation. You already know what information is already available or you can even access it directly. You can report “We are analyzing data and can come up with first suggestions within the day” instead of “we are talking to IT and see how they can support us to get data”.

That’s much more important than spending some hours on random performance tuning. And in case you ever run into performance issues, these hours are one of the best investment you made in the whole project.

(And as random recommendation to improve AEM request rendering times: Disable the MobileRedirectFilter (PID: com.day.cq.wcm.mobile.core.impl.redirect.RedirectFilter) by setting the configuration parameter “redirect.enabled” to “false”. In the age of responsive websites it’s purpose is no longer given. And under load its performance impact can be significant.)

“We have an urgent performance problem!”

“Customer situation is heating up because they have urgent performance problems in their production environment” … That’s something I heard quite often in my consulting career. And in many cases it was an outcry for help, because this problem did not turn out during tests, but most times in production environments.

I read once a nice quote: “Everyone has a performance test environment. But only a few have a production environment!” So true.

Is that really inevitable that performance issues occur? Given the number of cases I’ve seen, I am inclined to believe it. But it is not.

I think, that if all of a sudden a performance problem is put on priority 1, it has a history. If you are in a project team and one morning your project lead/PO tells you that the priorities have shifted, and that you need to push that little unknown item “Performance tuning” from the bottom of the backlog to the top of the list, you were aware of performance as topic. But other features were considered more important.

Or if your customer is starting to escalate with your account team that a really bad application (the one you developed for them) performance is affecting their business, then you often know, that this is not a new issue.

In both cases the priority of this problem just hit a certain level, that executives are getting concerned and start to escalate this topic, because it is hurting their and/or the company’s goals. Here you go, yesterday everything was fine, today it’s all screwed.

All of these problems have a history; performance problems rarely just rise out of nowhere, but they evolve under the radar of ignorance. As long as noone complains, who cares about performance? Features are more important. Until the complaints get loud enough they cannot be overheard anymore.

But when you get at the point where you need to care about performance, you are often in a very bad position. Because now you need information, tools and processes to improve that situation very fast. Because are in the focus everyone is looking to your problem (it’s yours!). “Deliver a fast improvement! Within this week!

But if you have never prepared for that situation, you lack all the necessary things for such an operation:

  • You don’t have KPIs to know what is “acceptable” performance. You just have everyone’s feeling “it’s slow!”
  • You don’t have the tools to measure the current performance. You just have logfiles (hopefully you have them …)
  • Is your system actually able to deliver that performance?
  • “Has anyone got the reports from the latest performance tests?”

That’s a hard situation and there is barely any other way than just to take a bunch of good people and start a surgery in production: Analyzing lot of data, making guesses, increasing logging, deploying hotfixes, and do all the things you already accumulated list on your list of “things which might improve performance” which you maintained over the last months.

But let’s be honest: This is chaos, unplanned, affecting a lot of other teams and people, and hurting careers. But does it really have to be that way?

No. Application performance is no magic, but must be managed. Performance should always be an important aspect in your project, and resources should be spent on it as you spend resources on testing. Otherwise performance is ignored 90% of the time, and in the remaining 10% you are escalating because of the lack of it.

Roles & Rights and complexity

An important aspect in every project is the roles and rights setup. Typically this runs side-by-side with all other discussions regarding content architecture and content reuse.

While in many projects the roles and rights setup can be implemented quite straight forward, there are cases where it gets complicated. In my experience this most often happens with companies which have a very strong separation of concerns, where a lot of departments with varying levels of access are supposed to have access to AEM. Where users should be able to modify pages, but not subpages; where they are supposed to change text, but not the images on the pages. Translators should be able to change the text, but not the structure. And many things more.

I am quite sure that you can implement everything with the ACL structure of AEM (well, nearly everything), but in complex cases it often comes with a price.

Performance

ACL evaluation can be costly in terms of performance, if a lot of ACLs needs to be checked; and especially if you have globally active wildcard ACLs. As every repository acecss runs through it, it can affect performance.

There is not hard limit in number of allowed ACLs, but whenever you build a complex ACL setup, you should check and validate its impact to the performance.

  • Wildcard ACLs can be time consuming, thus make them as specific as possible.
  • The ACL inheritance is likely to affect deep trees with lots of ACL nodes on higher-level nodes.

Maintainability

But the bigger issue is always the maintenance of these permissions. If the setup is not well documented, debugging a case of misguided permissions can be a daunting task, even if the root cause is as simple as someone being the member of the wrong group. Just imagine how hard it is for someone not familiar with the details of AEM permissions if the she needs to debug the situation. Especially if the original creator of this setup is no longer available to answer questions.

Some experiences I made of the last years:

  • Hiding the complexity of the site and system is good, but limiting the (read) view of a user only to the 10 pages she is supposed to manage is not required, it makes the setup overly complex without providing real value.
  • Limiting write access: I agree that not everyone should be able to modify every page. But limiting write access only to small parts of a page is often too much, because then the number of ACLs are going to explode.
  • Trust your users! But implement an approval process, which every activation needs to go through. And use the versions to restore an older version if something went wrong. Instead of locking down each and every individual piece (and then you still need the approval process …)
  • Educate and train your users! That’s one of the best investments you can make if you give your users all the training and background to make the best of the platform you provide to them. Then you can also avoid to lock down the environment for untrained users which are supposed to use the system.

Thus my advice to everyone who wants (or needs) to implement a complex permission setup: Is this complexity really required? Because this complexity is rarely hidden, but in the end something the project team will always hand-over it to the business.

Ways to achieve content reuse in AEM

Whenever an AEM project starts, you have a few important decisions to make. I already wrote about content architeture (here and here) and its importance to a succesful project and an efficient content development and maintenance process. A part of this content architecture discussion is the aspect of content reuse.

Content reuse happens on every AEM project, often it plays a central role. And because requirements are so different, there are many ways to achieve content reuse. In this blog post I want to outline some prominent ways you can use to reuse content in AEM. Each one comes with some unique properties, thus pay attention to them.

I identified 2 main concepts of content reuse: Reuse by copy and Reuse by reference.

Reuse by copy

The AEM multisite manager (MSM) is probably the most prominent approach for content reuse in AEM. It’s been part of the product for a long time and therefor a lot of people know it (even when you just have started with AEM, you might came across its idioms). It’s an approach which creates independent copies of the source, and helps you to keep these copies (“livecopies”) in sync with the original version (“blueprint”) . On the other hand side you still can work with the copies as you like, that means modify them, create and delete parts of the pages or even complete pages. With the help of the MSM you can always get back to the original state, or a change on the blueprint can be propagated to all livecopies (including conflict handling). So you can could call this approach “managed copy”.

The MSM is a powerful tool, but comes with its own set of complexity and error cases; you and your users should understand how it works and what situations can arise out of it. It also has performance implications, as copies are created; also rolling out changes on the blueprint to livecopies can be complex and consume quite some server resources. If you don’t need have the requirement to modify the copies, the MSM is the wrong approach for you!

Unlike the MSM the language copy approach just creates simple copies; and when these copies have been created there is no relationship anymore between the source and the target of the language copy. It’s an “un-managed copy”. Personally I don’t see much use in it in a standalone way (if used as part of a translation workflow, the situation is different).

Reuse by reference

Reuse by reference is a different approach. It does not duplicate content, but just adds references and then the reference target is injected or displayed. Thus a reference will always display the same content as the reference target, deviations and modifications are not possible. Referencing larger amount of content (beyond the scope of parts of a single page) can be problematic and hard to manage, especially if these references are not marked explicitly as such.

The main benefit of reuse by reference is that any change to the reference target is picked up immediately and reflected in the references; and that the performance impact is negligible. Also the consistency of the display of the reference with the reference target is guaranteed (when caching effects are ignored).

This approach is often used for page elements, which have to be consistent all over a site, for example for page headers of footers. But also the DAM is used in this way, even if you don’t embed the asset itself into the page, but rather just add a reference to it into the page).

If you implement reuse by reference, you always have to think about dispatcher cache invalidation, as in many cases a change to a reference target it not propagated to all references, thus the dispatcher will not know about it. You often have to take care of that by yourself.

Having said that, what are the approaches in AEM to implement reuse by reference?


Do it on your own: In standard page rendering scripts you already do includes, typically of child nodes inside the page itself. But you can also include nodes from different parts of the repository, no problem. That’s probably the simplest approach and widely used.

Another approach are Content Fragments and Experience Fragments. They are more sophisticated approaches, and also come with proper support in the authoring interface, plus components to embed them. That makes it much easier to use and start with, and it also offers some nice features on top like variants. But from a conceptual point if view it’s still a reference.

A special form of reuse by reference is “reuse by inheritance“. Typically it is implement by components like the iparsys or (when you code your own components) by using the InheritanceValueMap of Sling. In this case the reference target is always the parent (page/node). This approach is helpful when you want to inherit content down the tree (e.g from the homepage of the site to all individual pages); with the iparsys it’s hte content of a parsys, with the InheritanceValueMap it’s properties.

What approach should I choose?

The big differentiator of the reuse by copy and reuse by reference is the question if reused content should be adapted or changed at the location where it should be reused. As soon as you need to have the requirement “I would like to change the content provided to me”, the you need to have reuse by copy. And in AEM this normally mans “MSM”. Because content is not created once, but needs to be maintained and updated. At scale MSM is the best way to do it. But if you don’t have that requirement, use reuse by reference.
You might even use both approaches, “reuse by copy” to manage the reuse of content over different sites, and “reuse by reference” for content within a site.

Your friendly business consultant is a can help you find out which reuse strategy makes sense for your requirements.

Why a good business consultant can save you more money than a good AEM architect

In the years working in the consulting business for Adobe I joined many projects. Some only temporary to support on some special issues, others in a broader context; but always it was with a focus on the technology, on implementation, architecture and infrastructure. And with this background, I always tried to help my customers and projects and use the AEM (or CQ5) technology in the right way; to spend their money wisely and cost-effective.

But: according to my own standards I often failed to do so.

Not because I did not know the product. Not because the customer was not listening to me. And not because I was not able to give advice. But because I was not sitting at the right table when some important decisions were made. Because I was sitting with infrastructure teams to build a hardware sizing. I was working with customers architects to understand the constraints under which we have to design and operate AEM. But I did not have enough time to work with the business, understand their needs and the ways they thought AEM can help them. I was not there when they decided on content architecture, components and templates, business processes and how the editors should work. Because my focus is technology, so I talk with the technology people in the first place.

And why is that important for all of us? Because if you don’t know AEM and you don’t have any guidance what it can do and how the existing features can be used, it is very likely that you start building everything on your own. In that case you never use the advanced features of AEM, but you start based on that what you know or what was demo-ed to you by presales people. If you don’t have a person, a trusted advisor, someone knowing AEM very well (and also all the features added to it in the last years!), you start to invest a lot of money to build things which already exist. Too much money.

Let me introduce the AEM business consultant. Someone very familiar with philosophy of AEM. Someone which has an up-to-date knowledge of AEM. Someone which is not a developer, but can map your requirements into AEM features and knows what is possible out-of-the-box and where an existing feature needs an extension on top. And also someone which helps to leverage other features of AEM you paid for.

A business consultant will attend meetings with business stakeholders and with the users which are going to use AEM. A business consultant will help you telling the technologists what they should implement (and what not). And a business consultant is a good partner to the architect and together they can identify the gaps which have to covered by custom development and they frame requirements in a way that they can be covered by out-of-the-box features.

For me often working in an architect role it makes me job much easier if I can focus on the technical aspects. If I know that I have a partner in crime at my side, which makes sure that business users understand the philosophy of AEM; if the business people know what they can get easily; and if they understand what requirement is harder to implement (although it sounds soo easy). Because once an idea has developed in the mind of the business stakeholders, it’s much harder to change it (if that’s necessary). If you as an architect see the requirement before its implementation at all.

Because the worst case scenario is that you are implementing some project features and spend a lot of money on it. And in hindsight it turns out that it could have implemented it with 20% of the time if someone would have told the business and developers, that they just re-invented content fragments (while using AEM 6.3 which has that feature built-in). But because noone told them that, you have designed and built a lot of your application on this custom implementation; and replacing it is expensive. But with a business consultant in the project she would have identified that within the first sessions and helped you to use this out-of-the-box feature.

So a business consultant helps you to spend your money on the right things; and an architect ensures that it is built the right way. Two sides of the same medal.