Storage sizing and SAN

Creating a hardware sizing has been part of every project I’ve done in the last few years; many times it was part of the project setup and an important piece which got fed into the hardware provisioning process.

(Hardware sizing is often required during presales stages, where it is mostly used to compare different vendors in terms of hardware investment. In such situations the process of creating a sizing is very similar, but the results are often communicated quite differently …)

Depending on the organization this hardware provisioning process can be quite lengthy, and after the parameters have been set they are very hard to change. That is especially true for large organizations which want to use an on-premises deployment in their own datacenter, because then it means starting a purchase process followed by provisioning, installation, and so on. Especially in the FSI area it is not uncommon to have 6 months between the signed order of the budget manager and the handover of an installed server. And then you get exactly what you asked for. So everyone tries to make sure that the input data is as good as possible and that all variables have been considered. It reminds me a lot of the waterfall model in the software development business, and it comes with the same problems.

Even if your initial sizing was 100% accurate (which it never is), 6 months are a long time in which important project parameters can change. So there is a not-so-small chance that at the time the servers are handed over to you, you already know that the hardware is no longer sufficient, based on the information you have right now.
But changes are hardly possible, because for cost efficiency you did not order the hardware which offered the most flexibility for future growth, but a model with some constraints in terms of extensibility. And now, 6 months after project start and just 4 months ahead of the go-live date, you cannot restart the purchasing process anymore!

Or a bit worse: you are already live for 6 months and now you start to run short of disk space, because your growth is much higher than anticipated. But the drive bays of your servers are already full and you have already installed the largest disks available.

For the topic of disk space the solution can be quite easy: don’t use local disks! Even if local SSDs are very performant in terms of IOPS, try to avoid the discussion and go for a SAN (storage area network), which should already be available in an enterprise datacenter. (Of course you can also choose any other technology which strongly decouples the storage from the server and performs well.) For AEM and TarMK a good SAN is sufficient to deliver decent performance (even if a local SSD would improve it further).

I know that this statement can be problematic, as there are cases where IOPS are more important than the flexibility to easily add storage. Then your only option is to take the SSDs and make sure that you can still add more of them later.

The benefit of a SAN is that you separate the servers from their storage, and you can upgrade or extend them independently of each other. Adding more hard drives to a SAN is always possible, so there is hardly a limit in terms of available disk space per server. Attaching more disk space to a server is then a matter of minutes and can be done incrementally. This also allows you to attach disk space on demand instead of attaching the full amount at provisioning time (and only fully consuming it 2 years later).
And if you have servers and storage split up, it is also much easier to replace a server with another model with more capacity (RAM- or CPU-wise), because you don’t need to move all the data but rather just connect the storage to a different server.
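As an illustration: on a Linux server where the SAN volume is managed with LVM, growing the repository filesystem can look roughly like this (device name, volume group and sizes are made-up examples):

pvresize /dev/mapper/san-lun0          # make LVM see the grown LUN
lvextend -L +100G /dev/vg_aem/lv_repo  # grow the logical volume by 100 GB
resize2fs /dev/vg_aem/lv_repo          # grow the ext4 filesystem online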

So using a SAN does not free you from delivering a good sizing, but it can soften the impact of an insufficient sizing (mostly based on insufficient data), which is often the case at project kickoff.

Managing repository growth

On almost every project there is this moment when (all of a sudden) the disk space assigned to the AEM instances becomes full; in most (!) cases such a situation is detected early on, so there is time to react, which often means just adding more disk capacity.

But after that the questions arise: Why is our repository so large? Why does it consume more than the initially estimated disk space? What content exactly is causing this problem? What went wrong that we actually got into this situation?

From my point of view there are 3 different angles on this situation, each of which can be part of the answer:

  • Disk space is not managed well, meaning that unnecessary data is consuming a lot of space outside of the repository.
  • The maintenance jobs are not executed.
  • The estimation for content and content growth was not realistic.

And in most cases you need to check all 3 of them to answer the question “Why are we running out of space?”

Disk space is not managed well
This is an operations problem in the first place, because non-repository data is consuming disk space which has been planned for the repository. Often seen:

  • Heapdumps and log files are not rotated or removed.
  • Manually created backup files were copied to a folder next to the original data once, but in the meantime they are not useful anymore, because they are totally out of date.
  • The regular backup process is creating temporary files which are not cleaned up; or the backup process itself consumes temporary disk space, which lets the disk consumption spike.

So this can only be handled by working carefully and purging old data in time.

The maintenance jobs are not executed
Maintenance jobs are an essential part of the ongoing work to remove unnecessary fat from your AEM instance, be it on the content level or on the repository level. They include

  • workflow purge
  • audit log purge
  • repository compaction (if you use TarMK)
  • datastore GC (if you use a datastore)

You should always keep an eye on these; the Maintenance Dashboard is a great help here. But do not rely on it blindly.
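For TarMK, offline compaction is typically run with the oak-run tool while the instance is stopped; a sketch (the oak-run version has to match your Oak version, and the repository path is just the default example):

java -jar oak-run-1.4.x.jar compact crx-quickstart/repository/segmentstore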

Your estimation for content and content growth was not realistic
That’s a common problem; you have to give an initial hardware sizing, which also includes the amount of disk space used by the repository. You do your best to include all relevant parameters and you add some buffer on top. But that is an estimation made at the beginning of the project, when you don’t know all the requirements and their impact on disk consumption in detail. Still, these are the numbers you gave, and changing them afterwards is always problematic.

Or AEM is used differently than initially anticipated and all the assumptions on which you based your initial hardware sizing are no longer true. Or you just forgot to add the versioning of the assets to your calculation. Or…
There are a lot of cases where, in retrospect, the initial sizing of the required disk space was just incorrect. In that case you have only one choice: redo the calculation right now! Take your new insights and create a new hardware sizing. And then implement it and add more hardware.

And the only way I see to avoid such situations is: do not make estimations for the next 3 years! Be agile and review your current numbers every 3 months; during this review you can also determine the needs for the next months and plan accordingly. Of course this assumes that you are flexible in terms of disk sizing, so for any non-trivial setup the use of a SAN as storage technology (plus a volume manager) is my preferred choice!

Of course this does not free you from working on the cleanup of the systems and running all required maintenance jobs; but it makes the review of the used disk space a regular task, so you should see deviations from your anticipated plan much earlier.

What is writing to my Oak repository?

If you were ever curious what’s happening on the Oak repository in AEM (and I can tell you: a lot!), there’s a way to log all the repository write operations.

Just create a logger for „org.apache.jackrabbit.oak.jcr.operations.writes“ on loglevel TRACE and there you go.
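One way to create such a logger (an assumption on my side; the log file name and location are just examples) is an OSGi configuration for the Sling log manager, i.e. a config for the factory PID org.apache.sling.commons.log.LogManager.factory.config:

org.apache.sling.commons.log.level="trace"
org.apache.sling.commons.log.names=["org.apache.jackrabbit.oak.jcr.operations.writes"]
org.apache.sling.commons.log.file="logs/oak-writes.log"

With that in place you’ll get entries like the following: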


21.05.2016 23:38:34.353 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/cq:userid]
21.05.2016 23:38:34.353 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/cq:path]
21.05.2016 23:38:34.353 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/cq:type]
21.05.2016 23:38:34.353 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/cq:category]
21.05.2016 23:38:34.353 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/userId]
21.05.2016 23:38:34.354 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/path]
21.05.2016 23:38:34.354 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/type]
21.05.2016 23:38:34.354 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] Setting property [/var/audit/com.day.cq.wcm.core.page/content/geometrixx/en/services/59a34cc6-ee23-4423-9e76-03868cd5e7e6/cq:properties]
21.05.2016 23:38:34.356 *TRACE* [Thread-8145] org.apache.jackrabbit.oak.jcr.operations.writes [session-43731] save

This thread with the name [Thread-8145] writes to the /var/audit path, so it’s quite likely related to the audit logger. And it is using the session [session-43731]; the session name is random per session, but it is very useful information:

  • You can identify a single session and what’s happening within this session. It is especially useful to determine what this specific session is writing and how often it saves.
  • If you have the session name, and it is a long-running session, you can look up this session in the JMX MBean console; in Oak 1.0 and Oak 1.2 a stack trace is stored when the session is opened; in Oak 1.4 this has been removed for performance reasons, but you can get it back when you set the System property ‚oak.sessionStats.initStackTraceThreshold‘ to ‚0‘ (zero).
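For example as an additional JVM parameter in your AEM start script:

-Doak.sessionStats.initStackTraceThreshold=0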

So if you need to understand what’s happening on your repository and what might be causing repository write activity, this is an easy way to go. The only drawback: such logging eats up a lot of disk space, especially if you run it for an extended period of time.

It’s also possible to do this for reads, but at least on Oak 1.0 it doesn’t log the paths, only the operation; so it’s less useful here. And it produces a lot of data: having it enabled for 2 minutes and a single page load on my local instance produced 10 megabytes of logs.

Disabling services and components in AEM

Sometimes you need to disable a service or a component; a simple example for this is a servlet which is used on the authoring instance, but which must not be active on publish. There are several ways to achieve this. (In this blog post, whenever I mention “service” you can implicitly assume that it also works for SCR components; technically even “component” would be the right wording, but in the AEM world “component” is a heavily used word with a number of different meanings.)

A very simple and smart solution for your own codebase is the use of the SCR configuration policy; when this is used on an OSGi service, the SCR runtime won’t start the service if no dedicated OSGi configuration exists (not even the activate() method is called). And because you can create OSGi configs based on run modes, it’s the perfect way to enable or disable services.

Nice examples for this can be found in ACS AEM Commons.

This is the recommended way to write services; the decision to run it or not is made at deployment/configuration time and not during development. And it’s the most complete way, because with proper configuration the service will never get active at all. The only (very) small drawback is that the number of OSGi configurations you manage in your project will increase.
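A minimal sketch of such a service, using the OSGi Declarative Services annotations (the class name is made up; with the older Felix SCR annotations the attribute is called policy instead of configurationPolicy):

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.ConfigurationPolicy;

// The component only becomes active if an OSGi configuration with its PID exists.
// Ship that configuration only for the run modes where the service should run
// (e.g. in a config.author folder) and it never starts anywhere else.
@Component(service = AuthoringOnlyService.class, configurationPolicy = ConfigurationPolicy.REQUIRE)
public class AuthoringOnlyService {

    @Activate
    protected void activate() {
        // not even called as long as no configuration is present
    }
}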

A different way is to use a special property „enabled“, which is then checked in the service before doing anything useful. But when you use the enabled-property, the service is still started and registered to the OSGi runtime; thus it might get registered as a servlet and picked up by other service factories. You never know what is happening and what is not, so the code always has to be ready to be called.
This approach also gives you the choice at deployment time to enable the service or not. But it has the drawback that the service is active, and parts of its code might run before the „enabled“ status is checked. So from my understanding there is never really a use case for this “enabled” property. And if the property has a different function than turning the service on or off, it shouldn’t be named “enabled”.
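For comparison, a sketch of the „enabled“ anti-pattern (again with invented names): the component is activated and registered no matter what, and the flag is only checked inside the business logic.

import java.util.Map;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;

@Component(service = Runnable.class)
public class EnabledFlagService implements Runnable {

    private volatile boolean enabled;

    @Activate
    protected void activate(Map<String, Object> config) {
        // activate() runs even with enabled=false; the service is registered anyway
        enabled = Boolean.parseBoolean(String.valueOf(config.getOrDefault("enabled", "false")));
    }

    @Override
    public void run() {
        if (!enabled) {
            return; // callers still see (and can call) the registered service
        }
        // ... the actual work ...
    }
}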

If you need to disable services which are not under your control, and which offer neither an „enabled“ property nor the configurationPolicy approach, the only remaining choice is the ComponentDisabler of ACS AEM Commons. That’s basically a hack and should be your last resort, because it cannot prevent the startup of the service, but instead shuts it down after the service has been started (and might have already been doing work). But if you can live with these constraints, it’s the way to go.

If you are a developer, I strongly recommend learning and using the SCR ConfigurationPolicy setting!

Integrating AEM in a portal?

Content which is produced and stored inside AEM is often widely used: not only for direct publishing to the web, but nowadays also in emails (using Adobe Campaign) or for consumption by mobile apps. A very common case is also the integration of AEM content into 3rd-party systems, where content maintained in AEM is fetched by a portal. The portal provides the transactional parts, but the content is fetched from AEM. This is the use case I want to discuss in this post. When I write „portal“ it’s most often a J2EE portal, but there are also other options. For this post the underlying technology stack doesn’t really matter.

In this case the portal is always the leading application, and AEM has just a supporting function (providing content). Depending on the exact case, you can embed AEM content as a portlet or use a REST approach to fetch AEM content. AEM is then mainly used for content authoring.
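As a rough sketch of the REST variant (host and page path are hypothetical; the .infinity.json selector of the Sling GET servlet is just one way a portal could fetch the content as JSON):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchAemContent {
    public static void main(String[] args) throws Exception {
        // Fetch the page content from a (hypothetical) publish instance as JSON;
        // the portal then maps this data to its own templates.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://aem-publish.example.com/content/site/en/page.infinity.json"))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}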

This approach has the unique benefit that you can continue to use the existing portal solution and provide your authors an easy-to-use solution for content authoring. The existing architecture is just extended by adding AEM.

But from my point of view this approach also has some severe problems:

  • In edit mode AEM can only display the elements which are available to AEM. If the portal displays AEM content as part of a page next to a number of other elements, these elements are not available in AEM. This limits the ability of the edit or preview mode to display content the way an end user will see it.
  • A similar problem is that with AEM pages you can normally edit most pieces of a page. In the case of the portal, you have to limit the possibilities of an author to what the portal provides. That means if the portal does not allow you to change the page footer or add special HTML headers, you cannot change this from AEM (although AEM might support it out of the box). Within an AEM application I always recommend allowing authors to change every text and image of a page (including headers and footers), avoiding any hardcoded content.
  • When content is created in AEM, you need to develop templates and components for it. If you want to display this content in the portal, you need to build its equivalent there as well, but with a different technology. This doubles the cost and creates dependencies between the development cycles of AEM and of the portal application. You need to spend development effort on both sides to add new components or change the parts of a page where content should be managed by authors.
  • Recent versions of AEM provide a good integration with the Adobe Marketing Cloud features, so authors can easily use them. When a portal is set up in a way that it fetches content from AEM, this integration normally needs additional effort and implementation work, which you don’t have when AEM is in front.

A personal conclusion: I think that using AEM just as a content-authoring system is possible, but you ignore many of its features which bring a lot of value. You increase costs by the need to develop new components and templates twice (for AEM and the portal) and increase the time-to-market by having to synchronize 2 development streams which should basically be independent. And you cannot use many of the new Adobe Marketing Cloud integrations provided out of the box.

So there are quite a few arguments (especially for end-user-facing systems) not to use AEM just as a simple content feed, but to establish AEM as the frontend of your platform.

TarMK and SAN

Yesterday’s posting „TarMK on NAS?“ got quite some attention, and today I was asked several times: „And what about SAN?“. Well, here is the answer to „Do you recommend TarMK on SAN?“

A little background first: a SAN (storage area network) is a service which offers block devices. It is part of today’s enterprise datacenter infrastructure, where attaching local disks to servers is not feasible and does not scale. If you are familiar with PCs, you can think of a block device like a partition on your hard disk. You cannot use a partition by itself; you have to format it and put a filesystem on it.
That’s basically the same with a SAN: you get a raw device (called a volume), you put a filesystem on it (for example ext4 when you use Linux) and then you can use it just like any other local drive. The only difference to a local drive is that the connectivity is not provided by a local SATA port, but over a network (you’ll come across the terms iSCSI or Fibre Channel, but that’s too much detail here).

And that’s the huge difference: With a SAN you get a block device, with a NAS you’ll get a shared filesystem.

So the basic principle is that you can treat a SAN like any local storage. If you format it with a filesystem like ext4, btrfs, or NTFS on Windows, you have a local filesystem. And like I said in yesterday’s post: when you have a local filesystem, where only a single system controls access to it, you can use mmap. And mmap is all we care about here!

My recommendations for TarMK are:

  • When you have the choice between SAN and NAS (sometimes you have it): drop the NAS and go for the SAN.
  • And when you have the choice between SAN and local drives, choose „SAN“ as well. Why? Because you never again need to deal with the problem of „my hard drives are full and we don’t have any empty drive bays anymore on this server!“. Just allocate some more space to your SAN volume, resize the filesystem and that’s it. When you have mmap available for your TarMK, filesystem performance shouldn’t be something to worry about.

TarMK on NAS?

Today the question was raised whether TarMK running on a NAS is a good idea. The short answer is: „No, it’s not a good idea“.

The long answer: TarMK relies on the ability of the operating system to map files into memory (using the so-called memory-mapping technique, short: mmap; see the Wikipedia page on it). Oak does this for the heavily used parts of the TarMK to increase performance, because then these parts don’t need to be read from the filesystem again, but are always available in memory (which is faster by at least an order of magnitude). This works well with a local filesystem, where the operating system knows about every change happening on the filesystem, because all access to it goes through the operating system, and it can make sure that the content of the file on disk and in memory are in sync. I should also mention that this memory isn’t part of the heap of the JVM; rather the free RAM of the system is used for this purpose.
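To illustrate what mmap means on the Java level, here is a minimal sketch of the mechanism (this is not Oak’s actual code, and the file name is made up):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        Path tarFile = Paths.get("data00000a.tar"); // hypothetical TarMK segment file
        try (FileChannel channel = FileChannel.open(tarFile, StandardOpenOption.READ)) {
            // The file is mapped into the process address space; the OS keeps the pages
            // in its page cache, so repeated reads do not go to disk again.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte first = buffer.get(0); // served from memory, not via a read() system call
            System.out.println("first byte: " + first);
        }
    }
}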

With a NAS the situation is different. A NAS is designed to be accessed by multiple systems in parallel without the need to synchronize with each other. The 2 most common network filesystems for this are NFS and SMB/CIFS. On NFS one system can open a file and is not aware that a second system modifies it at the same time. This is a design decision which prevents a system from keeping the content of a file on NFS and its copy in memory in sync. Thus mmap is not usable when you use a NAS to store your TarMK files.

And because mmap is not usable, you’ll see a huge performance impact compared to a local filesystem where mmap can be used. And then I haven’t even mentioned the limited bandwidth and higher latency of remote storage compared to local storage.

If you migrated from CRX 2.x (up to AEM 5.6.1), this problem was not as visible there as it is now with Oak, because there was the BundleCache, which cached data already read from disk; this bundle cache is an in-memory, in-heap structure and you had to adjust the heap size for it. CRX 2.x did not use mmap.

But Oak does not have this in-memory cache anymore; it relies on the mmap() feature of the operating system to keep the often-accessed parts of the filesystem (the TarMK) in memory. And that’s the reason why you should leverage mmap as much as possible and therefore avoid a NAS for TarMK.