Every project I have done in recent years included the task of creating a hardware sizing. Often it was part of the project setup and an important input to the hardware provisioning process.
(Hardware sizing is often also required during presales, where it is mostly used to compare vendors in terms of hardware investment. In such situations the process of creating a sizing is very similar, but the results are often communicated quite differently …)
Depending on the organization, this hardware provisioning process can be quite lengthy, and once the parameters have been set they are very hard to change. That is especially true for large organizations that want an on-premises deployment in their own datacenter, because that means starting a purchase process, followed by provisioning, installation, and so on. Especially in the FSI (financial services industry) area it is not uncommon for 6 months to pass between the budget manager signing the order and the handover of an installed server. And then you get exactly what you asked for. So everyone tries to make sure that the input data is as good as possible and that all variables have been considered. It reminds me a lot of the waterfall model in software development, and it comes with the same problems.
Even if your initial sizing was 100% accurate (which it never is), 6 months is a long time, during which important project parameters can change. So there is a not-so-small chance that, by the time the servers are handed over to you, you already know that the hardware is no longer sufficient, based on the information you have by then.
But changes are hardly possible, because for cost efficiency you did not order the hardware that offered the most flexibility for future growth, but a model with some constraints on extensibility. And now, 6 months after project start and just 4 months ahead of the go-live date, you cannot restart the purchasing process!
Or a bit worse: you have been live for 6 months and you are starting to run short of disk space, because your growth is much higher than anticipated. But the drive bays of your servers are already full, and you have already installed the largest disks available.
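To make that scenario concrete, here is a minimal sketch of the arithmetic, assuming simple linear growth. The function name and all numbers are made up for illustration and do not come from a real project.

```python
def months_until_full(capacity_gb: float, used_gb: float,
                      growth_gb_per_month: float) -> float:
    """Months until the disk is full, assuming linear growth (an assumption)."""
    if growth_gb_per_month <= 0:
        return float("inf")  # no growth means the disk never fills up
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_month


# Hypothetical numbers: 2000 GB of local disk, 500 GB used at go-live.
planned = months_until_full(2000, 500, 50)    # growth assumed in the sizing
actual = months_until_full(2000, 500, 250)    # growth observed in production

print(f"planned runway: {planned:.0f} months")  # planned runway: 30 months
print(f"actual runway:  {actual:.0f} months")   # actual runway:  6 months
```

With growth five times higher than assumed, a runway of two and a half years shrinks to half a year, which is about as long as a new purchasing process takes.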
For disk space the solution can be quite easy: don’t use local disks! Even though local SSDs perform very well in terms of IOPS, try to avoid the discussion and go for a SAN (storage area network), which should already be available in an enterprise datacenter. (Of course you can also choose any other technology that strongly decouples the storage from the server and performs well.) For AEM and TarMK a good SAN is sufficient to deliver decent performance (even if a local SSD would improve it further).
I know that this statement can be problematic, as there are cases where IOPS are more important than the flexibility to easily add storage. In that case your only option is to take the SSDs and make sure that you can still add more of them later.
The benefit of a SAN is that you separate the servers from their storage, so you can upgrade or extend each independently of the other. Adding more drives to a SAN is usually possible, so there is hardly a limit on the disk space available per server. Attaching more disk space to a server is then a matter of minutes and can be done incrementally. This also allows you to attach disk space on demand instead of attaching the full amount at provisioning time (and only using all of it 2 years later).
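The on-demand approach can be sketched as a small simulation. This is only an illustration under stated assumptions: linear growth, storage that is extended in fixed steps, and a fixed headroom that must always stay free. The function name, parameters and numbers are hypothetical.

```python
def extension_schedule(start_used_gb: int, growth_gb_per_month: int,
                       months: int, step_gb: int,
                       headroom_gb: int) -> list[tuple[int, int]]:
    """Return (month, total allocated GB) for every month in which
    more storage has to be attached to keep the headroom free."""
    if step_gb <= 0:
        raise ValueError("step_gb must be positive")
    schedule = []
    allocated = 0
    used = start_used_gb
    for month in range(months + 1):
        extended = False
        # Attach further steps until usage plus headroom fits again.
        while allocated < used + headroom_gb:
            allocated += step_gb
            extended = True
        if extended:
            schedule.append((month, allocated))
        used += growth_gb_per_month  # linear growth, an assumption
    return schedule


# Hypothetical: 200 GB at start, 50 GB/month growth, 24 months,
# extended in 500 GB steps with 100 GB of headroom.
for month, total in extension_schedule(200, 50, 24, 500, 100):
    print(f"month {month:2d}: {total} GB attached")
```

With these numbers the server starts with 500 GB and is extended in months 5 and 15, instead of having the full 1500 GB attached on day one.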
And if servers and storage are separated, it is also much easier to replace a server with another model that has more capacity (RAM or CPU), because you don’t need to move all the data; you just connect the storage to a different server.
So using a SAN does not free you from delivering a good sizing, but it can soften the impact of an insufficient sizing (mostly caused by insufficient data), which is often the case at project kickoff.