“We have an urgent performance problem!”

“The customer situation is heating up because they have urgent performance problems in their production environment” … That’s something I heard quite often in my consulting career. In many cases it was an outcry for help, because the problem did not show up during tests, but only in production.

I once read a nice quote: “Everyone has a performance test environment. But only a few have a production environment!” So true.

Is it really inevitable that performance issues occur? Given the number of cases I’ve seen, I am inclined to believe it. But it is not.

I think that if a performance problem is suddenly put on priority 1, it has a history. If you are in a project team and one morning your project lead/PO tells you that the priorities have shifted and that you need to move that little-noticed item “Performance tuning” from the bottom of the backlog to the top of the list, you were already aware of performance as a topic. But other features were considered more important.

Or if your customer starts to escalate with your account team that the really bad performance of the application (the one you developed for them) is affecting their business, then you often know that this is not a new issue.

In both cases the priority of the problem has just reached a level where executives are getting concerned and start to escalate the topic, because it is hurting their own and/or the company’s goals. Here you go: yesterday everything was fine, today it’s all screwed.

All of these problems have a history; performance problems rarely arise out of nowhere, they evolve under the radar of ignorance. As long as no one complains, who cares about performance? Features are more important. Until the complaints get loud enough that they can no longer be ignored.

But when you reach the point where you need to care about performance, you are often in a very bad position. Now you need information, tools and processes to improve the situation very fast, because you are in the spotlight and everyone is looking at your problem (it’s yours!). “Deliver a fast improvement! Within this week!”

But if you have never prepared for that situation, you lack all the necessary things for such an operation:

  • You don’t have KPIs to know what is “acceptable” performance. You just have everyone’s feeling “it’s slow!”
  • You don’t have the tools to measure the current performance. You just have logfiles (hopefully you have them …)
  • Is your system actually able to deliver that performance?
  • “Has anyone got the reports from the latest performance tests?”

That’s a hard situation, and there is barely any other way than to take a bunch of good people and start surgery in production: analyzing lots of data, making guesses, increasing logging, deploying hotfixes, and doing all the things you have already accumulated on your list of “things which might improve performance”, which you maintained over the last months.

But let’s be honest: This is chaos, unplanned, affecting a lot of other teams and people, and hurting careers. But does it really have to be that way?

No. Application performance is not magic, but it must be managed. Performance should always be an important aspect in your project, and resources should be spent on it just as you spend resources on testing. Otherwise performance is ignored 90% of the time, and in the remaining 10% you are escalating because of the lack of it.

Roles & Rights and complexity

An important aspect in every project is the roles and rights setup. Typically this runs side-by-side with all other discussions regarding content architecture and content reuse.

While in many projects the roles and rights setup can be implemented in a quite straightforward way, there are cases where it gets complicated. In my experience this most often happens with companies which have a very strong separation of concerns, where a lot of departments are supposed to have access to AEM with varying levels of permissions. Where users should be able to modify pages, but not subpages; where they are supposed to change text, but not the images on the pages. Translators should be able to change the text, but not the structure. And much more.

I am quite sure that you can implement everything with the ACL structure of AEM (well, nearly everything), but in complex cases it often comes at a price.

Performance

ACL evaluation can be costly in terms of performance if a lot of ACLs need to be checked, especially if you have globally active wildcard ACLs. As every repository access runs through ACL evaluation, it can affect overall performance.

There is no hard limit on the number of allowed ACLs, but whenever you build a complex ACL setup, you should check and validate its impact on performance.

  • Wildcard ACLs can be time-consuming to evaluate, so make your ACLs as specific as possible (see the sketch after this list).
  • ACL inheritance is likely to affect deep trees with many ACL nodes on higher-level nodes.
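To illustrate the first point: instead of a broad wildcard or glob rule high up in the tree, grant a group exactly the access it needs on the subtree it manages. A minimal sketch, assuming a group “product-editors” and the Jackrabbit AccessControlUtils helper (the group name and path are not from the original post):

import java.security.Principal;
import javax.jcr.Session;
import org.apache.jackrabbit.api.JackrabbitSession;
import org.apache.jackrabbit.commons.jackrabbit.authorization.AccessControlUtils;

public class SpecificAclExample {

    // Hypothetical sketch: allow the (assumed) group "product-editors" to read and write
    // only the subtree it actually manages, instead of a wildcard ACL near the content root.
    public void grantSpecificAccess(Session adminSession) throws Exception {
        JackrabbitSession session = (JackrabbitSession) adminSession;
        Principal editors = session.getPrincipalManager().getPrincipal("product-editors");

        // a single, specific allow entry keeps ACL evaluation cheap and easy to reason about
        AccessControlUtils.addAccessControlEntry(session, "/content/mysite/en/products",
                editors, new String[] { "jcr:read", "rep:write" }, true);
        session.save();
    }
}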

Maintainability

But the bigger issue is always the maintenance of these permissions. If the setup is not well documented, debugging a case of misguided permissions can be a daunting task, even if the root cause is as simple as someone being a member of the wrong group. Just imagine how hard it is for someone not familiar with the details of AEM permissions if she needs to debug the situation, especially if the original creator of the setup is no longer available to answer questions.

Some experiences I have made over the last years:

  • Hiding the complexity of the site and system is good, but limiting the (read) view of a user to only the 10 pages she is supposed to manage is not required; it makes the setup overly complex without providing real value.
  • Limiting write access: I agree that not everyone should be able to modify every page. But limiting write access to only small parts of a page is often too much, because then the number of ACLs is going to explode.
  • Trust your users! But implement an approval process which every activation needs to go through, and use versions to restore an older state if something went wrong, instead of locking down each and every individual piece (and then you still need the approval process …).
  • Educate and train your users! That’s one of the best investments you can make: give your users all the training and background they need to make the best of the platform you provide to them. Then you can also avoid locking down the environment for untrained users who are supposed to use the system.

Thus my advice to everyone who wants (or needs) to implement a complex permission setup: ask whether this complexity is really required. Because this complexity is rarely hidden; in the end the project team will always hand it over to the business.

Ways to achieve content reuse in AEM

Whenever an AEM project starts, you have a few important decisions to make. I already wrote about content architecture (here and here) and its importance to a successful project and an efficient content development and maintenance process. A part of this content architecture discussion is the aspect of content reuse.

Content reuse happens in every AEM project, and often it plays a central role. And because requirements are so different, there are many ways to achieve content reuse. In this blog post I want to outline some prominent ways to reuse content in AEM. Each one comes with some unique properties, so pay attention to them.

I have identified two main concepts of content reuse: reuse by copy and reuse by reference.

Reuse by copy

The AEM multisite manager (MSM) is probably the most prominent approach for content reuse in AEM. It has been part of the product for a long time, and therefore a lot of people know it (even when you have just started with AEM, you might have come across its idioms). It’s an approach which creates independent copies of the source and helps you to keep these copies (“livecopies”) in sync with the original version (“blueprint”). On the other hand you can still work with the copies as you like, that means modify them, create and delete parts of pages or even complete pages. With the help of the MSM you can always get back to the original state, or a change on the blueprint can be propagated to all livecopies (including conflict handling). So you could call this approach a “managed copy”.

The MSM is a powerful tool, but it comes with its own set of complexity and error cases; you and your users should understand how it works and what situations can arise out of it. It also has performance implications, as copies are created; also, rolling out changes on the blueprint to the livecopies can be complex and consume quite some server resources. If you don’t have the requirement to modify the copies, the MSM is the wrong approach for you!

Unlike the MSM, the language copy approach just creates simple copies; and once these copies have been created, there is no longer any relationship between the source and the target of the language copy. It’s an “un-managed copy”. Personally I don’t see much use for it in a standalone way (if used as part of a translation workflow, the situation is different).

Reuse by reference

Reuse by reference is a different approach. It does not duplicate content, but just adds references, and the reference target is then injected or displayed. Thus a reference will always display the same content as the reference target; deviations and modifications are not possible. Referencing larger amounts of content (beyond parts of a single page) can be problematic and hard to manage, especially if these references are not explicitly marked as such.

The main benefit of reuse by reference is that any change to the reference target is picked up immediately and reflected in the references; and that the performance impact is negligible. Also the consistency of the display of the reference with the reference target is guaranteed (when caching effects are ignored).

This approach is often used for page elements which have to be consistent all over a site, for example page headers or footers. The DAM is also used in this way: you don’t embed the asset itself into the page, but rather just add a reference to it.

If you implement reuse by reference, you always have to think about dispatcher cache invalidation, as in many cases a change to a reference target is not propagated to all referencing pages, so the dispatcher will not know about it. You often have to take care of that yourself.

Having said that, what are the approaches in AEM to implement reuse by reference?


Do it on your own: In standard page rendering scripts you already do includes, typically of child nodes inside the page itself. But you can also include nodes from different parts of the repository, no problem. That’s probably the simplest approach and widely used.
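As a rough illustration of such an include from a different part of the repository (the paths and the resource type here are made up, not from the original post), this is how it could look in Java; JSP and HTL offer equivalent include mechanisms:

import java.io.IOException;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.request.RequestDispatcherOptions;

public class FooterIncludeHelper {

    // Hypothetical sketch: include a resource that lives in a completely different
    // part of the repository (a shared footer) into the current page rendering.
    public void includeSharedFooter(SlingHttpServletRequest request,
                                    SlingHttpServletResponse response)
            throws ServletException, IOException {
        RequestDispatcherOptions options = new RequestDispatcherOptions();
        // optionally force a resource type if the target node has none
        options.setForceResourceType("myproject/components/footer");

        RequestDispatcher dispatcher =
                request.getRequestDispatcher("/content/shared/footer", options);
        if (dispatcher != null) {
            dispatcher.include(request, response);
        }
    }
}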

Another approach is Content Fragments and Experience Fragments. They are more sophisticated, and they also come with proper support in the authoring interface, plus components to embed them. That makes it much easier to use and start with, and it also offers some nice features on top, like variants. But from a conceptual point of view it’s still a reference.

A special form of reuse by reference is “reuse by inheritance”. Typically it is implemented by components like the iparsys or (when you code your own components) by using an InheritanceValueMap. In this case the reference target is always the parent (page/node). This approach is helpful when you want to inherit content down the tree (e.g. from the homepage of the site to all individual pages); with the iparsys it’s the content of a parsys, with the InheritanceValueMap it’s properties.
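A hedged sketch of the property-inheritance variant (the property name and the use of HierarchyNodeInheritanceValueMap are my assumptions, not from the post): a property is read from the current page content and, if it is not set there, looked up on the ancestor pages:

import org.apache.sling.api.resource.Resource;
import com.day.cq.commons.inherit.HierarchyNodeInheritanceValueMap;
import com.day.cq.commons.inherit.InheritanceValueMap;

public class InheritedPropertyExample {

    // Hypothetical sketch: read "contactEmail" from the current page content resource;
    // if it is not set there, the value is inherited from the ancestor pages.
    public String getContactEmail(Resource pageContentResource) {
        InheritanceValueMap inheritedProps =
                new HierarchyNodeInheritanceValueMap(pageContentResource);
        return inheritedProps.getInherited("contactEmail", String.class);
    }
}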

What approach should I choose?

The big differentiator between reuse by copy and reuse by reference is the question whether the reused content should be adapted or changed at the location where it is reused. As soon as you have the requirement “I would like to change the content provided to me”, you need reuse by copy. And in AEM this normally means “MSM”, because content is not created once, but needs to be maintained and updated, and at scale the MSM is the best way to do that. But if you don’t have that requirement, use reuse by reference.
You might even use both approaches, “reuse by copy” to manage the reuse of content over different sites, and “reuse by reference” for content within a site.

Your friendly business consultant can help you find out which reuse strategy makes sense for your requirements.

Why a good business consultant can save you more money than a good AEM architect

In my years working in the consulting business for Adobe I have joined many projects. Some only temporarily to support on special issues, others in a broader context; but it was always with a focus on the technology, on implementation, architecture and infrastructure. With this background I always tried to help my customers and projects use the AEM (or CQ5) technology in the right way, and to spend their money wisely and cost-effectively.

But: according to my own standards I often failed to do so.

Not because I did not know the product. Not because the customer was not listening to me. And not because I was not able to give advice. But because I was not sitting at the right table when some important decisions were made. Because I was sitting with infrastructure teams to build a hardware sizing. I was working with the customer’s architects to understand the constraints under which we had to design and operate AEM. But I did not have enough time to work with the business, to understand their needs and the ways they thought AEM could help them. I was not there when they decided on content architecture, components and templates, business processes and how the editors should work. Because my focus is technology, I talk with the technology people in the first place.

And why is that important for all of us? Because if you don’t know AEM and you don’t have any guidance on what it can do and how the existing features can be used, it is very likely that you start building everything on your own. In that case you never use the advanced features of AEM, but you start based on what you know or what was demoed to you by presales people. If you don’t have a person, a trusted advisor, someone who knows AEM very well (and also all the features added to it in the last years!), you start to invest a lot of money to build things which already exist. Too much money.

Let me introduce the AEM business consultant. Someone very familiar with the philosophy of AEM. Someone who has up-to-date knowledge of AEM. Someone who is not a developer, but can map your requirements to AEM features and knows what is possible out of the box and where an existing feature needs an extension on top. And also someone who helps you leverage other features of AEM you have paid for.

A business consultant will attend meetings with business stakeholders and with the users who are going to work with AEM. A business consultant will help you tell the technologists what they should implement (and what not). And a business consultant is a good partner to the architect; together they can identify the gaps which have to be covered by custom development, and they frame requirements in a way that they can be covered by out-of-the-box features.

For me, often working in an architect role, it makes my job much easier if I can focus on the technical aspects; if I know that I have a partner in crime at my side who makes sure that business users understand the philosophy of AEM; if the business people know what they can get easily; and if they understand which requirements are harder to implement (although they sound so easy). Because once an idea has developed in the minds of the business stakeholders, it’s much harder to change it (if that’s necessary); far better if you as an architect see the requirement before any implementation has started at all.

Because the worst-case scenario is that you implement some project feature and spend a lot of money on it, and in hindsight it turns out it could have been implemented in 20% of the time if someone had told the business and the developers that they had just re-invented Content Fragments (while using AEM 6.3, which has that feature built in). But because no one told them, you have designed and built a lot of your application on this custom implementation, and replacing it is expensive. With a business consultant on the project, she would have identified that within the first sessions and helped you to use the out-of-the-box feature.

So a business consultant helps you to spend your money on the right things, and an architect ensures that it is built the right way. Two sides of the same coin.

Why JCR search is not suited for site search

Many websites, especially larger ones, have an integrated search function which lets users find content of the site directly, without using external search engines like Google or Bing. If properly implemented and used, it can be a tremendous help in getting visitors directly to the information they need and want.

I’ve been asked in the past how one can implement such a site search using JCR queries. And at least in recent years my answer has always been: don’t use JCR search for that. Let me elaborate on that.

JCR queries are querying the repository, but not the website

With a JCR query you are querying the repository, but you are not querying the website. That sounds a bit strange, because the website lives in the repository. This is true, but in reality the website is much more: a rendered page consists of data stored below a cq:Page node plus data from other parts of the repository. For example, you pull assets into a page and also add some of the asset metadata to the rendered page; you add references to other pages and include data from there.

This means that the rendered page contains a lot of meaningful and relevant information which can and should be leveraged by a search function to deliver the best results. And this data is not part of the cq:Page structure in the repository.

Or to put it in other words: you do SEO optimization for your pages to deliver the most relevant results, hoping that their rank on Google gets higher and more relevant for users searching for specific terms. Do you really think that your own integrated site search should deliver less relevant results for the same search?

As a site owner I do not want Google to deliver page A as the highest-ranked page of my site for a certain keyword combination, while my internal search delivers a different page B which is clearly less relevant for those keywords.

That means that you should provide your site search with the same information and metadata that you provide to Google. With JCR queries you only have the repository structure and the information stored there, and you should not try to optimize that for relevant search results as well; the JCR repository structure aims for different goals (performance, maintainability, evolvability and others).

JCR queries implement functionality not needed for site search

The JCR query implementation needs to take some concepts into account, which are often not relevant for site search, but which are quite expensive. Just to name a few:

  • Nodetype inheritance and mixins: on every search there are checks for nodetypes, sometimes with the need to traverse the type hierarchy and check the mixin relationships. That’s overhead.
  • ACL checks: every search result needs to be ACL-checked before it is returned, which can be a huge overhead, especially when in the simplest case all (relevant) content is public and there would be no need for such checks at all.
  • And probably much more.

JCR is not good at features which I would expect from a site search

  • Performance is not always what I expect from a site search.
  • With the current Oak implementation you should test whether every query is covered by an index; as site search queries are often unpredictable (especially if you not only allow a single search term, but also want to include wildcards etc.), you always run the risk that something unexpected happens. And if your query is not covered by a matching index, it’s not only about bad performance; you might also deliver the wrong search results (or no results at all). See the sketch after this list for a way to check index coverage.
  • Changing index definitions (even adding synonyms or stopwords) is necessarily an admin task, and if done improperly, it impacts the overall system. Not to mention the need for reindexing 😦
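One way to check index coverage is to prefix a query with “explain” and look at the plan Oak reports. A minimal sketch, assuming an arbitrary fulltext query (the query and the way the plan is printed are illustrative):

import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

public class ExplainQueryExample {

    // Hypothetical sketch: let Oak explain which index (if any) it would use for a
    // fulltext query, instead of discovering a slow traversal in production.
    public void explainFulltextQuery(Session session) throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(
                "explain SELECT * FROM [cq:Page] AS p WHERE CONTAINS(p.*, 'shoes')",
                Query.JCR_SQL2);
        RowIterator rows = query.execute().getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            // the "plan" column describes the chosen index or warns about a traversal
            System.out.println(row.getValue("plan").getString());
        }
    }
}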

From my point of view, if you cannot rely solely on external search engines (Google, Bing, DuckDuckGo, …), you should not implement your site search on top of JCR queries. It often causes more trouble than adding a dedicated Solr instance which crawls your site and is embedded into your site to deliver the search results. You can take this HelpX article as a starting point for integrating Solr into your site. But of course any other search engine is possible as well.

Writing unittests for AEM (part 3): Mocking resources

I introduced SlingMocks in the previous blog posts (part 1, part 2), and so far we have only covered some basics. Let’s dig deeper now and use one of its coolest features: mocking resources.

Well, to be honest, we don’t mock resources. We do something much better: we build an in-memory structure which represents Sling resources. We can even back that with an in-memory JCR repository!
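Which backing is used can be chosen when constructing the context; a brief sketch (the concrete resolver type chosen here is my example, not from the post):

import org.apache.sling.testing.mock.sling.ResourceResolverType;
import org.junit.Rule;
import io.wcm.testing.mock.aem.junit.AemContext;

public class ResolverTypeExample {

    // RESOURCERESOLVER_MOCK keeps everything as a pure in-memory resource tree;
    // JCR_MOCK (or JCR_OAK) backs the same resources with an in-memory JCR repository.
    @Rule
    public final AemContext context = new AemContext(ResourceResolverType.JCR_MOCK);
}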

For this blog post I created a very simple Sling Model which is supposed to build a classic navigation. For simplicity I omitted nearly all of the logic which makes up a production-ready navigation component and just add the pages below the site root to it.

The relevant files in my github repo:

The interesting part is the unit test. This time I used the AemMocks library from wcm.io, because it provides all the magic of the SlingMocks, plus some AEM specific objects I would like to use. As the AemContext class inherits directly from the SlingContext class, we can just use it as a drop-in replacement.

Loading the test data from a JSON file in the test resources

The setup method is very simple: it loads a JSON structure and populates an in-memory resource tree with it (adding everything below the resource “/content”). Using this approach there is no longer any need to mock resources, properties/value maps, and the relations between them. This is all handled by the SlingMocks/AemMocks framework.

Thus a testcase looks like this:
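The actual setup method and test are in the linked repository; as an approximation (the JSON file name, page paths and assertions below are illustrative, not the exact code from the repo), setup and test case together look roughly like this:

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;

import com.day.cq.wcm.api.Page;
import io.wcm.testing.mock.aem.junit.AemContext;

public class NavigationModelTest {

    @Rule
    public final AemContext context = new AemContext();

    @Before
    public void setup() {
        // load the JSON test content from the classpath and register it below /content
        context.load().json("/navigation-testcontent.json", "/content");
    }

    @Test
    public void childPagesAreAvailableFromTheTestContent() {
        // the pages exist purely in memory, no repository setup or mocking needed
        Page siteRoot = context.pageManager().getPage("/content/mysite");
        assertNotNull(siteRoot);
        assertTrue(siteRoot.listChildren().hasNext());
    }
}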

No boilerplate code anymore

This simple example shows how easy it can be to write unit tests. The test code is very concise and easy to read and understand. In my opinion the biggest issue when writing unit tests now is creating the test content 🙂

Some notes on the test content:

  • My personal style is to put the test content into the same structure as the Java packages. This makes it very easy to find the test content and keeps the test content and the unit test together.
  • You can write the JSON manually. But you can also use CRXDE Lite to create some structure in your local AEM instance, export it with “http://localhost:4502/content/myproject/testcontent.tidy.-1.json” and paste it into the JSON file. I find this way much more convenient, especially if you already have most of the test content ready. But if you do that, please clean up the unnecessary properties in the test data. They pollute the test content and make it harder to understand.

Writing unit tests for AEM (part 2): Maven Setup

In the previous post I showed you how easy it is to start using SlingContext. If you start to use this approach in your own project and just copy the code, your Maven build is likely to fail with messages like this:

[ERROR] testActivate_unconfiguredParams(de.joerghoh.cqdump.samples.unittest.ReplicationServletTest)  Time elapsed: 0.01 s  <<< ERROR!
org.apache.sling.testing.mock.osgi.NoScrMetadataException: No OSGi SCR metadata found for class de.joerghoh.cqdump.samples.unittest.ReplicationServlet at org.apache.sling.testing.mock.osgi.OsgiServiceUtil.injectServices(OsgiServiceUtil.java:381)

The problem here is not the use of the SlingMocks library itself, but rather the fact that we use SlingMocks to test code which uses OSGi annotations. The fix itself is quite straightforward: we need to create OSGi metadata for the unit tests as well, not only for the bundling (SlingMocks reads this metadata during test execution).

That’s the reason why there’s the need to have a dedicated execution definition for the maven-bundle-plugin:

https://github.com/joerghoh/unittest-demos/blob/master/core/pom.xml#L35

You also need to instruct the maven-bundle-plugin to actually export the generated metadata to the filesystem using the <exportScr> statement; this should be the default in my opinion, but you need to specify it explicitly. Also don’t forget to add the “_dsannotations” and “_metatypeannotations” statements to its instructions section:

https://github.com/joerghoh/unittest-demos/blob/master/core/pom.xml#L43

And even then it will fail, if you don’t upgrade the maven-bundle-plugin to a version later than 4.0:

https://github.com/joerghoh/unittest-demos/blob/master/pom.xml

OK, if you have adapted your POMs in this way, your SlingMocks-based unit tests for OSGi r6-based services should run fine. Now you can start exploring more features of SlingMocks.

Writing unit tests for AEM — using SlingMocks

Over the course of the last years the tooling for AEM development has improved a lot. While some years ago there was hardly any IDE integration available, today we have dedicated tools for Eclipse and IntelliJ (https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/aem-eclipse.html). Packaging and validation support also used to be poor; today we have the open-sourced FileVault and tools like oakpal (thanks Marc!).

But with SlingMocks we also have much better unit test tooling (thanks a lot to Stefan Seifert and the Sling people), so we have much more and better/easier tooling than just Mockito and PowerMock. SlingContext (and its extension AemContext) allows you to create unit tests quite easily. Using them can help you get rid of mocking Sling resources, repository access and many other things.

On top of that, SlingMock can easily work with the new OSGI r6 annotations, which allow you to define OSGI properties in Pojos. Mocking the @ObjectClassDefinition classes isn’t that easy, because they are essentially annotations …

To illustrate that, I have created a minimal demo with a servlet and test cases for it (source code on GitHub). The functionality of the class itself is not relevant for this article; we want to focus on how to best utilize the frameworks to avoid boilerplate code.

The code is quite simple, but to make unittesting a bit more challenging, it uses the new OSGI r6 annotations for OSGI configuration (using the @Designate and the @ObjectClassDefinition annotations) plus a referenced service.

https://github.com/joerghoh/unittest-demos/blob/master/core/src/main/java/de/joerghoh/cqdump/samples/unittest/ReplicationServlet.java#L29

If you try to mock the ReplicationServlet.Config class the naive way, you will find out that it’s an annotation which is referenced in the activate() method. I always failed to mock it somehow, so I switched gears and started to use SlingMocks for it. (I don’t want to say that it is not possible, but it’s definitely not straightforward, and in my opinion writing unit tests should be straightforward, otherwise they are not written at all.)

With SlingMocks the approach changes. I am not required to create mocks; instead SlingMocks provides a mocked OSGi runtime we can use. That means that we create the OSGi parameters as a map and tell the SlingContext object to register our service with these parameters (line 59).

https://github.com/joerghoh/unittest-demos/blob/master/core/src/main/java/de/joerghoh/cqdump/samples/unittest/ReplicationServlet.java

Because SlingContext implements quite a bit of the OSGi semantics, it also requires that all referenced services are available (if these are static references). Therefore I use Mockito to mock the Replicator, and I register that mock to provide the Replicator service. In a real-world test I could also verify the interactions of my servlet with that mock.
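As a rough sketch of the registration part (the configuration key and value below are illustrative, not copied from the linked test class):

import static org.mockito.Mockito.mock;

import java.util.HashMap;
import java.util.Map;

import org.junit.Before;
import org.junit.Rule;

import com.day.cq.replication.Replicator;
import de.joerghoh.cqdump.samples.unittest.ReplicationServlet;
import io.wcm.testing.mock.aem.junit.AemContext;

public class ReplicationServletRegistrationTest {

    @Rule
    public final AemContext context = new AemContext();

    private final ReplicationServlet servlet = new ReplicationServlet();

    @Before
    public void setup() {
        // satisfy the static @Reference of the servlet with a Mockito mock
        context.registerService(Replicator.class, mock(Replicator.class));

        // OSGi configuration for the servlet, passed as a simple map;
        // the key is illustrative and must match the @ObjectClassDefinition member
        Map<String, Object> params = new HashMap<>();
        params.put("replication.agent", "publish");

        // registers the component, injects the Replicator and calls activate() with the config
        context.registerInjectActivateService(servlet, params);
    }
}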

This basic example illustrates how you can use SlingMocks to avoid a lot of mocking and stubbing. It does not utilize the full power of SlingMocks yet; we are just scratching the surface. But we already have one benefit: if you switch from SCR annotations to OSGi annotations, your SlingMocks unit tests don’t need any change, because SlingMocks provides an OSGi-like environment in which the way metatypes are generated and injected is abstracted away.

How does Sling resolve an AEM Page to the correct resource type?

On the AEM forum there’s an interesting question:

Does this mean that all requests to the page node are internally redirected to _jcr_content for HTML and XML calls but not for other extensions ?

Without answering the question right here, it raises a more basic one: how is a cq:Page node actually resolved? Or how does Sling know that it should resolve a request to a cq:Page node to the resource type specified in its jcr:content node, especially since the “cq:Page” node does not have a “sling:resourceType” property at all?

A minimal page can look like this (JSON dump):

{
  "jcr:primaryType": "cq:Page",
  "jcr:createdBy": "admin",
  "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
  "jcr:content": {
    "jcr:primaryType": "cq:PageContent",
    "jcr:createdBy": "admin",
    "cq:redirectTarget": "/content/we-retail/us/en",
    "jcr:created": "Mon Dec 03 2018 19:09:44 GMT+0100",
    "cq:lastModified": "Tue Feb 09 2016 00:05:48 GMT+0100",
    "sling:resourceType": "weretail/components/structure/page",
    "cq:allowedTemplates": ["/conf/we-retail/settings/wcm/templates/.*"]
    }
}

A good tool to understand how a page is rendered is the “Recent Requests” tool available in the OSGI webconsole (/system/console/requests).

When I request /content/we-retail.html and check this request in the recent requests tool, I get this result (reduced to the relevant lines):

    1967 TIMER_END{65,ResourceResolution} URI=/content/we-retail.html resolves to Resource=JcrNodeResource, type=cq:Page, superType=null, path=/content/we-retail
    1974 LOG Resource Path Info: SlingRequestPathInfo: path='/content/we-retail', selectorString='null', extension='html', suffix='null'
    1974 TIMER_START{ServletResolution}
    1976 TIMER_START{resolveServlet(/content/we-retail)}
    1990 TIMER_END{13,resolveServlet(/content/we-retail)} Using servlet /libs/cq/Page/Page.jsp
    1991 TIMER_END{17,ServletResolution} URI=/content/we-retail.html handled by Servlet=/libs/cq/Page/Page.jsp

The interesting part here is that the /content/we-retail resource has a type (= resource type) of “cq:Page”, and when we look further down in the snippet, we see that it is resolved to /libs/cq/Page/Page.jsp; that means the resource type used for script resolution is effectively “cq/Page”.

And yes, in case no sling:resourceType property is available, Sling’s first fallback strategy is to try the JCR node type as the resource type. And if this fails as well, it falls back to the default servlets.

OK, now we have a point to investigate. The /libs/cq/Page/Page.jsp is very simple:

<%@include file="proxy.jsp" %>

And in proxy.jsp there’s a call to the RequestDispatcher to include the jcr:content resource; from there on, the standard resolution we are all used to kicks in.
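Expressed in Java terms, the effect is roughly the following (a hedged approximation of what the proxy script does, not its actual source):

import java.io.IOException;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.request.RequestDispatcherOptions;
import org.apache.sling.api.resource.Resource;

public class PageProxyExample {

    // Hypothetical sketch: dispatch the rendering of a cq:Page to its jcr:content child,
    // whose sling:resourceType then drives the normal script resolution.
    public void renderPage(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws ServletException, IOException {
        Resource content = request.getResource().getChild("jcr:content");
        if (content != null) {
            RequestDispatcher dispatcher =
                    request.getRequestDispatcher(content, new RequestDispatcherOptions());
            dispatcher.include(request, response);
        }
    }
}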

And to get back to the forum question: if you look around in /libs/cq/Page, you can find more scripts dealing with other extensions and also with some selectors. And they all contain only the include of proxy.jsp. If your selector & extension combination does not work as expected, you might want to add an overlay there.

OSGi DS & Metatype & SCR properties

When I wrote the last blog post on migrating to OSGi annotations, I already mentioned that the property annotations you used with the SCR annotations cannot be migrated 1:1; instead you have to decide whether you add them to the OCD or add them as properties to the @Component annotation.

I was reminded of that when I worked on fixing the process label properties for the workflow steps contained in ACS AEM Commons (PR 1645). And the more I think about it, the more I get the impression that this might be causing some confusion in the adoption of the OSGi annotations.

Excursion to the OSGi specification

First of all, in OSGi there are two specifications which are important in this case: Declarative Services (DS, chapter 112 in the OSGi r6 enterprise specification, sometimes also referred to as the Service Component Model) and Metatype Services (chapter 105 in the OSGi r6 enterprise specification). See the OSGi website for the downloads (unfortunately there is no HTML version available for r6, only PDFs).

Declarative Services deals with the services and components, their relations and the required things around it. Quoting from the spec (112.1):

The service component model uses a declarative model for publishing, finding and binding to OSGi services. This model simplifies the task of authoring OSGi services by performing the work of registering the service and handling service dependencies. This minimizes the amount of code a programmer has to write; it also allows service components to be loaded only when they are needed.

Metatype Services deal with the configuration of services. Quoting from chapter 105.1:

The Metatype specification defines interfaces that allow bundle developers to describe attribute types in a computer readable form using so-called metadata. The purpose of this specification is to allow services to specify the type information of data that they can use as arguments. The data is based on attributes, which are key/value pairs like properties.

OK, how does this relate to the @Property annotation of the Felix SCR annotations? I would say that this annotation cannot be clearly attributed to either DS or Metatype; it served both.

You could add @Properties as an annotation to the class, or you could add @Property to a field. You could add the attribute “metatype=true” to the annotation, and then the property appeared in the OSGi console (then it was a “real” metatype property in the sense of the Metatype specification).

But either way, all the properties were provided through the ComponentContext.getProperties() method; that means in reality it never really made a difference how you defined a property, whether at the class level or on a field, and whether you added metatype=true or not. That was nice and in most cases also very convenient.

This changed with the OSGi annotations, because now the properties are described in a type annotated with the @ObjectClassDefinition annotation: type-safe and named. But here it is clearly a Metatype thing (it is configuration), and it cannot serve the Declarative Services side (the services and components thing) in parallel. Now you have to make a decision: is it a configuration item (something I use in the code) or is it a property which influences the component itself?

As an example, with SCR annotations you could write

@Component @Service
@Properties({
    @Property(name = "sling.servlet.resourceTypes", value = "project/components/header"),
    @Property(name = "sling.servlet.selectors", value = "foo")
})
public class HeaderServlet implements SlingSafeMethodsServlet { ...

Now, as these properties were visible via Metatype as well, you could also overwrite them with an OSGi configuration and register the servlet on a different selector just by configuration. Or you could read these properties from the ComponentContext. That was not a problem (and hopefully no one ever really used it …).

With OSGI annotations this is no longer possible. You have configuration and component properties. You can change the configuration, but not the component properties any more.

What does that mean for you?

Most importantly, don’t blindly turn all properties into attributes of the @ObjectClassDefinition class. For example, the label of a workflow step is not configuration, but rather a component property. That means there you should use something like this:

@Component(property = {
    "process.label=My process label"
})
public class MyWorkflowProcess implements WorkflowProcess { ...
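For comparison, something that really is configuration would go into the OCD. A hedged sketch (the service, the member name and the default value are invented for illustration):

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.Designate;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

// Hypothetical example: the endpoint URL is configuration (an admin may change it),
// so it lives in the @ObjectClassDefinition, not in the component properties.
@Component(service = EndpointService.class)
@Designate(ocd = EndpointService.Config.class)
public class EndpointService {

    @ObjectClassDefinition(name = "Endpoint Service")
    public @interface Config {
        @AttributeDefinition(name = "Endpoint URL")
        String endpoint_url() default "https://api.example.com";
    }

    private String endpointUrl;

    @Activate
    protected void activate(Config config) {
        // read the typed configuration value
        this.endpointUrl = config.endpoint_url();
    }
}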

Disclaimer: I am not an OSGi expert, this is just my impression from dealing with this stuff a lot. Carsten, David, feel free to correct me 🙂