Traversing the content hierarchy

When you play around with websites, you often get a good feeling for how much “engineering” work and thought went into the site. Think of things like SQL injection and shell injection, which were a widespread problem a few years ago. If you encounter a site today that is vulnerable to such an attack, it’s either a problem of budget (which is a lame excuse, because modern frameworks avoid such problems) or of the developer’s skill. An up-to-date site should no longer be affected by such problems.

A problem that is clearly visible, but often unknown to application developers and architects, is the content structure, which the URL exposes to the user. Consider the following small example of a simple content structure:

/content/brand/en/home

for the start page of a CQ-based website. The “home” handle is the start page and is thus requested via

http://www.example.com/content/brand/en/home.html

So, where’s the problem? Well, most templates provide a kind of HTML representation of their content. So let’s try

http://www.example.com/content/brand/en.html

Maybe a structure handle such as the language node (which is often used only to differentiate between languages) also provides an HTML representation of its content; it could simply render its child nodes as a bulleted list.
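To make the idea concrete, here is a minimal sketch in Python that derives the parent URLs you could try from a single page URL. The function name ancestor_urls and the assumption that every handle is rendered with an “.html” extension are mine, chosen for illustration; your site may use other extensions or selectors.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def ancestor_urls(page_url, extension=".html"):
    """Return the URLs of all parent handles of page_url, nearest parent first.

    Assumption: each handle is requested as <path><extension>.
    """
    parts = urlsplit(page_url)
    # Strip the extension to get the content handle, e.g. /content/brand/en/home
    handle, _ = posixpath.splitext(parts.path)
    urls = []
    parent = posixpath.dirname(handle)
    # Walk up the hierarchy until the root is reached.
    while parent not in ("", "/"):
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parent + extension, "", ""))
        )
        parent = posixpath.dirname(parent)
    return urls


if __name__ == "__main__":
    for url in ancestor_urls("http://www.example.com/content/brand/en/home.html"):
        print(url)
```

For the example start page this lists the language node first, followed by the brand node and the content root.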

So what? Does it hurt if you reveal that you offer Chinese content besides English? Most of the time it doesn’t. But what if you already have fresh content ready that is not yet linked? It would appear in such a list. Or what if you have “hidden content”, functions that are known only to a small group of people, things that aren’t secured by authentication and authorization? Suddenly someone has found your private data and could make use of it.

The dumping ground for functionality is often a folder named “tools”; developers tend to put everything there that doesn’t fit well into any other category. So you can find contact forms, search functionality and other stuff there. So what happens if you call

http://www.example.com/content/brand/en/home/tools.html

Does it also show your unused/crappy/new functions, which aren’t used on the website but are still there? Maybe some developer thought it would be convenient to have all tools listed without major hassle (1 bookmark instead of 10). Bad idea: you just showed all your available tools to someone who shouldn’t see them.

So check your site and make sure that structure nodes, which are only used to structure your content, either cannot be rendered at all or don’t reveal any information that could be useful to an attacker. Either return an empty page or (suggested) return the HTTP status code 403 (Forbidden, i.e. access denied). Don’t reveal data when it isn’t necessary. A well-engineered site also takes care of such “attacks” and doesn’t reveal any data that could be of use to a potential attacker.
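Such a check is easy to script for your own site. The following is only a sketch using Python’s standard library; the function names http_status and exposed_nodes are made up for this example, and treating every 200 answer as “exposed” is a simplification (an empty page also answers with 200, and urlopen follows redirects, so a redirect to a regular page counts as 200 too).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; the code is what we want.
        return err.code


def exposed_nodes(urls, fetch_status=http_status):
    """Return those URLs that answer with 200, i.e. that render something.

    fetch_status can be replaced, for example to test without network access.
    """
    return [url for url in urls if fetch_status(url) == 200]
```

Feed it the structure nodes of your own site, for example exposed_nodes(["http://www.example.com/content/brand/en.html", "http://www.example.com/content/brand/en/home/tools.html"]), and look at every URL it returns by hand.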

I’ve already run such tests on several CQ-based websites and found (besides some other things) a monitoring page (containing version information of the libraries in use) and also a hidden web special dedicated to a member of the web team who was moving on to another location (hi, Katrin!). All of this information was publicly viewable (on a major corporate website!) just by playing around with path names and then following links.