IE 7 Cripples IDN

Microsoft’s policy on how IE 7 will handle IDNs has changed slightly in beta 3 but, as it stands, it will still have a seriously detrimental effect on IDN take-up. Here’s why.

IE 7 displays all IDN domain names as punycode (e.g. http://www.xn--caf-dma.com), unless the copy of IE has the “language” of that domain name configured as one of its Accept Languages.[0] If it displays the ugly and indecipherable punycode, it also presents a yellow security bar, saying “We can’t display this domain name; click for options”, where presumably the user might have the option of adding to the whitelist whatever language IE thinks the domain name is in.
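To make the two forms concrete, here is a minimal Python sketch of the conversion between a Unicode domain name and its punycode (ASCII) form. It relies on Python’s built-in `idna` codec, which implements the original IDNA 2003 rules; the helper names `to_ascii` and `to_unicode` are my own, and this is an illustration only, not the code any browser uses.

```python
def to_ascii(hostname: str) -> str:
    """Return the ASCII (punycode) form of a hostname, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the human-readable Unicode form of an ASCII hostname."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    # "\u00e9" is the precomposed letter é
    print(to_ascii("www.caf\u00e9.com"))       # www.xn--caf-dma.com
    print(to_unicode("www.xn--caf-dma.com"))   # www.café.com
```

The browser always sends the ASCII form over the wire; the only question in this post is which of the two forms the user gets to see.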

This will cripple IDNs in almost any international market, simply because domain owners are not going to want an unknown percentage of users visiting their domain to have that horrible user experience. Suppose you are a German company: will you choose an IDN domain name containing a ß as your primary domain name if you know you might one day want to expand into the wider European market and sell goods outside Germany, and that almost all your non-German customers will have to go through this?

IDNs might perhaps be used when the site owner can guarantee that all their visitors will have a particular language configured – but how common is that? Even aside from the situation above, this is the “World Wide” Web, and people use a friend’s browser, or one in an Internet café. The browser doesn’t really know what languages its user speaks, and it’s unlikely that the user will take the time to tell it. When was the last time you configured the Accept Languages in a browser you were using? And if you did, when you stopped using that browser, did you remove them and reset the setting?

The sad thing is that this measure by itself doesn’t improve security. A particular domain name is either dangerous or it isn’t – that is, it’s either a homograph of another domain registered to a different person, or it isn’t. If the domain name is a homograph then all those people who, by default or by configuration, have that language configured are at risk. And if it’s not a homograph, why not display it to everyone from the start?

The other measure IE 7 is taking, which is to forbid most script mixing, will improve security. But here they have gone the other way – this measure is too draconian. Script-mixing by itself is not dangerous, as long as your registry is on the ball.
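A rough Python sketch shows what a script-mixing check looks like and why a blanket ban is blunt. Python’s standard library does not expose the Unicode Script property, so this version assumes the first word of each letter’s Unicode name (“LATIN”, “CYRILLIC”, “GREEK” and so on) is a good enough stand-in; the function names are hypothetical and this is not how IE 7 actually implements its check.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label.

    Heuristic (an assumption for illustration): the first word of a
    letter's Unicode name stands in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():  # digits and hyphens carry no script here
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    print(is_mixed_script("caf\u00e9"))          # False: all Latin
    print(is_mixed_script("\u0440\u0430ypal"))   # True: Cyrillic "ра" + Latin "ypal"
    print(is_mixed_script("русский-dvd"))        # True, although nobody is being spoofed
```

The last case is the problem: the check cannot tell a spoof from an honest name that happens to use two scripts.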

Firefox has a system based on a whitelist of TLDs whose registries have sensible anti-homograph policies. Only the registries can tell whether a domain name is dangerous or not; browsers just don’t have enough information. Our policy allows many more safe domain names.
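The whitelist approach boils down to one display decision, sketched here in Python. The set `TRUSTED_TLDS` and the function `display_form` are hypothetical names, and the entries in the set are placeholders rather than the real list; the conversion uses Python’s built-in `idna` codec (IDNA 2003), not the browser’s own code.

```python
# Placeholder entries for illustration only, NOT the browser's real whitelist.
TRUSTED_TLDS = {"de", "se"}


def display_form(hostname: str) -> str:
    """Return what the URL bar would show for a hostname.

    Unicode is shown only when the TLD's registry is trusted to prevent
    homographs; otherwise the punycode form is shown.
    """
    ascii_name = hostname.encode("idna").decode("ascii")
    tld = ascii_name.rsplit(".", 1)[-1].lower()
    if tld in TRUSTED_TLDS:
        return ascii_name.encode("ascii").decode("idna")
    return ascii_name


if __name__ == "__main__":
    print(display_form("www.ume\u00e5.se"))    # shown in Unicode: TLD is in the set
    print(display_form("www.caf\u00e9.com"))   # www.xn--caf-dma.com
```

Note that the decision depends only on the TLD, not on anything about the user, so every visitor to a given name sees the same thing.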

Unfortunately, as domain owners will only pick names which work everywhere, IE 7 is further restricting the set of names that can be used in practice. Having worked for a long time on making IDN safe and usable in browsers, I find it very sad to see its uptake stunted in this way. :-( I hope they change their minds and remove that first check, but I fear it’s too late.

[0] There will also be a host of problems caused by the fact that domain names use characters from particular scripts, or perhaps multiple scripts, and IE has a list of languages. Languages and scripts have a really complex relationship – in which language is the letter é? What Accept Language do I have to have configured to correctly view www.café.com? I haven’t covered this further because it’s secondary to the even bigger problem mentioned above.

21 thoughts on “IE 7 Cripples IDN”

  1. “You are a German company – will you choose an IDN domain name containing a ß as your primary domain name if you know you might one day want to expand into the European market and sell goods outside Germany?”

    I’m sure they’ll register both the domain containing the ß and another using SS. Their international customers won’t be able to type the German name (if they see it in print for example) and the site probably already uses the “Accept Languages” header to send customers to a more appropriate version of the page anyway.

    Personally, I’d rather have technical measures to back up a registry policy. The time before a registry reacts to a complaint about a domain name could be ample for an effective phish. IE’s policy seems like a good compromise: people see characters they recognise and use, and not others.

    Firefox’s policy may be better for the health of IDN — unless it continues to earn it a reputation of being a needless security risk and an effective way of segmenting the internet — but does less to protect users. Apart from allowing characters people couldn’t even repeat on paper if asked, the browser doesn’t even give me the option to disable IDN in the UI (as IE does). After the last Mozilla IDN goof, I determined that support for it was of too little value to justify any risk as I can only read English and French.

    Your efforts for Firefox are appreciated, but I don’t find your argument that I should see characters I can’t read compelling. Doesn’t your suggestion that whether a name is good or bad extend to the mixed-scripts case too? Given that, why do you claim that the mixed-scripts decision does improve security while disclaiming any responsibility for presenting ‘bad’ names, instead heaping it on the registry? At least be consistent.

  2. The time before a registry reacts to a complaint about a domain name could be ample for an effective phish.

    The policy is not about how fast they react to a complaint; it’s aimed at not allowing two homographic domains to be registered to two different people in the first place. Read the documents :-) Registries can achieve this either by not having any homographs in their allowed character set; or by bundling or blocking homographic domains. We have examples of all three in the whitelist.

    One can work out ahead of time which characters are homographic, and do extremely simple tests to make sure homographic domains are bundled or blocked.

    Apart from allowing characters people couldn’t even repeat on paper if asked, the browser doesn’t even give me the option to disable IDN in the UI (as IE does). After the last Mozilla IDN goof, I determined that support for it was of too little value to justify any risk as I can only read English and French.

    • The characters you can repeat on paper are not necessarily the same as those someone else can. And remember, all registries we permit now have character whitelists, so things like ❤ are right out.
    • Of course we don’t have a UI option to disable IDN. We don’t have a UI option to disable the display of Chinese in the content area; why have one for the URL bar?
    • What on earth do you mean by “Mozilla IDN goof”? How exactly was the homograph problem our fault? It was inherent in the standard; we’ve had to move heaven and earth to work around the problems.
  3. My information is from the IE blogpost linked above. Perhaps I’ve misinterpreted it, or perhaps it’s more complicated than we think – maybe there’s a list of “always safe” characters, biased towards the European ones.

  4. Yeah, indeed, it seems to be a “script” thing.
    I tried the URL listed in the IE blog posting – http://ايكيا.com – and I got the punycode – http://xn--mgba7f1ab.com/ – and the warning.
    Looks like the Icelandic “ð” is considered OK because it’s “latin”.

    (Note that the URL is punycoded in Firefox as well, though – after all, .com is not on the whitelist.)

    – Michael

  5. The characters you can repeat on paper are not necessarily the same as those someone else can.

    Naturally; that’s why IE’s approach of supporting only scripts associated with configured languages makes sense to me.

    We don’t have a UI option to disable the display of Chinese in the content area; why have one for the URL bar?

    The content area and URL bar are very different, in the way that they’re used, the trust users place in them, and so forth. That’s why browser vendors are considering these measures. That’s why bugs which cause the browser to display the wrong URL are considered security issues. There are plenty of historical advisories for both Firefox and IE that show as much.

    Again: As an individual user, IDN offers no benefit to me and there are risks associated with it. Therefore, I want to turn decoding of IDN names off. Remember http://secunia.com/advisories/16764? It’s not just phishing we’re talking about.

    What on earth do you mean by “Mozilla IDN goof”? How exactly was the homograph problem our fault? It was inherent in the standard

    On earth, homograph-based phishing was an obvious possibility that was known about when that standard was being drafted. I consider shipping a browser without any technical measure to mitigate those risks to be a serious mistake. See “Small-minded Mozilla mocked by wider world” at http://www.theregister.co.uk/2005/02/25/mozilla_nixes_idns/. Mozilla’s approach is now to trust certain registrars on my behalf, instead of trusting them all. As a pessimist, I wonder whether that’s enough. I welcome IE’s more stringent approach.

    At the very least, there’s certainly been a lot of bad press about IDNs and Mozilla, and for the corporation to disclaim all responsibility for that is foolish. It certainly doesn’t instil confidence to see you hide behind the spec after exposing your users to risk; I’d rather you appeared to learn from the experience.

    Of course, both browsers will get more general measures to protect against some of these problems in their next version. That’s a good thing.

  6. IE 7 displays all IDN domain names as punycode (e.g. http://www.xn--caf-dma.com), unless the copy of IE has the “language” of that domain name configured as one of its Accept Languages

    Apart from everything you’ve said, it seems this could have the effect of readers then getting webpages in languages they don’t understand.

    In your German example I might follow IE’s advice and add German to my list of acceptable languages, because I can cope with German characters in names and working out how to pronounce them. But from that point on IE is happily telling every website that I visit that I’m competent at reading content in German, which I’m not; if a site served me German content I almost certainly wouldn’t be able to understand it.

  7. Michael has a good point leading to IKEA’s site. That is indeed punycoded in FF2 beta as well as IE7, but the only difference is the warning in IE as opposed to FF.

    The point I was going to make is that there are certain characters, like the Swedish å, that would be impossible to type in an internet cafe in the rest of the world. I couldn’t see someone in Spain trying to enter http://www.umeå.se in a website. They’d be likely to use http://www.umea.se (which the former is redirected to). So, in this case, James S has a point.

  8. Although I agree it’s unlikely that someone would manually enter http://www.umeå.se when in an internet café in Spain, they could be following a link in an email message, or an on-line bookmark, etc.
    In any case, as I noted earlier with http://www.veður.is , such URLs work fine and don’t seem to be punycoded in IE7b3.

    This *might* be different in an internet café in China, though. If the IE blog post is OK, Chinese language IE7 will only accept Chinese scripts plus ASCII.

    – Michael

  9. @ James S

    If I’m Greek and I go to an internet cafe in Rome, I probably know how to change the Windows keyboard settings so I can easily type Greek characters, or I could click a link with Greek characters from an e-mail I got.
    I don’t want to get a warning about this until I change the language settings, sometimes this even isn’t allowed in internet cafes.

    Also, your point about getting characters you can’t read doesn’t make sense. Not knowing a language doesn’t imply not being able to read the characters. I can read Greek, but I understand the language only just enough to know what μακαρωνη μπολονηση (spelling mistake?) is. So I, obviously, won’t put Greek in my Accept-Languages header. But if, for example, someone from Greece wants me to go to a site with a Greek and an English version, he shouldn’t have to find the English domain name (which probably exists), I can read the Greek domain just fine.
    Internet Explorer shouldn’t give a warning about this. The warning is even a security risk if it looks a lot like a phishing warning, because if I want to revisit the site and I type a mistake in the domain name, even if there is a phishing warning, I will just ignore it, because after a while I will be used to warnings from that domain. And typing a mistake in a domain name is particularly an issue in different alphabets, as I will be likely to make mistakes between ι and η, or ω and ο.

  10. You guys are trapped inside a racist and ethnocentric, Latin-centric, world view, and by not supporting real IDN display, are doing the rest of the world a disservice. We don’t care about your phishing problem and non-Latin script phobias. Get over it. We use mixed Latin + Cyrillic in daily life. How do you think people write DVD in Russian? They use Latin. OK. That’s mixed script. Real mixed script. Not stuff you guys call phishing.

    Anyways, Mozilla doesn’t support IDNs at all, it always displays punycode. So why are you blaming IE7? Get with it and start displaying IDNs and not “code”.

  11. Smirnoff: There is no rule that says a domain name has to represent everything you can possibly say in a particular language. Yes, you can write “DVD” as “DVD” in Russian, but that doesn’t mean that there’s some magic rule which says domain names like русский-dvd.ru have to be permitted. It’s a matter of balancing security with allowing as much of what people want to do as possible.

    Firefox certainly supports IDN. I don’t know if the Mozilla Suite or Seamonkey do; I was under the impression that they did but, if not, you would need to get in touch with the groups in charge of those products.

  12. Firefox certainly supports IDN.

    I should have been more specific. Firefox supports it for whitelisted TLDs. The .ru registry has not yet applied to join the list. If you want IDN in .ru to work, ask them to do so.

  13. Firefox supports Unicode IDN domain resolution? How does one enable that? When I enter Native character domains into the FF browser address field, it resolves the IDN only to Punycode, not the language correct Unicode.

    Thanks for your help.

  14. IDN ready: Exactly which domains are you using? Firefox only supports it for a whitelist of TLDs with sensible anti-homograph policies.

  15. Firefox does not support any unicode IDN. Put in any IDN domain and tell me what resolves. It resolves Punycode. Try it.

    Come on this is well known.

  16. IDN ready: .com is not on Firefox’s whitelist – because they have not asked to be. If you want IDN domains to work in Firefox, get Verisign to apply to be whitelisted.

  17. I think I will just use a different browser that supports IDN. It’s not my job to contact Verisign or anyone else.

  18. Still no support, months after many people have complained about this.

    You are still trapped in your bigot attitude towards people who use different scripts.

    I don’t know who Verisign is, but I tell you, .com is default domain that people use, because .ru doesn’t support IDN.

    So, please turn on IDN.com for all languages now.