Microsoft’s policy about how IE 7 will handle IDNs has changed slightly in beta 3, but unfortunately as it stands will still have a serious detrimental effect on IDN take-up. Here’s why.
IE 7 displays all IDN domain names as punycode (e.g. http://www.xn--caf-hya.com), unless the copy of IE has the “language” of that domain name configured as one of its Accept Languages.[0] If it displays the ugly and indecipherable punycode, it also presents a yellow security bar, saying “We can’t display this domain name; click for options”, where presumably the user might have the option of adding to the whitelist whatever language IE thinks the domain name is in.
This will cripple IDNs in almost any international market, simply because domain owners are not going to want an unknown percentage of users visiting their domain to have that horrible user experience. You are a German company – will you choose an IDN domain name containing a ß as your primary domain name if you know you might one day want to expand into the European market and sell goods outside Germany? And that almost all your European customers will have to go through this?
IDNs might be perhaps used when the site owner can guarantee that all their visitors will have a particular language configured – but how common is that? Even aside from the situation above, this is the “World Wide” Web, and people use the browser of a friend, or an Internet café. The browser doesn’t really know what languages its user speaks, and it’s unlikely that the user will take time to tell it. When was the last time you configured the Acceptable Languages in a browser you were using? And if you did, when you stopped using that browser, did you remove them and reset the setting?
The sad thing is that this measure by itself doesn’t improve security. A particular domain name is either dangerous or it isn’t – that is, it’s either a homograph of another domain registered to a different person, or it isn’t. If the domain name is a homograph then all those people who, by default or by configuration, have that language configured are at risk. And if it’s not a homograph, why not display it to everyone from the start?
The other measure IE 7 is taking, which is to forbid most script mixing, will improve security. But here they have gone the other way – this measure is too draconian. Script-mixing by itself is not dangerous, as long as your registry is on the ball.
Firefox has a system based on a whitelist of TLDs whose registries have sensible anti-homograph policies. Only they can tell if a domain name is dangerous or not; browsers just don’t have enough information. Our policy allows many more safe domain names.
Unfortunately, as domain owners will only pick names which work everywhere, IE 7 is further restricting the set of names that can be used in practice. Having worked for a long time on making IDN safe and usable in browsers, it’s very sad to see its uptake stunted in this way. :-( I hope they change their minds and remove that first check, but I fear it’s too late.
[0] There will also be a host of problems caused by the fact that domain names use characters from particular scripts, or perhaps multiple scripts, and IE has a list of languages. Languages and scripts have a really complex relationship – in which language is the letter é? What Accept Language do I have to have configured to correctly view www.café.com? I haven’t covered this further because it’s secondary to the even bigger problem mentioned above.