IDN Policy Clarifications

There are some issues that I didn’t make clear enough in my last IDN policy post.

  1. Turning off IDN is a temporary measure for Firefox 1.0.1, Mozilla 1.7.6 and Mozilla 1.8 beta, all of which are due for release in the next couple of weeks (this week, ideally), to target this and other security issues. I personally am confident that this will not be the long term solution. We hope to engineer the disablement in such a way that we can later return the pref to whatever value the user had it set to before we started.
  2. I am very keen on the long-term solution being a TLD whitelist or blacklist. Whether this is feasible, and which of the two we choose, depends on the scope of the homograph problem, which we are still trying to determine – and that’s something we need your help with. This solution would avoid penalising registrars/registries who are doing the right thing, while protecting our users. The main reason this is not being implemented for 1.0.1 is the timescale.
  3. We have considered a lot of other solutions also – warning bars, icons, tooltips, you name it. Such things have the disadvantage of needing coding and testing (so can’t be quick) but also usually have the disadvantage of discriminating against IDN as a class of domains, which we do not want to do. Please read my thoughts on “Solution Requirements” in my “Phishing – Browser-based Defences” paper for my views on what solutions are and are not acceptable for phishing problems.
  4. We are not being Anglocentric here – or rather, we are not being more Anglocentric than the DNS system has been for the last 30 years.
  5. As a general point, we understand that preserving our reputation for security may require inconveniencing users some of the time. While each case is evaluated on its own merits, people should not be surprised if they see other things temporarily disabled in future.
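
To make the TLD whitelist idea in point 2 concrete, here is a rough sketch in Python. It is only an illustration, not how Firefox does or will do it: the `display_form` function and the contents of `IDN_TLD_WHITELIST` are hypothetical, and Python's standard `idna` codec stands in for the browser's own IDN handling.

```python
# Hypothetical whitelist of TLDs whose registries are assumed to restrict
# the characters they allow; the entries here are placeholders only.
IDN_TLD_WHITELIST = {"de", "at", "ch"}


def display_form(hostname: str, whitelist=IDN_TLD_WHITELIST) -> str:
    """Return the string a location bar might show for `hostname`."""
    ascii_form = hostname.encode("idna").decode("ascii")  # Punycode (xn--) form
    tld = ascii_form.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in whitelist:
        # Trusted registry: show the native-script form.
        return ascii_form.encode("ascii").decode("idna")
    # Unknown registry: fall back to the unambiguous ASCII form.
    return ascii_form
```

A blacklist would be the same check inverted: show the native form unless the TLD is listed.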

Now, to talk about what other people are saying:

Paul Hoffman may know more about IDN and Unicode than I do, but as someone who has done a lot of thinking about browser UI, I can tell him that an “IDN explanation popup” will not be read. Also, merely indicating whether a domain name is IDN or not is useless unless there’s some inference that a user can draw from that. “It’s suspicious” is presumably not the inference he wants people to draw. And if one IDN domain spoofs another one, the icon will be similarly present on both.

He and others (including some advice issued by the IDN community) have also suggested that you can solve the problem by indicating the class of each character by changing the background colour. Leaving aside the problems this causes for the 5% of the population who are colour blind, this will merely make the location bar ugly. Users will either complain loudly about the ugliness and ask to turn it off or, if reconciled to it, either get used to the dancing array of colours behind the domains they visit (non-English users) or treat any change in colour as suspicious (English users). Both are bad outcomes, and neither improves security.

He can rest assured that we neither want to turn IDN off (permanently), nor do we want to make IDN use obnoxious. We would like IDN domains to work as beautifully and simply as non-IDN domains, and that’s the goal we are shooting for.

35 thoughts on “IDN Policy Clarifications”

  1. You only need to have bizarre address URL colors if you have MIXED idn classes KNOWN TO LIE IN THE SET OF HOMOGRAPH CONFUSIONS on SECURE SITES, which is where the problem is. Surely it’s not too much of an inconvenience to have this UE feature on such a narrow subset of sites?

  2. What I don’t understand is, why can’t we simply block all characters which look like the normal ascii chars. I mean, the paypal example obviously showed a faked ‘a’. It must have a special UTF value. So block it. basta.

  3. Sorry for duplicate comment posting; I just saw the new entry.

    I hope the long-term solution will be something similar to the warning box given for “http://www.paypal.com@evilsite.net” type spoof attempts. FF’s behavior here is a good compromise between correctly implementing a useful-but-abusable standard and protecting the user. It would be even better if the dialog box gave a third option: to whitelist the site for URL login names.

    Similarly, the warning box for a suspected homograph spoof should explain the problem in simple language, ask for confirmation, and allow for whitelisting of the domain. (Whitelisting an entire TLD could be an option found somewhere in the advanced preferences, but probably doesn’t belong in the warning dialog.) A related preference could specify which characters in the domain name would trigger the warning. The most paranoid setting could include even ASCII “1” and “0”, which in some fonts can fool the exceptionally gullible. A more generous setting would only trigger the box if the character combination “looked” suspicious, with the suspicion code subject to updates of course, to enable fast protection from future spoof techniques.

    It might be best to combine the login spoof detecting and homograph spoof detection into a single configurable URL spoof detection feature, to keep the interface simple.

  4. I can’t help but think that in the long run something like gerv’s proposed glyph system will be necessary.

    The current model presumes that the physical appearance of a URL is what the user should trust. But this doesn’t appear to work particularly well; it seems pretty easy to trick users even without throwing international characters into it.

    But I guess any change would require industry wide cooperation and massive “user reeducation.” I’m not holding my breath. :)

  5. Just out of interest, is there a good reason for not displaying the organisation name from the SSL certificate in the main chrome?

  6. I read a proposal from somebody that the status bar color should change based on a hash of the domain name. I thought that was a very clever proposal (even though color-blind users don’t get the full advantage) because (at least when you’re visiting something like paypal or an online banking site) you’d get to know the status bar color for that site pretty well, and a change in that color would be an immediate indication that something strange was going on.
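
A minimal sketch of the hash-to-colour idea, in Python. The `domain_colour` name, the choice of SHA-256, and the use of the first three digest bytes as an RGB value are all assumptions made for illustration; the proposal itself does not specify any of them.

```python
import hashlib


def domain_colour(hostname: str) -> str:
    """Map a hostname to a stable '#rrggbb' colour derived from its hash."""
    # Hash the ASCII (Punycode) form so each registered name has exactly
    # one canonical input, whatever script it is displayed in.
    ascii_form = hostname.lower().encode("idna")
    digest = hashlib.sha256(ascii_form).digest()
    return "#" + digest[:3].hex()
```

The same site always gets the same colour, while a look-alike name is a different hash input and so will almost always get a different one.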

  7. I think we should consider a tld white/blacklisting as a quick fix. Many good registrars (like .de, .at, .ch) allow only a small set of IDN characters that don’t allow such spoofing attacks.

    Disabling IDN completely in highly distributed builds (FF 1.0.1, SeaMonkey 1.7.6) is a really hard thing to swallow for people who have registered such domains (like me). But I guess even a basic TLD white/blacklisting would be too much coding work such a short time before releasing those versions. I don’t really like the taste of telling people “my perfectly legal and nice domain only works in old, insecure versions”. I’m glad I don’t have customers yet that might print their IDN subdomain in big letters on billboards. This way it’s just my own wasted money.

  8. “””Leaving aside the problems this causes for the 5% of the population who are colour blind, this will merely make the location bar ugly.”””

    Not uglier than the current, already implemented yellow background that appears on https: secured websites. The color code should only show on URLs with international characters, and thus it could easily differentiate legitimate sites from spoofed ones.

  9. After reading “Phishing – Browser Based Defenses” a thought occurred to me: it is geared toward warning the user of the phishing attempt as soon as he lands on the page. But the notion of the browser “getting suspicious” about a site in my mind should be carried along: to be acted upon, for example, when the user submits a form. I bet most users disable the submit warning, so having it come back in a different color, form and shape, with the OK button disabled for thirty seconds and similar measures might alert them to the fact that the browser has entered suspicious mode.

  10. You only need to have bizarre address URL colors if you have MIXED idn classes KNOWN TO LIE IN THE SET OF HOMOGRAPH CONFUSIONS on SECURE SITES, which is where the problem is.

    Well, that’s not what’s currently being proposed. And if you knew that you had mixed classes on a secure site in a suspect TLD, you should be doing more than changing a background colour.

    What I don’t understand is, why can’t we simply block all characters which look like the normal ascii chars.

    Because Russians and others, whose characters they are, might be somewhat annoyed. And it’s not just an ASCII problem anyway – some IDNs can spoof other IDNs.

    Just out of interest, is there a good reason for not displaying the organisation name from the SSL certificate in the main chrome?

    There is certainly a good reason for not just turning it on without thought. Defending against this issue while keeping the amount of UI a user has to consider to a minimum is a very tricky thing. You would need to present a solid cost-benefit analysis of why turning it on would prevent more attacks than the current display of the domain name.

    I read a proposal from somebody that the status bar color should change based on a hash of the domain name.

    That was a proposal I made :-) People didn’t like it because it didn’t help the colourblind as much as everyone else. I still think it has value.

    I’m amazed I haven’t seen more interest in the Netcraft toolbar, at least from MoFo types

    The Netcraft toolbar has some good ideas – but to implement many of its features, it requires sending every domain accessed to a central server. And many people don’t like that for privacy reasons. I’ve incorporated some ideas from it into my paper “Phishing – Browser-based Defences”.

    I’m glad I don’t have customers yet that might print their IDN subdomain in big letters on billboards.

    We’re glad you don’t, too. It’s great that this issue is being dealt with before IDN becomes even more widespread.

    Not uglier than the current, already implemented yellow background that appears on https: secured websites. The color code should only show on URLs with international characters, and thus it could easily differentiate legitimate sites from spoofed ones.

    Why do you assume all URLs with international characters are spoofed? If that was the case, we’d just disable IDN. However, we want to get to a position where valid IDN domains are first-class citizens, so that people who use non-Latin character sets can have domains in their letters, rather than ours.

    Also, a plain yellow background is far less ugly than one that’s pink, yellow and blue in different randomly-sized blocks.

    Davide: my idea is that if a browser is suspicious of a site, it should pop an information bar and disable all form controls. The information bar would need to be actively dismissed before form submission is possible.

  11. I find this situation similar to one of the problems of spam detection: address forgery.

    There is no way to stop all possible spoofing. Some more serious mechanism for establishing page origin and authenticity is required.

    Honestly, my heart tells me the only long-term solution is education. When I was a kid, my parents taught me to avoid some places and to distrust some things. It is hard to make that fool-proof.

    P.S. Some may argue for going with SSL. OK. If I registered a certificate for “payppal.us”, how many users would still think it was the true PayPal?
    I also know of a real-life example where two companies kept getting mixed up because of similar names and nearby office locations.
    What if someone tries to abuse that? How can we stop him? I don’t know… Real-life experience tells me the answer is “think first”.

  12. Why not just do a DNS query to see if there are other sites that would render as the same spelling, and then ask the user to confirm the site? (This still doesn’t solve IDN vs. IDN spoofs, though, but it certainly solves US spoofs.)

    also, what about displaying two addresses? an “escaped” URL, and a ‘normal’ URL, when there are mixed charsets?

  13. If a warning message could take into account the languages I know first, and make a more educated guess about whether I may or may not be confused by an IDN in the address bar, it may be easier to swallow.

    It seems to me that the languages I read reasonably well I will add to the “Languages” preferences of my browser. If I speak, say, English, Greek, and Japanese, I will probably always want to see unmolested IDNs in my address bar for sites in those languages. If I’m paranoid, I set a warning for all glyphs in languages I don’t read, and white list any I actually visit.

    After that, sites using glyphs that all belong to one language I would assume to be legitimate. Sites that use glyphs from multiple languages MAY be legitimate, but if I were paranoid, I’d assume the worst first, and white list them later (e.g.: a theoretical ‘toys-Я-us.com’ may be legit, while ‘pаypal.com’–using the Cyrillic letter for the first ‘a’–of course, isn’t).
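
The rule described here can be approximated in a few lines of Python. This is a heuristic sketch only: it works on scripts rather than languages, it guesses the script from the first word of each character's Unicode name, and `scripts_in_label` and `needs_warning` are hypothetical names.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in `label` from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits, hyphens and dots
            # The first word of the character name is usually the script,
            # e.g. 'LATIN SMALL LETTER A' or 'CYRILLIC SMALL LETTER A'.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def needs_warning(hostname: str, known_scripts: set) -> bool:
    """Warn on mixed-script labels or on scripts the user does not read."""
    for label in hostname.split("."):
        used = scripts_in_label(label)
        if len(used) > 1 or not used <= known_scripts:
            return True
    return False
```

With `known_scripts = {"LATIN", "CYRILLIC"}`, an all-Latin or all-Cyrillic label passes quietly, while a label mixing the two triggers the warning and would have to be whitelisted by hand.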

  14. It’s worth considering that homographic display is probably on some fundamental level, *wrong*. Human beings don’t download character numerics directly into their brains. The shape of the character is the way we perceive its value.

    I don’t pretend to understand how we’ve gotten to this point. I’m assuming that your choice of a character set can result in homographic character sequences. Perhaps the simple answer is that it shouldn’t. If we’re going to have IDNs, don’t we need to have better rendering on everybody’s browsers?

  15. philips said: Why not just do a DNS query to see if there are other sites that would render as the same spelling, and then ask the user to confirm the site? (This still doesn’t solve IDN vs. IDN spoofs, though, but it certainly solves US spoofs.)

    For exactly the reason you mention, and because it’s discriminatory against IDN domains.

    newdok said: also, what about displaying two addresses? an “escaped” URL, and a ‘normal’ URL, when there are mixed charsets?

    Because it’s ugly, and it would be very hard to educate users to know exactly what it means and what to look for.

    Bucky said: It seems to me that the languages I read reasonably well I will add to the “Languages” preferences of my browser. If I speak, say, English, Greek, and Japanese, I will probably always want to see unmolested IDNs in my address bar for sites in those languages. If I’m paranoid, I set a warning for all glyphs in languages I don’t read, and white list any I actually visit.

    Going down this route leads to a balkanized web, where no-one wants to venture out of their own little language-specific area for fear of being spoofed, and because the browser says “warning! warning!” whenever they try it.

    Rob said: I’m assuming that your choice of a character set can result in homographic character sequences. Perhaps the simple answer is that it shouldn’t. If we’re going to have IDNs, don’t we need to have better rendering on everybody’s browsers?

    The incidence of homographs is not related to your choice of character set, and providing better rendering doesn’t distinguish letters which are supposed to look alike.

  16. The end for IDN?

    Support for IDN (umlaut domains) is apparently going to be removed from Mozilla. I think that is a damned bad joke. After all, these domains were touted as the future. And Germans are “only” affected when it comes to umlauts. The Chinese…

  17. newdok said: also, what about displaying two addresses? an “escaped” URL, and a ‘normal’ URL, when there are mixed charsets?

    gerv said: Because it’s ugly, and it would be very hard to educate users to know exactly what it means and what to look for.

    For secure sites… they already have the domain name printed next to the padlock symbol, it wouldn’t look too bad if an IDN was printed as “www.paypal.com (www.xn--pypal-4ve.com) [Padlock]”. That would make users aware of the issue, and could potentially be linked to some sort of “what does this mean?” dialog or whitelisting (“don’t show this again for the current domain”) functionality.
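
A sketch of that two-form display, in Python, using the standard `idna` codec as a stand-in for the browser's IDN code. The `dual_display` name and the exact output format are assumptions.

```python
def dual_display(hostname: str) -> str:
    """Show the Unicode form, followed by the Punycode form when they differ."""
    ascii_form = hostname.encode("idna").decode("ascii")
    unicode_form = ascii_form.encode("ascii").decode("idna")
    if ascii_form == unicode_form:
        return unicode_form  # plain ASCII name: nothing extra to show
    return f"{unicode_form} ({ascii_form})"
```

Plain ASCII names come back unchanged, so only IDN domains would pick up the extra parenthesised form.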

  18. Just by the way, I posted bug 282316 which is essentially an RFE for Gerv’s system that tracks whether you’ve visited secure sites before (although obviously it’s not identical to his article as I had a few other ideas/different opinions as to how such a system might work in practice).

    I think that registries not living up to their responsibilities (the proposed whitelist/blacklist should deal with this) is the only problem affecting IDN specifically. All other phishing vulnerabilities can apply pretty much as easily with domains that are plain ASCII (paypa1.com) or that aren’t visually confusable with the real domain (paypal-secure.com, paypal.secure24-7.com, etc) since many of the users who are naive enough to be fooled by a phishing email are likely also naive enough not to check the address bar when they get there. If we’re really, really lucky they will look for the padlock and yellow bar, although frankly I’m not sure we can even expect that…

    –sam

  19. Just out of interest, is there a good reason for not displaying the organisation name from the SSL certificate in the main chrome?

    There is certainly a good reason for not just turning it on without thought. Defending against this issue while keeping the amount of UI a user has to consider to a minimum is a very tricky thing. You would need to present a solid cost-benefit analysis of why turning it on would prevent more attacks than the current display of the domain name.

    I’m not sure exactly what the cost/benefit analysis is. However the line of thinking is that the one entity that the user is supposed to trust is the certificate issuing authority, not the people issuing domain names. Therefore it seems reasonable to use information from the certificate to protect the user rather than just concentrating on the domain name. However I’m not familiar enough with the process of obtaining certificates to know what should work. Is the organisation name supposed to be unique and verified? Is the certificate fingerprint unique for all time? Could one use some of this information to implement a security code (implemented, perhaps as a set of dingbats glyphs, per your other suggestion) unique for a given site? Just because the problem is domain name spoofing doesn’t mean that the answer, necessarily has to be based around the domain name alone.

  20. Not only would Paul Hoffman’s “IDN explanation popup” not be read, it also would not solve the problem – at least not if IDNs can be expected to find deployment outside the spoofer community: Consider a legitimate domain that uses non-ASCII characters; say, Cyrillic. A spoof domain could be set up, substituting a similar Greek character for one of the Cyrillic characters.

    Few people would be able to tell the difference from looking at the domain name when it is shown using characters from the respective scripts. And few people would be able to tell the difference from looking at the corresponding ASCII strings xn--garblegarblegarble – decoding Punycode mentally takes some practice ;-)

    A solution that might work would be to show three representations, including one that presents the full Unicode names of the non-ASCII characters (i.e., “bm[LATIN SMALL LETTER O WITH DIAERESIS]ller” in addition to “bmöller” and “xn--bmller-xxa”). This is unique and readable (but very long for non-Latin names).
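
The spelled-out representation is easy to generate. A minimal Python sketch, assuming a hypothetical `spelled_out` helper and using the standard `unicodedata` module for the character names:

```python
import unicodedata


def spelled_out(label: str) -> str:
    """Replace each non-ASCII character with its bracketed Unicode name."""
    parts = []
    for ch in label:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.append("[" + unicodedata.name(ch, "UNNAMED CHARACTER") + "]")
    return "".join(parts)
```

For a name written entirely in a non-Latin script every character expands to a full name, which is the length problem mentioned above.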

    However, there is a deeper problem behind the TLS/SSL spoofing attacks. Browsers today rely too much on certificates. They should consider certificates merely as hints for authenticating previously unknown servers, and like SSH should cache each server’s certificate after the first visit. There should be some warning for every new TLS/SSL site (and a more insistent warning once any site’s identification changes – do you really want to trust every company that owns one of your browser’s root certificates?). Then people will get used to not seeing this warning on their frequently visited TLS/SSL servers, but seeing it on random servers elsewhere; and if a warning pops up when they supposedly are going to one of their usual trusted servers, then this warning might alarm them that something evil may be going on.
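
The SSH-style caching described here might look roughly like this in Python. The `CertificateCache` class, its three return values, and the choice of a SHA-256 fingerprint over the raw certificate bytes are all assumptions for the sake of the sketch.

```python
import hashlib


class CertificateCache:
    """Remember the certificate fingerprint first seen for each host."""

    def __init__(self):
        self._seen = {}  # hostname -> SHA-256 fingerprint (hex)

    def check(self, hostname: str, cert_der: bytes) -> str:
        """Return 'new', 'unchanged' or 'changed' for this host's certificate."""
        fingerprint = hashlib.sha256(cert_der).hexdigest()
        previous = self._seen.get(hostname)
        if previous is None:
            self._seen[hostname] = fingerprint
            return "new"        # first visit: mild warning
        if previous == fingerprint:
            return "unchanged"  # familiar site: stay quiet
        return "changed"        # identity changed: insistent warning
```

Note that a "changed" result deliberately leaves the stored fingerprint alone; whether to accept the new certificate would be the user's decision.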

  21. Paul: it appears my gerv.net domain wasn’t renewed! I’ve renewed it; please try again now.

    I’ll be having words with my ISP.

  22. OK, why not consider blacklisting registrars that do not do adequate checks of IDN names when they are registered?

    If a domain is registered through a ‘problem’ registrar and contains potential problem characters issue a warning to the user.

    This would have the effect of pushing registration business to registrars that behave correctly.

  23. Hi – I like the article “Phishing – Browser-based Defences”. Two comments, though.

    First, I think it sells the colorizing of letters
    a little short. There are millions of users who would
    NEVER visit a DNS site name that had mixed scripts.
    Obviously, the millions of people who only read & write
    English are unlikely to WANT to visit a site that uses
    non-English letters, since such sites tend to not use English (!).
    Only colorizing when they’re mixed, and letting people
    turn it off if it’s “ugly”, would help millions of people.
    Yes, the colorblind & some users will not be helped, but
    if you help the majority, it’s unlikely phishers will perform
    the attack. A phishing attack that only works against
    the colorblind is less likely to be attempted.

    Second, though I like the glyph idea, I think more glyphs than 2 are needed. A phisher
    could create a program that randomly morphs a
    domain name among many different ways, trying to
    find a good substitution, and then hashing the result
    to see if the glyphs match.

    The way to figure out the number of glyphs
    is to imagine a program that can create a large number of
    “phish food” domain names from a given name
    (substitute l for 1, substitute O for 0, do both, …),
    and see how many alternatives you can find for a
    given name. Here’s a back-of-the-envelope calculation;
    say a phished domain name has no more
    than 10 DNS characters in the name, and on
    average each character can be reasonably substituted
    with 3 other characters. (These are guessed numbers,
    but it should be possible to figure out REAL values
    from these using common phished domains like
    paypal.com, ebay.com, etc.). That means that the set of
    alternatives is (3+1)^10-1, i.e., 1,048,575 alternatives –
    about one million. A two-char glyph only gives
    64^2 = 4,096 hashes, so a phisher is almost
    certain to find several alternatives with the same visual hash.
    Four glyphs gives you 64^4=16,777,216.. a phisher
    only has approximately 1/16 chance of finding a match.
    Five glyphs gives you 1,073,741,824… the phisher
    has around a 0.1% chance of finding a match.
    Shorter domain names are even harder to forge.

    — David A. Wheeler
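
The back-of-the-envelope numbers above can be reproduced with a short Python script. It keeps the comment's guessed inputs (10 characters, 3 substitutes each, 64 possible glyphs) and adds one assumption of its own: that visual hashes are uniformly distributed, so the chance of at least one match among M alternatives and N hashes is 1 − (1 − 1/N)^M. The `match_probability` name is hypothetical.

```python
def match_probability(alternatives: int, glyphs: int, alphabet: int = 64) -> float:
    """Chance that at least one of `alternatives` spoof names shares the
    target's visual hash, assuming hashes are uniformly distributed."""
    hashes = alphabet ** glyphs
    return 1.0 - (1.0 - 1.0 / hashes) ** alternatives


# 10 characters, each with 3 plausible substitutes, minus the original name.
alternatives = (3 + 1) ** 10 - 1

for glyphs in (2, 4, 5):
    print(glyphs, 64 ** glyphs, round(match_probability(alternatives, glyphs), 4))
```

With two glyphs the expected number of colliding alternatives is about 1,048,575 / 4,096 ≈ 256, so a match is practically guaranteed. For four and five glyphs the formula gives values close to the 1/16 and 0.1% estimates above.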

  24. What about showing more of the information from the certificate when on HTTPS? With the large screens people have nowadays, there should be room to always show the name and address of the certificate holder. One must assume that this information is checked by the CAs and is correct as shown.
    I would suggest making room between the tabs and the page. When people look at the address bar they can’t help looking there too. An address in Russia would look really odd when you expect to be on paypal.com! A pop-up won’t work, as you will just close it again without reading it.

  25. There are millions of users who would NEVER visit a DNS site name that had mixed scripts. Obviously, the millions of people who only read & write English are unlikely to WANT to visit a site that uses non-English letters, since such sites tend to not use English (!).

    Sorry, this is ill-informed and just wrong. I know I have come across a very large range of sites with at least partially English content (small software projects with homepages in both German and English, f.ex) in the .de TLD.

    Any “protection” proposal that tries to discriminate against IDNs purely on the fact that they’re IDNs is bound to be useless in the long run.

  26. Why not have, by default, different fonts for different types of characters?

    I’m using Bitstream Vera Sans for my UI, and when I tried that paypal site I noticed the first a was different, not sure if it is because the font itself doesn’t have that character or because it has different a’s. But I had a visual warning.

    Anyway, with the current setup Firefox uses the font set in the OS for the address bar. Suppose that, instead, it used the fonts in Firefox’s own font settings, that ‘Western’ and ‘Unicode’ were set by default to visually different fonts, and finally that a warning about the danger popped up when the user tried to set the same font for both ‘Western’ and ‘Unicode’.

    Then we would have a visual warning equivalent to payppal.com passing as paypal.com. That means it’s not perfect, and sometimes people would miss it, but it would be as easy to spot as payppal.

  27. my two cents

    microsoft’s decision to ‘blow smoke’ in people’s eyes by prematurely releasing a beta of IE7
    is partly (largely?) due to IDN support,
    maybe this is old news but one of microsoft’s ‘secret’ new features of IE7 is support for IDN/multilingual domains. they had originally planned to release IE7 along with longhorn sometime late 2006.

    but 400 million+ people in asia are switching to firefox at a fast rate (because they are using a lot of idn’s there), users that might not be so easy to convert back to IE users, and to stop this
    there’s an IE beta with idn support on its way (unless firefox/moz disables idn of course..)

    I’m saying maybe idn support is one of the big migrating factors out there, so don’t disable it

    as far as solutions go, I saw this https://bugzilla.mozilla.org/attachment.cgi?id=173729
    which seems reasonable enough, all other talk about special treatment for idn’s like special colors or popup-info boxes with ‘this is an idn’ (implying it’s a less secure domain, or a subset etc) just seems silly,

    firefox/moz can either wait for the IE7 beta to see how they do it, or lead and do it their own way (don’t have to be perfect right away)

  28. * Can’t discriminate against IDN in any way.
    * Can’t play with colour.
    * Can’t trigger on multiple scripts, because a domain name with language combinations may be quite valid.

    First, a mapping should be made between all interchangeable characters like Latin/Cyrillic a, i, p, hyphens, etc. (Unicode spec will help do this automagically and help keep it updated in the future)

    Then whenever an IDN site gets visited, look through History using these mappings to check if the user visits any similar-looking domain, and warn if there’s a match. The current default 9 day history is not that helpful, and going through it will not be efficient. We would need to add a long-term (a year?) domain-only history. Add to it only sites that do not trigger a warning, or which were confirmed to be OK, and don’t make it user editable. (Otherwise, if someone intentionally removes the original site from history and puts a phishing site into the history, the warning would be triggered on the original site and not on the phishing site.)

    This is the only kind of protection I can think of that would not be discriminative or annoying.
    Otherwise the browser would need to know what sites are “popular” to detect phishing against them, and I think that’s an ugly and messy solution with black/white lists.

    We would also need to think about performance and scalability issues. How many domains can a user visit in a year? How many domains can a browser installation (in Internet cafes, for example) visit in a year?

    The reasoning of why this would be effective is simple.

    Phishing makes sense only on sites that the user visits and is likely to submit personal info to. Otherwise it makes no difference from visiting any other new site.
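
A rough Python sketch of the scheme in this comment: fold look-alike characters together, then compare a newly visited name against a long-term history of trusted domains. The `LOOKALIKES` table is a tiny hand-written placeholder, and `skeleton` and `lookalike_in_history` are hypothetical names; a real table would have to be derived from Unicode data and would be far larger.

```python
# Tiny, hypothetical look-alike table; a real one would be far larger and
# would need to come from Unicode data rather than being written by hand.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def skeleton(hostname: str) -> str:
    """Fold look-alike characters onto one representative each."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in hostname.lower())


def lookalike_in_history(hostname: str, trusted_history: set):
    """Return a previously visited domain that `hostname` imitates, or None."""
    target = skeleton(hostname)
    for known in trusted_history:
        if known != hostname and skeleton(known) == target:
            return known
    return None
```

Because the comparison is against the user's own history, a name only raises a warning when it resembles a site that this user has actually visited before. The linear scan is fine for a sketch; storing skeletons in a dictionary would address the performance concern raised above.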

  29. Yes M!
    I am one of those who changed browser from Internet Explorer to Firefox in order to use IDN.

    If Firefox does not support IDN, I have no choice but to go back to Internet Explorer with the i-Nav plug-in, or to a new version of Opera, which supports IDN natively.

  30. I agree, I changed to Firefox for the IDN support.
    Don’t you get it? The media blows up Firefox’s vulnerability to homograph attacks and then says that fortunately if you use IE you’ll be fine (suspicious, eh?), then within days Bill Gates out of the blue announces the release of IE7.
    Isn’t that a fortunate coincidence for Microsoft, or what?
    I guess they have achieved what they wanted: they have turned all of us IDN users into second-class netizens, by displaying the punycode instead of our native-language domain name.
    Thanks, Firefox, for falling into the trap.

  31. Flame for IDN comments

    I think people somehow misunderstood my previous entry about disabling IDN support in Firefox. It was written right after Mozilla announced they would disable default support for IDN, and before they changed that to just display IDN domains as punycode.