IE 8 and the Public Suffix List

It has become important in recent years for web browsers to know something about the de facto ‘shape’ of the DNS – e.g. to tell the difference between co.com (someone’s domain) and co.uk (a registry-specified suffix under which people register domains). This is used to stop cookie leakage between domains, to highlight the important parts of a domain name, and for other things too.

To do this, Mozilla started the Public Suffix List project, a cross-browser initiative which tries to maintain such a map. Besides Firefox, the list is used by Opera and Chrome/Chromium. Thanks to some heavy lifting at the start of the project by some very hard-working volunteers, the list is pretty comprehensive (although we tweak it regularly).
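To make the idea concrete, here is a minimal Python sketch of PSL-style matching. It assumes the rule format the list uses (plain rules, "*." wildcard rules and "!" exception rules), with the longest matching rule winning and an exception rule beating everything else. The RULES set is a tiny hand-picked sample for illustration, not the real list, and the function names public_suffix and registrable_domain are my own rather than anything from a browser's code.

```python
# Tiny illustrative rule set; the real Public Suffix List is far larger.
RULES = {"com", "uk", "co.uk", "*.ck", "!www.ck"}


def public_suffix(domain, rules=RULES):
    """Return the public suffix of `domain` under a PSL-style rule set."""
    labels = domain.lower().strip(".").split(".")
    best = 1  # implicit default rule "*": the last label alone is a suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is the rule minus its leftmost label.
            return ".".join(labels[i + 1:])
        wildcard = "*." + ".".join(labels[i + 1:])
        has_parent = i + 1 < len(labels)
        if candidate in rules or (has_parent and wildcard in rules):
            # Keep the longest match, measured in labels.
            best = max(best, len(labels) - i)
    return ".".join(labels[-best:])


def registrable_domain(domain, rules=RULES):
    """Return the public suffix plus one label, or None if there is none."""
    labels = domain.lower().strip(".").split(".")
    suffix_len = len(public_suffix(domain, rules).split("."))
    if len(labels) <= suffix_len:
        return None  # the name is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


if __name__ == "__main__":
    for name in ("co.com", "co.uk", "www.example.co.uk", "www.ck"):
        print(name, "->", registrable_domain(name))
```

With this sample rule set, co.com comes back as a registrable domain in its own right, while co.uk yields None because it is a suffix under which other people register domains.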

IE 8 also needs to know this type of information, to power things like its domain highlighting in the URL bar. A post on the IE blog by the excellent Eric Lawrence details what they use it for and how their code works. You can see there the algorithm that IE used in all versions prior to IE 8.

In IE 8, they made changes to improve the accuracy of the algorithm. Sadly, they have chosen not to switch to the Public Suffix List, even though the licensing on the data is designed to let them do so. Instead, they have kept their old heuristic but added a set of exceptions – ietldlist.xml, which is bundled with IE 8. (If you have IE 8, you can see it by visiting the URL res://urlmon.dll/ietldlist.xml.)

This is sad (a) because it makes the browsers inconsistent with one another and (b) because, IMO, their combination of algorithm and list does not produce results as good as those from the Public Suffix List. Here are some issues:

  • The IE list contains typos (I’m fairly sure about most of these):
    • aeroport.ci (aéroport.ci)
    • ciesqyn.pl (cieszyn.pl)
    • golgow.pl (glogow.pl)
    • udmautia.ru (udmurtia.ru)
    • prindipe.st (principe.st)
    • edunte.tn (edunet.tn)
    • cherrnigov.ua (chernigov.ua)
  • The .aero, .pro and .museum gTLDs have a large number of reserved subdomains; these aren’t recognised.
  • There is likewise no attempt to deal with the subdivided complexities of Italy (.it), Japan (.jp) and Norway (.no).

That’s not to say we don’t have things to look into either; I’ve filed a bug to follow up the places where IE has an entry that we don’t.

I’ve written a Perl script implementing both algorithms (PSL courtesy of the regdom-libs project) so people can see the differences for a particular domain. Note that I can’t redistribute ietldlist.xml, so you’ll need to obtain your own copy of that before the script will run.

I hope Microsoft will consider using the PSL for the next release of IE, so we gain cross-browser consistency and can all work together to maintain a single map of the DNS. We are happy to work with them to make that possible.

4 thoughts on “IE 8 and the Public Suffix List”

  1. Most of your points are valid, but I really can’t see why a browser would need to know about theoretical reserved subdomains in TLDs that nobody uses anyway. Museums generally are in .com or .org, and as for .aero and .pro, I don’t even know what they’re ostensibly for. I’ve never previously heard of them.

    Until there are actual websites in those TLDs (and I mean websites set up by someone *other* than the people promoting the TLD), it seems premature to expend any significant effort cataloging a large number of reserved subdomains that in all probability will never actually be used.

    I’d be much more interested in knowing how it handles geographic-TLD hierarchies with inconsistent organizational schemes (e.g., http://www.ci.galion.oh.us versus http://www.galion.lib.oh.us).

  2. You can’t tell whether people are actually using it without lots of fiddling around in the DNS. Much easier just to list it.

    As for finding out what it does with odd hierarchies, you can use my code to see :-)

  3. > You can’t tell whether people are actually using it without
    > lots of fiddling around in the DNS. Much easier just to list it.

    That’s all well and good, but you weren’t just deciding how to do it for Mozilla. You were criticizing IE and the way it does things. I wouldn’t have any problem with criticizing IE if there were substance to the criticism, and some of your other points do have substance, but this particular point seems vacuous to me. Complaining that they don’t correctly handle obscure subdomains in .thinggummy, if you don’t know of anyone who actually uses those subdomains, feels disingenuous and empty. Including hollow criticisms like that weakens your argument.