It has become important in recent years for web browsers to know something about the de facto ‘shape’ of the DNS – e.g. to tell the difference between co.com (someone’s domain) and co.uk (a registry-specified suffix under which people register domains). Browsers use this knowledge to stop cookie leakage between domains, to highlight the important parts of a domain name, and for other purposes.
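To make the cookie case concrete, here is a minimal Python sketch of the kind of check a browser might perform before accepting a cookie's Domain attribute. The function name `cookie_domain_allowed` and the three-entry suffix set are assumptions made up for illustration; a real browser consults a full suffix map and applies further rules that are skipped here.

```python
# Hypothetical, hand-picked subset of suffixes, for illustration only.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}


def cookie_domain_allowed(request_host, cookie_domain):
    """Return True if a site at request_host may set a cookie for cookie_domain.

    Simplified: rejects any cookie scoped to a public suffix, and any cookie
    whose domain is not the host itself or a parent of the host.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")

    # A cookie for "co.uk" would be sent to every site registered under it.
    if domain in PUBLIC_SUFFIXES:
        return False

    # The cookie domain must be the host or one of its parent domains.
    return host == domain or host.endswith("." + domain)
```

With this set, a page on www.example.co.uk may set a cookie for example.co.uk but not for co.uk, while a page on shop.co.com may set one for co.com, because co.com is an ordinary registered domain rather than a suffix.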
To do this, Mozilla started the Public Suffix List project, a cross-browser initiative which tries to maintain such a map. The list is also used by Opera and Chrome/Chromium. Thanks to some heavy lifting at the start of the project by some very hard-working volunteers, the list is pretty comprehensive (although we tweak it regularly).
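For readers who have not looked at how the list is consumed, here is a rough Python sketch of the matching procedure as I understand it from the list's published format: plain rules, `*` wildcard rules, and `!` exception rules, with the longest match winning unless an exception applies. The `RULES` subset and the function names `public_suffix` and `registrable_domain` are assumptions chosen for illustration; the real list is far larger, and this sketch ignores internationalised names.

```python
# Tiny illustrative subset of rules; not the real list.
RULES = ["com", "uk", "co.uk", "jp", "*.kawasaki.jp", "!city.kawasaki.jp"]


def _matches(rule_labels, domain_labels):
    """Compare labels right to left; '*' in a rule matches any single label."""
    if len(rule_labels) > len(domain_labels):
        return False
    for rule_label, domain_label in zip(reversed(rule_labels),
                                        reversed(domain_labels)):
        if rule_label != "*" and rule_label != domain_label:
            return False
    return True


def public_suffix(domain, rules=RULES):
    """Return the public suffix of domain under the given rules."""
    labels = domain.lower().rstrip(".").split(".")
    best = ["*"]  # default rule when nothing else matches
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if not _matches(rule_labels, labels):
            continue
        if is_exception:
            # An exception rule always wins; drop its leftmost label.
            best = rule_labels[1:]
            break
        if len(rule_labels) > len(best):
            best = rule_labels  # otherwise the rule with most labels wins
    return ".".join(labels[-len(best):])


def registrable_domain(domain, rules=RULES):
    """Return the public suffix plus one label, or None if there is none."""
    labels = domain.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(domain, rules).split("."))
    if len(labels) <= suffix_len:
        return None  # the name is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])
```

Under these sample rules, `registrable_domain("www.co.com")` gives `co.com` and `registrable_domain("www.example.co.uk")` gives `example.co.uk`. The wildcard and exception rules are what let a list describe the more heavily subdivided parts of the DNS.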
IE 8 also needs to know this type of information, to power things like its domain highlighting in the URL bar. A post on the IE blog by the excellent Eric Lawrence details what they use it for and how their code works. There you can also see the algorithm that IE used in all versions prior to IE 8.
In IE 8, they made changes to improve the accuracy of the algorithm. Sadly, although the licensing on the data is designed to let them do so, they have chosen not to switch to the Public Suffix List. Instead, they have kept their old heuristic but added a set of exceptions – ietldlist.xml, which is bundled with IE 8. (If you have IE 8, you can see it by visiting the URL res://urlmon.dll/ietldlist.xml.)
This is sad (a) because it makes the browsers inconsistent with one another, and (b) because, IMO, their combination of algorithm and list does not produce results as good as the Public Suffix List. Here are some issues:
- The IE list contains typos (correct spelling in brackets; I’m fairly sure about most of these):
  - aeroport.ci (aéroport.ci)
  - ciesqyn.pl (cieszyn.pl)
  - golgow.pl (glogow.pl)
  - udmautia.ru (udmurtia.ru)
  - prindipe.st (principe.st)
  - edunte.tn (edunet.tn)
  - cherrnigov.ua (chernigov.ua)
- The .aero, .pro and .museum gTLDs have a large number of reserved subdomains; these aren’t recognised.
- There is likewise no attempt to deal with the subdivided complexities of Italy (.it), Japan (.jp) and Norway (.no).
That’s not to say we don’t have things to look into either; I’ve filed a bug to follow up the places where IE has an entry that we don’t.
I’ve written a Perl script implementing both algorithms (PSL courtesy of the regdom-libs project) so people can see the differences for a particular domain. Note that I can’t redistribute ietldlist.xml, so you’ll need to obtain your own copy of that before the script will run.
I hope Microsoft will consider using the PSL for the next release of IE, so we gain cross-browser consistency and can all work together to maintain a single map of the DNS. We are happy to work with them to make that possible.