July 07, 2005

IDN Update (2)

Mozilla Foundation products now display IDNs only for TLDs on a whitelist. Each whitelisted TLD has a policy stating which characters are permitted, and procedures for making sure that no homographic domains are registered to two different entities. This work was done in bug 286534 by Jungshik Shin - thanks to him. I've made a list of the whitelisted TLDs with links to their registries and the relevant published policy documents. Our whitelist is currently (almost; see below) identical to that of Opera; I'd also like to thank Yngve Pettersen of Opera for saving me a lot of work in gathering this data.
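To make the behaviour concrete, here is a minimal Python sketch of a whitelist check. It is not the code from bug 286534: the function name `display_form` and the contents of `IDN_WHITELIST` are assumptions made up for illustration, and it relies on Python's built-in `idna` codec instead of the browser's own IDN library.

```python
# Illustrative only: a hypothetical whitelist, not the real Mozilla list.
IDN_WHITELIST = {"museum", "hu", "gr"}


def display_form(hostname: str) -> str:
    """Return the form of a hostname that might be shown to the user.

    Hosts under a whitelisted TLD are shown in Unicode; everything
    else is shown in its ASCII (Punycode, "xn--") form.
    """
    # Normalise to the ASCII form first, so both input styles are handled.
    ascii_host = hostname.encode("idna").decode("ascii")
    tld = ascii_host.rsplit(".", 1)[-1].lower()
    if tld in IDN_WHITELIST:
        # Trusted registry policy: safe to show the Unicode form.
        return ascii_host.encode("ascii").decode("idna")
    # Unknown policy: fall back to the unambiguous ASCII form.
    return ascii_host


if __name__ == "__main__":
    cyrillic = "\u0441\u0430\u0445\u0430\u0440"  # Cyrillic letters resembling "caxap"
    print(display_form(cyrillic + ".gr"))  # shown in Unicode
    print(display_form(cyrillic + ".pl"))  # shown as xn--...
```

The point of falling back to the ASCII form is that a spoofed name then looks obviously different from the plain Latin one it imitates.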

(Note: .museum and .hu have been approved but we have not yet checked in the change to enable them. This is bug 299927.)

Posted by gerv at July 7, 2005 11:55 AM
Comments

What about the *.pl domain?
http://www.dns.pl/IDN/allowed_character_sets.pdf

In short - only Latin, Greek, Hebrew and Cyrillic sets are allowed, but you can only use one set in one domain.

Posted by: marcoos at July 7, 2005 02:28 PM

marcoos: Cyrillic and Greek both have certain letters which are homographic with Latin. Restricting each domain to a single set is not enough - for example, caxap.pl written in Latin letters would be homographic with the same name written in Cyrillic characters (сахар, which means "sugar" in Russian).

Therefore in order to turn IDN on for .pl, the .pl registry would need to publish a policy stating how it prevented two different people getting hold of the two versions of caxap.pl. There are several ways they could do this - for example, bundling (making the registrant register all of the versions at once) or blocking (once one variant is registered, not allowing registration of any of the others). We don't think we should be mandating how they do it - it's up to them what method they choose. But they do need to do it. Otherwise, the owner of one caxap.pl would be at risk from spoofing from the owner of the other one.
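As a rough illustration of the blocking option, here is a minimal Python sketch. Everything in it is hypothetical: the `CONFUSABLES` table is a tiny hand-picked sample, not a real registry's variant table, and `skeleton` and `BlockingRegistry` are names invented for this example. It does not describe how any actual registry works.

```python
# Hypothetical, incomplete table mapping Cyrillic letters to Latin look-alikes.
CONFUSABLES = {
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(label: str) -> str:
    """Collapse look-alike characters so homographs compare equal."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())


class BlockingRegistry:
    """Once one variant is registered, other registrants cannot take the rest."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # skeleton -> registrant

    def register(self, label: str, registrant: str) -> bool:
        # The first registrant of any variant owns the whole homograph set.
        owner = self._owners.setdefault(skeleton(label), registrant)
        return owner == registrant


if __name__ == "__main__":
    latin = "caxap"
    cyrillic = "\u0441\u0430\u0445\u0430\u0440"
    registry = BlockingRegistry()
    print(latin == cyrillic)                      # different code points
    print(registry.register(latin, "alice"))      # first variant accepted
    print(registry.register(cyrillic, "bob"))     # homograph refused
```

Bundling would differ only in what happens after the check: instead of refusing the other variants outright, the registry would reserve them for the original registrant.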

Posted by: Gerv at July 7, 2005 03:13 PM

What about *.gr?
From: https://grweb.ics.forth.gr/english/gr_char_en.html

«A feature of the registration procedure of these domains will be the concept of "Bundle". Bundled domains are going to be the domains that differentiate only in punctuation but are otherwise identical to a main form. Any domain that is a Homograph of the original registration may also become part of the Bundle.

Domain names that could be registered in each bundle are not automatically registered for the registrant but are instead excluded from the list of available domain names until this particular registrant decides to "Activate" one or more of them. Each activation has a cost, depending on the Registrar the registrant decides to use.»

Posted by: Filipp0s at July 8, 2005 08:17 AM

From September 1 this year, it will be possible to register domain names containing the local characters U+00E4 (ä), U+00E5 (å) and U+00F6 (ö) for the .fi TLD.

Will .fi then be added to this whitelist?

http://www.ficora.fi/englanti/internet/IDN.htm

Posted by: B-J at July 9, 2005 12:03 PM

.gr has been approved, after I got an email from them.

B-J: if they ask me, like it says in the document! :-)

Posted by: Gerv at July 12, 2005 03:14 PM