Punycode

Punycode, defined in RFC 3492 as a "Bootstring encoding" of Unicode, maps Unicode strings into the limited character set supported by the Domain Name System. The encoding is used as part of IDNA, a system that enables internationalized domain names in all languages supported by Unicode, in which the burden of translation lies entirely with the user application (e.g., a web browser).

The encoding is applied separately to each component (label) of a domain name that cannot be represented solely within the ASCII character set, and the reserved prefix 'xn--' is prepended to the resulting Punycode string. For example, bücher becomes bcher-kva in Punycode, so the domain name bücher.ch is represented as xn--bcher-kva.ch in IDNA.
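The per-label rule can be sketched in a few lines of Python. This is a minimal illustration, assuming Python 3.7+ and its built-in "punycode" codec (which implements RFC 3492); it deliberately skips the Nameprep normalisation and DNS length checks that real IDNA performs, and the function names `domain_to_ascii` and `domain_to_unicode` are hypothetical.

```python
ACE_PREFIX = "xn--"  # reserved IDNA prefix marking a Punycode label

def domain_to_ascii(domain: str) -> str:
    """Punycode-encode each non-ASCII label and add the 'xn--' prefix.

    Simplified sketch: no Nameprep normalisation, no length checks.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Labels already representable in ASCII are left unchanged.
            labels.append(label)
        else:
            labels.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

def domain_to_unicode(domain: str) -> str:
    """Reverse of the sketch above: decode labels carrying the prefix."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith(ACE_PREFIX):
            encoded = label[len(ACE_PREFIX):].encode("ascii")
            labels.append(encoded.decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)

print(domain_to_ascii("bücher.ch"))           # xn--bcher-kva.ch
print(domain_to_unicode("xn--bcher-kva.ch"))  # bücher.ch
```

The ASCII check matters: Python's codec would turn a plain label such as "ch" into "ch-", so pure-ASCII labels must bypass it. For the full IDNA 2003 conversion, including Nameprep, Python's separate "idna" codec is the more complete option.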

Compare the ASCII 'punycoded' URL http://xn--tdali-d8a8w.lv/ with its full Unicode counterpart, which includes the Latvian characters with their diacritics: http://tūdaliņ.lv (percent-encoded as http://t%C5%ABdali%C5%86.lv). The Punycode URL worked, whereas the Unicode link did not, because the original Wikipedia page was served in ISO-8859-1 rather than Unicode, and that character set cannot correctly render URLs containing internationalized domain names.
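The two written forms of this host come from different mechanisms. A short sketch, assuming Python 3's standard `urllib.parse.quote` and the built-in "punycode" codec, derives both from the same label: percent-encoding escapes the UTF-8 bytes of each non-ASCII character, while Punycode produces a new string of letters, digits and hyphens that DNS can carry.

```python
from urllib.parse import quote

host = "tūdaliņ"

# Percent-encoding: each non-ASCII character becomes its UTF-8 bytes
# written as %XX escapes (the form used in the link target above).
percent_form = quote(host)

# Punycode: the whole label is re-encoded into letters, digits and
# hyphens, then given the reserved 'xn--' prefix.
ace_form = "xn--" + host.encode("punycode").decode("ascii")

print(percent_form)  # t%C5%ABdali%C5%86
print(ace_form)      # xn--tdali-d8a8w
```

Only the second form is a valid DNS label; the percent-encoded form is a URL-level escape that software must still convert before a DNS lookup.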

Google can search within 'punycoded' sites; for example, enter the query site:tūdaliņ.lv (http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=site%3A.t%C5%ABdali%C5%86.lv&btnG=Search).

Punycode is designed to work across all script systems and to be self-optimising: as it encodes a string, it adapts to the ranges of characters that the string contains. It is optimised for strings composed of zero or more ASCII characters plus characters from only one other script system, but it can handle any arbitrary Unicode string. Note that for DNS use, the domain name string is assumed to have been normalised using Nameprep and (for top-level domains) filtered against an officially registered language table before being Punycoded, and that the DNS protocol limits the acceptable length of the output Punycode string.
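The adaptation comes from Bootstring's bias parameter. Each non-ASCII character is recorded as a "delta" (how far the encoder's state has advanced since the last insertion), written as a variable-length base-36 number whose digit thresholds depend on a bias that is recomputed after every insertion. The sketch below is a from-scratch Python rendering of the RFC 3492 encoding procedure for illustration only; the function names are hypothetical, it performs no Nameprep, and it omits the RFC's overflow handling, which Python's arbitrary-precision integers make unnecessary here.

```python
# Punycode parameter values from RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Recompute the bias from the delta just encoded."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def _digit(d: int) -> str:
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)

def punycode_encode(text: str) -> str:
    code_points = [ord(c) for c in text]
    # Basic (ASCII) code points are copied first, in order.
    output = [c for c in text if ord(c) < 0x80]
    b = h = len(output)
    if b:
        output.append("-")  # delimiter between basic part and deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        # Smallest code point not yet handled.
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer;
                # the threshold t for each digit depends on the bias.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)

print(punycode_encode("bücher"))   # bcher-kva
print(punycode_encode("tūdaliņ"))  # tdali-d8a8w
```

When all non-ASCII characters come from one script, the gaps between their code points stay small, so the deltas and hence the output stay short; this is the "one other script system" case the paragraph above describes. Comparing against Python's built-in codec (`text.encode("punycode")`) is a convenient way to check such a sketch.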

Phishing concerns

Because Punycode allows websites to use full Unicode names, IDNA could leave their users open to phishing attacks. IDNA makes it possible to create a spoofed web site that looks exactly like another, including domain name and security certificate, but in fact is controlled by someone attempting to steal private information.

In Unicode, different characters can look the same. For example, Unicode character U+0430, Cyrillic small letter a (а), looks identical to Unicode character U+0061, Latin small letter a, which is the a used in English. Although the browser may display identical glyphs for each character, it uses differing representations (in plain text, Unicode or Punycode) when locating the web sites or validating certificates. As a result, someone could register a domain name that appears identical to an existing domain but goes somewhere else. For example, the spoofed domain "pаypal.com" contains a Cyrillic a, not a Latin a.
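The difference between the two look-alike names is easy to expose programmatically. The sketch below, assuming Python 3's standard `unicodedata` module and built-in "punycode" codec, compares the strings, guesses the scripts of their letters, and shows the Punycode form that the spoofed label actually resolves as. The helper `scripts_in_label` is hypothetical and crude: it uses the first word of each letter's Unicode character name as a stand-in for the Unicode Script property, which the standard library does not expose; real browser policies are more involved.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

spoof = "p\u0430ypal"  # second character is U+0430 CYRILLIC SMALL LETTER A
real = "paypal"

print(spoof == real)                    # False, though both render alike
print(sorted(scripts_in_label(spoof)))  # ['CYRILLIC', 'LATIN']
print("xn--" + spoof.encode("punycode").decode("ascii") + ".com")  # xn--pypal-4ve.com
```

A label that mixes Latin and Cyrillic letters is a common warning sign, and showing the 'xn--' form instead of the rendered Unicode reveals that the name differs from paypal.com.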

On 7 February 2005, Slashdot reported that this exploit had been disclosed at the hacker conference ShmooCon, with an example available at http://www.shmoo.com/idn/. In browsers supporting IDNA, the URL "https://www.pаypal.com/" appeared to lead to paypal.com but instead led to a spoofed PayPal web site that said "Meeow." Mozilla Firefox, which supported IDNA, showed the page as being at paypal.com with a verified security certificate, and displayed no warnings of any sort.

Copyright 2009 WordIQ.com
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Punycode".