An Emoji IDN

The great thing about IDNs (internationalized domain names) is that you can use pretty much any Unicode character within a domain. The problem is finding a TLD (top-level domain) registrar that will allow it. At the time of writing, only .tk and .xyz support it.

Created by @pmrourkie.

What is this?

An Internationalized Domain Name (IDN) is a domain, or web address, that contains at least one character in a local language script such as Arabic, Chinese, Cyrillic or Tamil, for example: 名がドメイン.com and 中文网.中文网.

The 'Internet' was developed by English-speaking people, who use Latin script, and so the underlying 'Internet' was encoded in what's called ASCII. This contains 128 characters, including A-Z, a-z and 0-9. Traditionally, domain names could only contain ASCII characters. As the 'Internet' expanded, it became important to support non-Latin scripts such as Afaka, Arabic, Bengali, Cyrillic and Chinese. A standard called Unicode was developed, which supports over 110,000 characters and around 100 language scripts.
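To make the ASCII limit concrete, here is a minimal Python sketch that lists the characters in a name that fall outside the 128-character ASCII range. The function name `non_ascii_chars` is made up for this illustration.

```python
def non_ascii_chars(name: str) -> list:
    """Return (character, code point) pairs for every character outside ASCII (code points 0-127)."""
    return [(ch, f"U+{ord(ch):04X}") for ch in name if ord(ch) > 127]


# A plain ASCII name has nothing to report.
print(non_ascii_chars("example"))

# Every character of this label is outside ASCII.
print(non_ascii_chars("中文网"))
```

Any name for which this list is not empty cannot be handed to an ASCII-only system as it stands.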

Within Unicode there is support for thousands of symbols:

❤ ♎ ☀ ★ ☂ ♞ ☯ ☭ ☢ € ☎ ⚑ ❄ ♫ ✂

IDN was originally proposed back in December 1996 by Martin Dürst and was first implemented in 1998. Then, in 2009, ICANN approved the creation of Internationalized Country Code Top-Level Domains, or IDN ccTLDs, with the first few (such as .рф and .中國) being launched in 2010. To date there are around 67 IDN ccTLDs.

What is

So, you probably typed ☺.xyz into your browser, or clicked a link, but your browser is probably displaying an ASCII address starting with xn-- instead. This is called Punycode. The Domain Name System (DNS) doesn't support Unicode characters, only ASCII, so an encoding was created to ensure that IDNs could be supported.

Punycode works by taking the Unicode domain name and encoding it into an ASCII-readable format. So, for an example, we'll use a name containing "mü". The "ü" isn't an ASCII character, so when you hit enter, your browser converts the name into an ASCII format to do the DNS look-up. All Punycode domains start with xn--.
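The conversion described above can be sketched with the `punycode` codec from Python's standard library. This is a simplified illustration, not what a browser actually runs: real IDNA processing also normalizes and validates each label first, which is skipped here. The helper names (`encode_label`, `decode_label`, `encode_domain`) and the label "münchen.example" are assumptions made for the example.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def encode_label(label: str) -> str:
    """Encode one domain label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def decode_label(label: str) -> str:
    """Decode one label back to Unicode if it carries the xn-- prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def encode_domain(domain: str) -> str:
    """Encode each dot-separated label of a domain independently."""
    return ".".join(encode_label(part) for part in domain.split("."))


print(encode_domain("☺.xyz"))            # xn--74h.xyz
print(encode_domain("münchen.example"))  # xn--mnchen-3ya.example
print(decode_label("xn--74h"))           # ☺
```

Each label is encoded on its own, so the ASCII TLD is left untouched and only the label containing non-ASCII characters gains the xn-- prefix.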

So why is my browser displaying

Well, as with everything on the Internet... spam, phishing and malware! With the rollout of IDNs, there has been a large rise in malware and phishing attacks that take advantage of Unicode, called IDN homograph attacks. This is where the address of a website normally written in Latin characters is rewritten in a way that a person may not notice, by exploiting the fact that many characters look alike, for example: riс. In this example I have swapped the Latin "c" in my name with the Cyrillic "с". This is also known as script spoofing, and for this reason most browsers display the IDN in Punycode. On a side note, this is also why most registrars will prevent you from purchasing multi-script domain names like riс.
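A rough way to spot this kind of mix is to look at the Unicode name of each letter. The sketch below is only a heuristic and is not how any particular browser or registrar does it: it treats the first word of a character's Unicode name (for example LATIN or CYRILLIC) as its script. The function names `scripts_in` and `looks_mixed_script` are made up for this illustration.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the scripts in a label using the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


latin = "ric"          # all Latin letters
spoofed = "ri\u0441"   # ends in U+0441, CYRILLIC SMALL LETTER ES

print(latin == spoofed)             # False, although the two look the same
print(scripts_in(spoofed))
print(looks_mixed_script(latin))    # False
print(looks_mixed_script(spoofed))  # True
```

The two strings look identical on screen, but they compare as different and the second one mixes Latin and Cyrillic letters.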

This really isn't an extensive explanation of Unicode, Punycode, IDNs, IDN ccTLDs or DNS, but if you're interested, below are some links for further reading:

This was built for demonstration purposes, and a bit of fun, by @pmrourkie. Feel free to tweet me if you think anything is misrepresented or if you want to add additional information.