☺
An Emoji IDN
The great thing about IDNs (Internationalized Domain Names) is that you can use pretty much any Unicode character within a domain. The problem is finding a TLD (top-level domain) whose registrars will allow it. At the time of writing, only .tk and .xyz support it.
Created by @pmrourkie.
What is this?
An Internationalized Domain Name (IDN) is a domain, or web address, that contains at least one character in a local language script such as Arabic, Chinese, Cyrillic or Tamil - for example: 名がドメイン.com and 中文网.中文网.
The Internet was developed by English-speaking people, who use Latin script, and so its underlying protocols were built on what's called ASCII - a set of 128 characters that includes A-Z, a-z and 0-9. Traditionally, domain names could only contain ASCII characters. As the Internet expanded, it became important to support non-Latin scripts like Afaka, Arabic, Bengali, Cyrillic and Chinese. A standard called Unicode was developed, which supports over 110,000 characters and around 100 language scripts.
Within Unicode there is support for thousands of symbols:
❤ ♎ ☀ ★ ☂ ♞ ☯ ☭ ☢ € ☎ ⚑ ❄ ♫ ✂
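To make the ASCII/Unicode split concrete, here is a minimal Python sketch that prints a character's Unicode code point and says whether it falls inside the 128-character ASCII range. The helper name `describe` is made up for this illustration.

```python
def describe(ch: str) -> str:
    """Return the code point of a character and whether it is ASCII."""
    code_point = ord(ch)
    kind = "ASCII" if code_point < 128 else "non-ASCII"
    return f"U+{code_point:04X} {kind}"


for ch in "a9☺❤":
    print(ch, describe(ch))
# a U+0061 ASCII
# 9 U+0039 ASCII
# ☺ U+263A non-ASCII
# ❤ U+2764 non-ASCII
```

Anything reported as non-ASCII could not appear in a traditional domain name.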
IDN was originally proposed back in December 1996 by Martin Dürst and was first implemented in 1998. Then, in 2009, ICANN approved the creation of Internationalized Country Code Top-Level Domains, or IDN ccTLDs, with the first few (including .рф and .中國) being launched in 2010. To date there are around 67 IDN ccTLDs.
What is xn--74h.xyz?
So, you probably typed ☺.xyz into your browser, or clicked a link, but your browser is probably displaying xn--74h.xyz. This is called Punycode. The Domain Name System (DNS) doesn't support Unicode characters, only ASCII, so Punycode was created to let IDNs be represented in ASCII.
Punycode works by taking the Unicode domain name and encoding it into an ASCII-only form. As an example, we'll use münich.com. The "ü" isn't an ASCII character, so when you hit enter your browser converts the name into ASCII to do the DNS look-up: xn--mnich-kva.com. Every Punycode-encoded part (label) of a domain starts with the prefix xn-- (upper or lower case, since domain names are case-insensitive).
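The conversion described above can be tried out with Python's built-in `punycode` codec. This is only a rough sketch: the function names `to_ascii` and `to_unicode` are made up for illustration, and it skips the normalization and validation steps that a browser or a full IDNA implementation would apply before encoding.

```python
ACE_PREFIX = "xn--"  # marks a Punycode-encoded label


def to_ascii(domain: str) -> str:
    """Encode each non-ASCII label of a domain with Punycode."""
    labels = []
    for label in domain.lower().split("."):
        if label.isascii():
            labels.append(label)  # plain ASCII labels are left untouched
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append(ACE_PREFIX + encoded)
    return ".".join(labels)


def to_unicode(domain: str) -> str:
    """Decode any xn-- labels back to their Unicode form."""
    labels = []
    for label in domain.lower().split("."):
        if label.startswith(ACE_PREFIX):
            raw = label[len(ACE_PREFIX):].encode("ascii")
            labels.append(raw.decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ascii("münich.com"))     # xn--mnich-kva.com
print(to_ascii("☺.xyz"))          # xn--74h.xyz
print(to_unicode("xn--74h.xyz"))  # ☺.xyz
```

Note that only the labels containing non-ASCII characters are rewritten; the TLD (.com, .xyz) passes through unchanged.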
So why is my browser displaying xn--74h.xyz?
Well, as with everything on the Internet... Spam, Phishing and Malware! With the roll-out of IDNs, there has been a large rise in Malware and Phishing attacks that take advantage of Unicode, called IDN homograph attacks. This is where a website such as richardallen.co.uk - which is written in Latin characters - is rewritten in a way that a person may not notice, by exploiting the fact that many characters look alike: riсhardallen.co.uk. In this example I have swapped the Latin "c" in my name with the Cyrillic "с". This is also known as Script Spoofing, and it is the reason browsers often display an IDN in Punycode rather than in Unicode. On a side note, this is also why most registrars will prevent you from purchasing mixed-script domain names like riсhardallen.co.uk.
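As a rough illustration of how a mixed-script name like the one above might be spotted, here is a small Python sketch. It is not how any particular browser or registrar does it: the function names are made up, and because Python's standard library has no Unicode Script property, it approximates a character's script by the first word of its Unicode character name (for example "LATIN" or "CYRILLIC").

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            # First word of the character name, e.g. "CYRILLIC SMALL LETTER ES"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(domain: str) -> bool:
    """True if any single label mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


real = "richardallen.co.uk"
fake = "ri\u0441hardallen.co.uk"  # U+0441 is the Cyrillic letter "с"

print(looks_mixed_script(real))  # False
print(looks_mixed_script(fake))  # True
print(sorted(scripts_in_label("ri\u0441hardallen")))  # ['CYRILLIC', 'LATIN']
```

The check is per label, so a domain with a Cyrillic name under a Latin TLD would not be flagged, only a single label that mixes scripts.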
This was built for demonstration purposes, and a bit of fun, by @pmrourkie. Feel free to tweet me if you think anything is misrepresented or if you want to add additional information.