Pretentious Programmer

A Primer To Unicode

Apr 2

Unicode is a character encoding standard maintained by the Unicode Consortium.
Unlike ASCII, which defines only 128 characters, Unicode aims to support every character in every script of the world, including emojis.
As of 2025, Unicode defines more than 159,000 characters in 172 scripts.
A character as a reader perceives it (what Unicode calls a grapheme cluster) is made up of one or more Unicode codepoints.
Unicode codepoints range from U+0000 to U+10FFFF.
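In JavaScript, `String.fromCodePoint()` enforces exactly this range: it accepts any integer from 0 to 0x10FFFF and throws a `RangeError` otherwise. The sketch below wraps it in a hypothetical `tryFromCodePoint` helper (our own name, not a built-in) that returns `null` instead of throwing.

```javascript
// Size of the codepoint space: U+0000..U+10FFFF inclusive.
const TOTAL_CODEPOINTS = 0x10ffff + 1; // 1,114,112

// Hypothetical helper: the string for a codepoint, or null if the number
// is not a valid codepoint.
function tryFromCodePoint(cp) {
  try {
    return String.fromCodePoint(cp);
  } catch (e) {
    // fromCodePoint throws RangeError for negatives, non-integers
    // and values above 0x10FFFF.
    if (e instanceof RangeError) return null;
    throw e;
  }
}

console.log(TOTAL_CODEPOINTS);           // 1114112
console.log(tryFromCodePoint(0x0061));   // a
console.log(tryFromCodePoint(0x110000)); // null
```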
The letter ‘a’ is a single Unicode codepoint, U+0061.

console.log('U+0061 = \u0061');
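Going the other way, `codePointAt()` reads a codepoint back out of a string. The sketch below uses it in a small formatter that prints any string in the U+XXXX notation used in this post; the name `toCodePointLabels` is our own, not a built-in. Spreading a string (`[...str]`) walks it codepoint by codepoint rather than by 16-bit unit, so characters beyond U+FFFF come out whole.

```javascript
// Hypothetical helper: lists the codepoints of a string in U+XXXX notation.
function toCodePointLabels(str) {
  // Spreading a string iterates by codepoint, not by UTF-16 code unit.
  return [...str].map((ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    // Pad to at least four hex digits, e.g. U+0061.
    return 'U+' + hex.padStart(4, '0');
  });
}

console.log('a'.codePointAt(0) === 0x0061);   // true
console.log(toCodePointLabels('a'));          // [ 'U+0061' ]
console.log(toCodePointLabels('a\u{1F469}')); // [ 'U+0061', 'U+1F469' ]
```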

The Devanagari character लाँ is made up of the letter ल (U+0932) followed by two marks: the vowel sign ा (U+093E) and the candrabindu ँ (U+0901).

console.log('\u0932 + \u093E + \u0901 = \u0932\u093E\u0901');
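`String.fromCodePoint()` accepts several codepoints at once, so the same character can be assembled from the three numbers above. Although the result holds three codepoints, a reader sees one character; `Intl.Segmenter` with `granularity: 'grapheme'` splits text into these user-perceived characters (extended grapheme clusters). A sketch, assuming a runtime that supports `Intl.Segmenter`:

```javascript
// Assemble लाँ from its codepoints: letter LA, vowel sign AA, candrabindu.
const laa = String.fromCodePoint(0x0932, 0x093e, 0x0901);

console.log(laa === '\u0932\u093E\u0901'); // true
console.log([...laa].length);              // 3 codepoints

// Split into user-perceived characters (extended grapheme clusters).
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(laa)].map((s) => s.segment);
console.log(graphemes.length);             // 1
```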

Similarly, the Devanagari character ‘कि’ is made up of two Unicode codepoints: U+0915 (क) and U+093F (ि).

console.log('U+0915 = \u0915');
console.log('U+093F = \u093f');
console.log('U+0915 + U+093F = \u0915\u093f');
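Although ि is drawn to the left of क, it is stored after it, and it only makes sense attached to its base letter. Code that treats each codepoint as a separate character can therefore attach a mark to the wrong letter. Reversing a string is a classic case; the sketch below compares a naive reverse with one that reverses whole grapheme clusters using `Intl.Segmenter` (assumed to be available in the runtime). Both function names are illustrative, not built-ins.

```javascript
// Naive reverse: treats every UTF-16 code unit as a character.
function reverseNaive(str) {
  return str.split('').reverse().join('');
}

// Grapheme-aware reverse: keeps each user-perceived character intact.
function reverseGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(str)]
    .map((s) => s.segment)
    .reverse()
    .join('');
}

const word = '\u0915\u093F\u0932'; // कि followed by ल

// U+0932 U+093F U+0915: the vowel sign now attaches to ल instead of क.
console.log(reverseNaive(word));
// U+0932 U+0915 U+093F: ल followed by an intact कि.
console.log(reverseGraphemes(word));
```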

Another example: the emoji 👩🏼‍🤝‍👨🏽 is made up of 7 Unicode codepoints: 👩 (U+1F469) + 🏼 (U+1F3FC) + ZWJ (U+200D) + 🤝 (U+1F91D) + ZWJ + 👨 (U+1F468) + 🏽 (U+1F3FD). ZWJ is the invisible zero width joiner, which asks the renderer to join the emoji on either side of it. JavaScript strings are sequences of 16-bit UTF-16 code units, so a codepoint above U+FFFF does not fit in a single unit. To write such a codepoint with a four-digit \uXXXX escape, you have to split it into a UTF-16 surrogate pair; U+1F469, for example, becomes the surrogate pair 0xD83D 0xDC69. Unicode encodes more than 3,790 emojis.

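// 👩🏼‍🤝‍👨🏽 spelled out as UTF-16 code units: each codepoint above U+FFFF is a surrogate pair, and \u200D is the ZWJ.
console.log('\uD83D\uDC69\uD83C\uDFFC\u200D\uD83E\uDD1D\u200D\uD83D\uDC68\uD83C\uDFFD');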
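The surrogate-pair conversion is plain arithmetic: subtract 0x10000 to get a 20-bit value, add its top 10 bits to 0xD800 for the high surrogate, and add its bottom 10 bits to 0xDC00 for the low surrogate. The sketch below implements that in a hypothetical `toSurrogatePair` helper (our own name), then rebuilds the emoji with ES2015 codepoint escapes such as `\u{1F469}`, which let the engine do the surrogate arithmetic for you. The grapheme count assumes a runtime with `Intl.Segmenter` (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: splits a codepoint above U+FFFF into its UTF-16
// surrogate pair [high, low].
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError('expected a codepoint in U+10000..U+10FFFF');
  }
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f469);
console.log(high.toString(16), low.toString(16)); // d83d dc69

// The same emoji written with ES2015 codepoint escapes; the engine
// produces the surrogate pairs itself.
const couple = '\u{1F469}\u{1F3FC}\u200D\u{1F91D}\u200D\u{1F468}\u{1F3FD}';
const fromPairs =
  '\uD83D\uDC69\uD83C\uDFFC\u200D\uD83E\uDD1D\u200D\uD83D\uDC68\uD83C\uDFFD';
console.log(couple === fromPairs); // true

// Three different ways to measure the same string.
console.log(couple.length);      // 12 UTF-16 code units
console.log([...couple].length); // 7 codepoints
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
console.log([...segmenter.segment(couple)].length); // 1 grapheme cluster
```

The `length` property is often mistaken for a character count; for text outside the Basic Multilingual Plane or with combining sequences, it only counts UTF-16 code units.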

Each ASCII character is a single Unicode codepoint, and ASCII occupies the range U+0000 to U+007F (0 to 127).

// Print the lowercase ASCII letters a (U+0061) through z (U+007A).
for (let i = 0x0061; i <= 0x007a; i++) {
  console.log(String.fromCodePoint(i));
}
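Because ASCII is exactly the first 128 codepoints, checking whether a string is pure ASCII comes down to checking that none of its codepoints exceeds U+007F. A minimal sketch; `isAscii` is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: true when every codepoint lies in U+0000..U+007F.
function isAscii(str) {
  for (const ch of str) {
    // for...of yields whole codepoints, so codePointAt(0) sees each one.
    if (ch.codePointAt(0) > 0x7f) return false;
  }
  return true;
}

console.log(isAscii('Hello, world!'));      // true
console.log(isAscii('\u0932\u093E\u0901')); // false (लाँ)
console.log(isAscii('\u{1F469}'));          // false (👩)
```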