Thanks in advance
Did you try googling “what is unicode”? If you’ve already read the documentation, Wikipedia, etc., then which part(s) don’t you understand and want help with?
This is a good starting point
A string, i.e. a sequence of characters, is represented as numbers in computer programs, so each distinct character needs a unique numeric assignment. The English alphabet has 26 letters, so we could say A is 0, B is 1, …, Z is 25. This works for A-Z, but what about lowercase letters? Surely there are times when a is treated differently from A. So let’s keep the A-Z assignments and say a is 26, b is 27, …, z is 51. This takes care of upper and lowercase A-Z, but what about the digits themselves? OK, let’s say 0-9 are assigned 52-61. What about punctuation: comma, period, question mark, exclamation point, …? What about space and tab?
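To make the made-up scheme above concrete, here is a minimal Python sketch of it. The names ALPHABET, TOY_CODE and toy_encode are hypothetical, invented for this illustration; this is the toy assignment from the post, not any real standard.

```python
import string

# Toy scheme from the post: A-Z -> 0-25, a-z -> 26-51, 0-9 -> 52-61.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
TOY_CODE = {ch: number for number, ch in enumerate(ALPHABET)}


def toy_encode(text):
    """Return the toy number for each character in text."""
    return [TOY_CODE[ch] for ch in text]


print(toy_encode("Az09"))  # [0, 51, 52, 61]

# The scheme has no number for space or punctuation, so this fails:
try:
    toy_encode("Hi there")
except KeyError as missing:
    print("no code assigned for", repr(missing.args[0]))
```

The KeyError on the space is exactly the "what about punctuation, space, tab?" problem: every character you want to store needs its own number.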
You see where I’m going with this. ASCII is an old, standard assignment of the numeric codes 0-127 to characters commonly used in English: letters, digits, punctuation, space, tab and so on. But ASCII covers just 128 characters.
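If you want to see the ASCII numbers for yourself, a short Python sketch like the following should do it. The sample strings are arbitrary; ord and str.encode are built into Python.

```python
text = "Hi, there!"

# ord() gives the number assigned to each character.
codes = [ord(ch) for ch in text]
print(codes)  # [72, 105, 44, 32, 116, 104, 101, 114, 101, 33]

# Every character here fits in ASCII's 0-127 range, so encoding works
# and yields one byte per character with the same numbers.
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes) == codes)  # True

# A character outside those 128 cannot be encoded as ASCII at all.
try:
    "caf\u00e9".encode("ascii")  # "café"
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```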
Now the problem is characters in other languages. If we don’t need to mix and match languages in the same program, we could use similar, possibly overlapping assignments for other languages. For example, Arabic letters, digits and punctuation are quite different from English, but a program that only needs to process Arabic strings could use the same numbers 0-127 to represent just Arabic characters, and it would work.
These separate representations have an inherent limitation: they cannot be used for multiple languages at the same time. People came up with systems for switching between representations, so a program could use 0-127 for English at one point and switch to 0-127 for Arabic at another. It gets complicated fast, but it did not get old fast enough, because such systems are still in use. In practice, 0-127 got reserved for English and other languages began to use 128-255, because 0-255 can be stored in a single byte of 8 bits.
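Here is a small sketch of that overlap, assuming Python's bundled legacy codecs latin-1 (Western European), cp1256 (Arabic) and cp1251 (Cyrillic) as examples of such one-byte schemes. The byte values are picked just for illustration.

```python
# Two bytes: 0x41 is in the shared 0-127 range, 0xC7 is in 128-255.
raw = bytes([0x41, 0xC7])

# The same bytes mean different text depending on which table you assume.
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1256", "cp1251")}

for codec, text in decoded.items():
    print(codec, repr(text))
# latin-1 'AÇ'
# cp1256 'Aا'
# cp1251 'AЗ'
```

The A survives in all three because 0-127 is shared, but 0xC7 is a different letter in each table, and nothing in the bytes themselves says which table was intended.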
Unicode is an attempt to standardize a common representation of all languages of the world. Clearly 128 or 256 numbers are not enough; the available numbers run to just over a million (0 through 1,114,111). These numbers are called code points. The code point for A is 65, a is 97, 0 is 48, the Arabic alif ا is 1575, and the Arabic zero ٠ is 1632.
Unicode lets programs handle strings in all languages of the world
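You can check those code points yourself. A minimal sketch, assuming Python 3, where every str is a sequence of Unicode code points; the variable names are just for this example.

```python
import sys
import unicodedata

# A, a, 0, Arabic alif, Arabic-Indic digit zero
samples = ["A", "a", "0", "\u0627", "\u0660"]
code_points = [ord(ch) for ch in samples]
print(code_points)  # [65, 97, 48, 1575, 1632]

# Code points are usually written in hex with a U+ prefix.
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The largest code point Python (and Unicode) allows.
print(sys.maxunicode)  # 1114111

# One string can now mix English and Arabic characters.
mixed = "A\u0627"
print(len(mixed))             # 2 characters
print(mixed.encode("utf-8"))  # b'A\xd8\xa7' (3 bytes when stored as UTF-8)
```

Note that the code point is just the number; how it is stored as bytes (UTF-8 here) is a separate choice of encoding.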
Once upon a time, there were a million and one different encoding systems for all the languages of the world. Because we didn’t know any better back then, we would often send information without including the proper metadata about which encoding it was using. Sometimes, software developers would neglect to check which encoding was intended before displaying the information.
Then, along came Unicode and solved all our problems. The end.
But on the whole… Unicode is an excellent solution to a difficult problem: how to display the many and varied languages of the world without �������ing your ���.
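For anyone wondering where those � marks come from, here is a rough sketch of one way to produce them in Python: text is encoded one way and decoded with the wrong assumption. The sample text is arbitrary.

```python
original = "na\u00efve caf\u00e9"  # "naïve café"

# The sender encodes as UTF-8 but does not say so.
wire = original.encode("utf-8")

# A receiver that assumes plain ASCII cannot interpret the bytes above 127
# and substitutes U+FFFD, the replacement character, for each one.
garbled = wire.decode("ascii", errors="replace")
print(garbled)  # na��ve caf��

# A receiver that guesses Latin-1 gets a different kind of garbage.
mojibake = wire.decode("latin-1")
print(mojibake)  # naÃ¯ve cafÃ©

# Decoding with the encoding that was actually used round-trips cleanly.
print(wire.decode("utf-8") == original)  # True
```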