Wednesday, April 17, 2019

Unicode

A thing we often take for granted nowadays is how easy our technology is to use. Things like the graphical user interface (GUI) and the invention of the computer mouse made the computer "personal" and human-friendly. But what is often overlooked is the role standardization played in all of this. It would be much harder to communicate if we didn't have a common way of typing and displaying letters. This is where the magic of Unicode comes in.

Unicode is the international standard for character encoding: a standardized set of numbers assigned to represent characters. The idea of Unicode dates back to 1987, when employees of Xerox and Apple set out to create a universal character set that would "encompass all of the world's living languages." In January 1991, the Unicode Consortium, the nonprofit organization that manages Unicode, was formed in California. Later that year, the first version of the Unicode standard was published.

Each character in a computer is stored as a binary number, and historically, different computers used different numbers to represent the same characters. For example, say you had two computers, A and B. On computer A, "a" = 1, "b" = 2, and "c" = 3 (these aren't actual values). If you wanted to send the word "cab" to computer B, computer A would send the numbers 3, 1, 2. But if on computer B "a" = 2, "b" = 3, and "c" = 4, then computer B would display "b[random symbol]a". This obviously leads to a lot of problems.
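The mismatch above can be sketched in a few lines of Python. The tables here use the same made-up numbers as the example, not any real encoding:

```python
# Two hypothetical machines with different (made-up) character tables,
# illustrating why a shared standard matters.
TABLE_A = {"a": 1, "b": 2, "c": 3}  # computer A encodes with this table
TABLE_B = {2: "a", 3: "b", 4: "c"}  # computer B decodes with this table

def encode_a(text):
    """Turn text into numbers using computer A's table."""
    return [TABLE_A[ch] for ch in text]

def decode_b(numbers):
    """Turn numbers back into text using computer B's table ('?' if unknown)."""
    return "".join(TABLE_B.get(n, "?") for n in numbers)

sent = encode_a("cab")   # computer A sends [3, 1, 2]
print(decode_b(sent))    # computer B displays "b?a" — garbled on arrival
```

The word only survives the trip if both sides agree on the table, which is exactly the agreement Unicode provides.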

In the U.S., there was a standardized encoding called ASCII. ASCII, which stands for American Standard Code for Information Interchange, was originally developed in the 1960s for AT&T's teleprinters, but later made its way into the first computers as well. It stores each character as a 7-bit binary value, which means there was a maximum of 128 unique characters. With so little room, only Latin characters were supported. Since the U.S. was the center of the computing industry, there wasn't much need to support characters beyond the ones used in English. Later on, people added an 8th bit, extending the number of possible characters to 256. This made space for accented characters like å ë ì õ ú, and this 8-bit version was referred to as "extended ASCII". Unicode uses the same numbers that ASCII uses for the original 128 characters, so machines with either encoding can communicate with each other.

But what if you didn't speak a language written with Latin characters? How would you be able to use a computer without learning English? Prior to Unicode, it was much more difficult and expensive to release software in multiple languages. Programs had to be built to support multiple encodings, and each encoding covered only a few languages. In poorer countries, only the elite who were wealthy and fluent in English could use a computer. Unicode made it easy to change the language of software, since the text would be supported and display correctly across every computer. Over time, the writing systems of thousands of languages were added to Unicode, and since almost every device used Unicode, every device supported these new languages. People could access a computer in their native tongue, which opened their door to the World Wide Web.

As of 2012, there are 74,605 Chinese characters in Unicode, including both simplified and traditional forms. People had to manually assign numbers to each one!
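Each of those characters has its own code point, just like a Latin letter. A quick look at two common Chinese characters (chosen here purely as examples):

```python
# CJK characters are ordinary Unicode characters with large code points.
for ch in "中文":
    print(ch, hex(ord(ch)))   # prints: 中 0x4e2d, then 文 0x6587
```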

(Feel free to correct me if I have incorrect information)
