Difference between ASCII & Unicode Character Sets

By Ankush Jain | Last Updated On Monday, Nov 12, 2018


Disclaimer: I am a consultant at Amazon Web Services, and this is my personal blog. The opinions expressed here are solely mine and do not reflect the views of Amazon Web Services (AWS). Any statements made should not be considered official endorsements or statements by AWS.

ASCII and Unicode are both character sets. Each one holds a list of characters and gives every character a unique decimal number, called a code point. For example, A = 65, B = 66, C = 67, and so on.
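A minimal sketch of the idea, assuming Python 3: the built-in ord() returns a character's code point and chr() turns a code point back into a character. The names used below are only for illustration. The values for A, B and C are the same in ASCII and Unicode, because Unicode's first 128 code points match ASCII.

```python
# ord() gives a character's code point; chr() maps a code point back to a character.
codes = {ch: ord(ch) for ch in "ABC"}
print(codes)  # {'A': 65, 'B': 66, 'C': 67}

# The mapping works in both directions.
print(chr(65) + chr(66) + chr(67))  # ABC
```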

ASCII

ASCII stands for American Standard Code for Information Interchange.

The ASCII character set contains 128 characters: each number from 0 to 127 represents one character.

These 128 ASCII characters cover the digits (0-9), the uppercase English letters (A-Z), the lowercase English letters (a-z), the space, symbols such as ~, !, @, #, $, %, ^, &, *, (, ), _, -, <, >, ?, / and ., and a group of non-printable control characters such as tab and newline.
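To check that these groups really add up to 128, here is a small sketch using Python's standard string module. It assumes the usual convention that codes 0-31 and 127 are the control characters and 32-126 are the printable ones; the variable names are hypothetical.

```python
import string

# Split the 128 ASCII code points (0-127) into the groups described above.
control = [c for c in range(128) if c < 32 or c == 127]  # non-printable: tab, newline, DEL, ...
printable = [chr(c) for c in range(32, 127)]             # space (32) through ~ (126)

digits = string.digits           # '0123456789'
upper = string.ascii_uppercase   # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
lower = string.ascii_lowercase   # 'abcdefghijklmnopqrstuvwxyz'
symbols = string.punctuation     # the 32 ASCII punctuation/symbol characters

print(len(digits), len(upper), len(lower), len(symbols), len(control))  # 10 26 26 32 33
# 10 digits + 26 + 26 letters + 32 symbols + 1 space + 33 control characters
print(len(digits) + len(upper) + len(lower) + len(symbols) + 1 + len(control))  # 128
```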

Each of these characters has its own decimal value. For example, the uppercase letters A-Z have the values 65 to 90, and the lowercase letters a-z have the values 97 to 122.
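One consequence of these two ranges is that every lowercase letter sits exactly 32 above its uppercase form (97 - 65 = 32). The sketch below uses that offset to lowercase ASCII letters. It assumes ASCII-only input, and to_lower_ascii is a hypothetical helper written for illustration, not a standard function; in real code str.lower() is the right tool.

```python
# Uppercase A-Z occupy 65-90 and lowercase a-z occupy 97-122.
print([ord(c) for c in "AZaz"])  # [65, 90, 97, 122]


def to_lower_ascii(ch: str) -> str:
    """Lowercase one ASCII letter by adding 32 to its code (hypothetical helper)."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch  # digits, symbols and lowercase letters are left unchanged


print("".join(to_lower_ascii(c) for c in "Hello, ASCII!"))  # hello, ascii!
```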

Because the largest value is 127, any ASCII character can be represented with only 7 bits (2^7 = 128).
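A quick way to confirm the 7-bit claim, assuming Python 3, is to look at the bit length of 127 and at the bytes produced by the built-in "ascii" codec. Each character becomes one byte whose highest (8th) bit is always 0.

```python
# The largest ASCII code, 127, fits in exactly 7 bits.
print((127).bit_length())       # 7
print(format(ord("A"), "07b"))  # 1000001  (65 written with 7 bits)

# Encoding as ASCII yields one byte per character, and every byte is below 128.
data = "Hello".encode("ascii")
print(len(data))                            # 5
print(all(b < 0b1000_0000 for b in data))   # True
```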

In computing, one byte is 8 bits, so a single byte can hold 256 different values (0 to 255). ASCII needs only 7 of those bits, which leaves one bit spare. Extended ASCII character sets use this extra bit to add another 128 characters.

Extended ASCII characters occupy the codes 128 to 255. There is no single Extended ASCII table: different code pages, such as ISO 8859-1 (Latin-1) and IBM code page 437, assign different characters to this range.
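Because each extended ASCII code page fills 128-255 differently, the same byte can mean different characters. A small sketch, assuming Python 3 and its built-in "latin-1" (ISO 8859-1) and "cp437" (IBM PC code page 437) codecs, decodes one byte both ways; the variable names are illustrative only.

```python
# Byte 0xE9 (decimal 233) falls in the extended range 128-255.
raw = bytes([0xE9])

latin1 = raw.decode("latin-1")  # ISO 8859-1
cp437 = raw.decode("cp437")     # IBM PC code page 437
print(latin1, cp437, latin1 == cp437)  # same byte, two different characters

# Standard 7-bit ASCII has no character for a byte above 127.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)
```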

Unicode

The world has far more characters than ASCII can hold: various symbols, the characters of languages such as Hindi, Urdu, Chinese and Arabic, the emoji we use in social networking apps, and a lot of other symbols we might not even be aware of. Unicode is meant to represent all of them.

Unicode assigns code points from 0 to 0x10FFFF (1,114,111 in decimal). The largest of these needs 21 bits, although the code space is smaller than 2^21 (2,097,152).

As per Wikipedia, Unicode therefore defines 1,114,112 code positions. Well over 100,000 of them have been allocated to characters; the rest are free or reserved for future use.
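These limits can be checked from Python, assuming Python 3.3 or later, where sys.maxunicode is always 0x10FFFF. The sketch also uses the standard unicodedata module to show that an unassigned code point (U+0378 is used here as an example) reports the category "Cn", and that chr() rejects anything above U+10FFFF.

```python
import sys
import unicodedata

top = sys.maxunicode            # highest code point: 0x10FFFF
print(hex(top))                 # 0x10ffff
print(top + 1)                  # 1114112 code positions (0 through 0x10FFFF)
print(top + 1 == 17 * 2**16)    # True: 17 planes of 65,536 code points each
print(top.bit_length(), 2**21)  # 21 2097152 -> needs 21 bits, but uses fewer than 2**21 values

# Not every code point is assigned: U+0378 is unassigned (general category "Cn").
print(unicodedata.category(chr(0x0378)))  # Cn

# Anything above U+10FFFF is not a Unicode code point at all.
try:
    chr(0x110000)
except ValueError:
    print("0x110000 is outside the Unicode code space")
```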

Although every code point fits in 21 bits, that does not mean a stored Unicode character takes exactly 21 bits. To store a code point, the computer uses an encoding scheme, so the size of a Unicode character differs from one encoding to another:

UTF-8 (1 to 4 bytes)
UTF-16 (2 or 4 bytes)
UTF-32 (4 bytes)
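To see where UTF-8's "1 to 4 bytes" comes from, here is a minimal hand-rolled UTF-8 encoder for a single code point, checked against Python's built-in codec. utf8_encode is a hypothetical helper for illustration; real code should just call str.encode("utf-8"). The size comparison uses the "utf-16-le" and "utf-32-le" codecs because Python's plain "utf-16" and "utf-32" codecs prepend a byte order mark, which would inflate the counts.

```python
def utf8_encode(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded as UTF-8")
    if cp < 0x80:       # up to 7 bits  -> 1 byte:  0xxxxxxx (same as ASCII)
        return bytes([cp])
    if cp < 0x800:      # up to 11 bits -> 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # up to 16 bits -> 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits -> 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "Aé€😀":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")  # matches the built-in codec
    sizes = (len(ch.encode("utf-8")),
             len(ch.encode("utf-16-le")),
             len(ch.encode("utf-32-le")))
    print(f"U+{ord(ch):04X}", sizes)
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)
```

The emoji needs 4 bytes in UTF-16 because it lies above U+FFFF and is stored as a surrogate pair, while UTF-32 always spends 4 bytes per code point.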


References:

Difference between ASCII & Unicode Character Sets

ASCII

Unicode

Ankush Jain, Software Engineer
