ASCII and Unicode

Understanding ASCII

  • The American Standard Code for Information Interchange (ASCII) is a character encoding standard used to represent text in computers and other devices.

  • ASCII uses 7 bits per character, which makes it capable of representing 128 different characters, including digits, English letters, punctuation, special characters and control codes.

  • Having been devised in the USA, ASCII is biased towards the English language and does not support the alphabets, accented letters, and special symbols used in other languages.
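As a quick illustration of the points above, here is a short Python sketch showing that every ASCII character fits in the 7-bit range 0–127, and that characters outside that range have no ASCII encoding:

```python
# Every ASCII character has a numeric code in the 7-bit range 0-127.
for ch in ["A", "a", "0", "\n"]:
    code = ord(ch)            # numeric code of the character
    assert code < 128         # all ASCII codes fit in 7 bits
    print(f"{ch!r} -> {code}")

# A character outside the 7-bit range cannot be encoded as ASCII:
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("'é' has no ASCII encoding")
```

Running this prints, for example, `'A' -> 65` and `'a' -> 97`, and confirms that an accented letter such as `é` raises an error under strict ASCII.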

Advanced Understanding of ASCII

  • The eighth bit of an ASCII byte was originally used as a parity bit for error detection; it was later repurposed to encode additional characters, resulting in Extended ASCII.

  • Extended ASCII can represent 256 different characters, doubling the original set.

  • The additional 128 codes vary between code pages: common uses include accented letters, currency signs, additional graphical symbols such as box-drawing characters, and other language-specific characters.
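Because "Extended ASCII" is not a single standard, the sketch below uses Latin-1 (ISO 8859-1), one common 8-bit extension, purely as an example. A byte above 127 is meaningless in 7-bit ASCII but decodes to an accented letter under Latin-1:

```python
# 'H', 'i', and the byte 0xE9, which lies above the 7-bit ASCII range.
data = bytes([0x48, 0x69, 0xE9])

# Latin-1 (one "Extended ASCII" variant) maps 0xE9 to the letter 'é':
print(data.decode("latin-1"))

# The same byte is rejected by a strict 7-bit ASCII decoder:
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("not valid ASCII:", exc.reason)
```

The same byte value 0xE9 would decode to a different character under another code page (for example CP437), which is exactly why the extended range caused interoperability problems that Unicode later solved.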

Understanding Unicode

  • Unicode is a character encoding standard whose goal is to replace all existing character encoding schemes by providing a unique number (code point) for every character, irrespective of platform, program, or language.

  • Unlike ASCII, Unicode defines over a million code points and can accommodate characters and symbols from writing systems around the world.

  • Unicode is backward compatible with ASCII: the first 128 Unicode code points are identical to the ASCII character set.
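The two properties above, unique code points for any character and ASCII compatibility for the first 128, can be checked directly in Python:

```python
# Every character has a unique Unicode code point, conventionally
# written as U+ followed by its value in hexadecimal.
for ch in ["A", "€", "中", "🙂"]:
    print(f"{ch} -> U+{ord(ch):04X}")

# ASCII compatibility: the first 128 code points encode to the same
# single byte in UTF-8 as they do in ASCII.
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
print("first 128 code points match ASCII")
```

This prints code points such as `A -> U+0041` for the Latin letter and much larger values for the euro sign, a CJK ideograph, and an emoji, all within the same unified code space.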

Unicode Encoding Schemes

  • UTF-8, UTF-16, and UTF-32 are the three Unicode encoding forms that define how a character’s numerical value (code point) is represented as bytes.

  • UTF-8 is a variable-length encoding that uses one to four 8-bit bytes per character. It is backward compatible with ASCII and the most byte-efficient choice for ASCII-heavy text.

  • UTF-16 is a variable-length encoding that uses either 16 or 32 bits per character; it can be more space-efficient than UTF-8 for scripts whose characters are not represented in ASCII, such as many East Asian scripts.

  • UTF-32 is a fixed-length encoding that uses 32 bits for every character, which provides easy byte alignment and indexing but is the least memory-efficient.
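The trade-offs between the three encoding forms can be seen by measuring the byte cost of the same characters under each. The sketch below uses the big-endian variants (`utf-16-be`, `utf-32-be`) only so that no byte-order mark inflates the counts:

```python
# Byte cost per character under each Unicode encoding form.
# UTF-8: 1-4 bytes; UTF-16: 2 or 4 bytes; UTF-32: always 4 bytes.
for ch in ["A", "é", "中", "🙂"]:
    sizes = {enc: len(ch.encode(enc))
             for enc in ("utf-8", "utf-16-be", "utf-32-be")}
    print(ch, sizes)
```

The output shows `A` costing 1 byte in UTF-8 but 4 in UTF-32, while the CJK character `中` costs 3 bytes in UTF-8 and only 2 in UTF-16, illustrating why the best choice of encoding depends on the text being stored.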

Importance of ASCII and Unicode

  • ASCII and Unicode are the foundations of text processing in computer systems, including input, display and storage.

  • The invention of ASCII and its standardisation led to efficient and consistent data exchange and communication across different systems.

  • Unicode resolves the internationalization issue, allowing the representation and interchange of a vast array of world languages and symbols. This has greatly influenced global digital communication and the internet.