Storage of Characters

Purpose of Character Storage

  • Computers store and process data only as binary, so every character must be mapped to a number before it can be stored.
  • A character encoding defines that mapping; the choice of encoding determines which characters a system can represent and how much storage each one needs.

ASCII (American Standard Code for Information Interchange)

  • ASCII is one of the most widely used character encodings.
  • It uses a 7-bit binary code for each character.
  • Seven bits give 2^7 = 128 codes in total, covering uppercase and lowercase letters, digits, punctuation symbols and non-printing control characters.
  • For example, the ASCII value for the uppercase letter “A” is 65, which is 1000001 in binary.
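
The mapping between an ASCII character, its decimal code and its 7-bit pattern can be checked with a short Python sketch. The helper name ascii_bits is hypothetical and chosen only for this illustration; ord, chr and format are Python built-ins.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary pattern for one ASCII character.

    ascii_bits is a hypothetical helper name used only in this example.
    """
    code = ord(ch)  # decimal code of the character
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return format(code, "07b")  # pad to exactly 7 binary digits


print(ord("A"))         # 65
print(ascii_bits("A"))  # 1000001
print(chr(97))          # a
```

Calling ascii_bits("é") raises ValueError, because that character has no 7-bit ASCII code.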

Extended ASCII

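  • Extended ASCII extends ASCII by using an 8-bit binary code instead of a 7-bit one.
  • Eight bits give 2^8 = 256 codes rather than 128: codes 0 to 127 match ASCII, and codes 128 to 255 hold extra characters such as accented letters and additional symbols.
  • There is no single Extended ASCII standard; different code pages (for example ISO 8859-1 and Windows-1252) assign codes 128 to 255 differently.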
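
A small Python sketch of what the extra 128 codes mean in practice. It assumes two specific 8-bit encodings that ship with Python, Latin-1 (ISO 8859-1) and code page 437, as examples of Extended ASCII; other 8-bit encodings exist and behave differently. The helper name decode_byte is hypothetical.

```python
def decode_byte(value: int, encoding: str) -> str:
    """Interpret one byte value (0-255) as a character in the given encoding.

    decode_byte is a hypothetical helper name used only in this example.
    """
    return bytes([value]).decode(encoding)


# Codes 0-127 are plain ASCII in both encodings.
print(decode_byte(0x41, "latin-1"), decode_byte(0x41, "cp437"))  # A A

# Codes 128-255 are the extension; 0xE9 (233) is é in Latin-1.
print(decode_byte(0xE9, "latin-1"))  # é

# The same byte is a different character in code page 437.
print(decode_byte(0xE9, "latin-1") == decode_byte(0xE9, "cp437"))  # False
```

The last line shows why text saved in one 8-bit encoding can appear garbled when it is read back in another.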

Unicode

  • As the range of ASCII was found to be insufficient for representing all characters and symbols used around the globe, the Unicode standard was developed.
  • Unicode assigns every character a unique number called a code point. Its code space has room for 1,114,112 code points (U+0000 to U+10FFFF), making it more appropriate for modern computing where globalisation is prominent.
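
As a rough illustration, Python strings are sequences of Unicode characters, so the built-in ord returns a character's Unicode number. The helper name code_point is hypothetical; sys.maxunicode is a standard attribute holding the largest value Python accepts.

```python
import sys


def code_point(ch: str) -> str:
    """Format a character's Unicode number in the usual U+XXXX notation.

    code_point is a hypothetical helper name used only in this example.
    """
    return f"U+{ord(ch):04X}"


for ch in ["A", "é", "€", "😀"]:
    print(ch, ord(ch), code_point(ch))
# A 65 U+0041
# é 233 U+00E9
# € 8364 U+20AC
# 😀 128512 U+1F600

# Largest Unicode number, so the total count is this value plus one.
print(sys.maxunicode)  # 1114111
```

Note that "A" keeps the value 65 that it has in ASCII.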

UTF-8

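  • UTF-8 is one of the encodings defined by the Unicode standard; it is used for both storing and transmitting text.
  • It is a variable-length encoding built from 8-bit units, matching the byte, the base unit of data in computers. Each character takes between 1 and 4 bytes.
  • It is backward compatible with ASCII: the first 128 Unicode characters are the ASCII characters, and UTF-8 stores each of them as a single byte with the same value.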
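
The sketch below shows how many bytes UTF-8 uses for a few sample characters, and that an ASCII character produces the same byte under both encodings. It relies only on the built-in str.encode method; the helper name utf8_hex is hypothetical.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex pairs.

    utf8_hex is a hypothetical helper name used only in this example.
    """
    return " ".join(f"{byte:02x}" for byte in ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    print(ch, len(ch.encode("utf-8")), utf8_hex(ch))
# A 1 41
# é 2 c3 a9
# € 3 e2 82 ac
# 😀 4 f0 9f 98 80

# ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

Characters outside the ASCII range need two, three or four bytes, which is why the length of a UTF-8 file in bytes can exceed its length in characters.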

Importance of Character Storage

  • Understanding how characters are stored is essential to understanding how data is processed by computers.
  • The chosen encoding also sets the limits of a system: for example, a system that stores text as 7-bit ASCII cannot represent accented letters or non-Latin scripts.
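
One way to see such a limitation is to test whether a piece of text can be stored in a given encoding at all. This is a minimal sketch; the helper name can_store is hypothetical, and Latin-1 stands in here as one example of an 8-bit extended encoding.

```python
def can_store(text: str, encoding: str) -> bool:
    """Report whether every character in text exists in the given encoding.

    can_store is a hypothetical helper name used only in this example.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no code in this encoding.
        return False


print(can_store("cafe", "ascii"))      # True
print(can_store("café", "ascii"))      # False
print(can_store("café", "latin-1"))    # True
print(can_store("日本語", "latin-1"))   # False
print(can_store("日本語", "utf-8"))     # True
```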

In Summary

  • Characters are stored using various encoding standards, each with its own range and purpose: ASCII, Extended ASCII, Unicode and UTF-8.
  • ASCII and Extended ASCII support only 128 and 256 characters respectively, which restricts their use in global applications.
  • Unicode covers a far wider range of characters. UTF-8 is a common way to encode it because it is backward compatible with ASCII and is built from byte-sized units.