Characters


Character Sets

  • A character set is a list of characters that a computer recognises from their binary representations.
  • These character sets include standard printable characters (letters, digits, punctuation and the space), as well as non-printing control characters (such as tab, line feed and carriage return).
  • Each character in a character set is assigned a unique number, often represented in binary, which identifies that character.
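In Python, the built-in `ord()` returns the number assigned to a character and `chr()` maps a number back to its character. The minimal sketch below shows a few characters with their numbers in decimal and as 8-bit binary. The helper `describe` is a hypothetical name chosen for illustration. Python numbers characters using Unicode, which gives the same numbers as ASCII for these characters.

```python
def describe(char):
    """Show a character, the number assigned to it, and that number in binary."""
    code = ord(char)                       # character -> number
    return f"{char!r} -> {code} -> {code:08b}"

# Letters, a digit and the space character each have their own number.
for ch in "Aa1 ":
    print(describe(ch))

print(chr(65))  # number -> character: A
```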

ASCII

  • The American Standard Code for Information Interchange (ASCII) is a widely used character set.
  • ASCII originally used a 7-bit binary code to represent each character. This allowed for 128 characters (2^7) in total.
  • There are two versions of ASCII: the basic 7-bit set (95 printable characters and 33 control codes) and extended ASCII, which uses 8 bits per character to give 256 (2^8) codes. The extra 128 codes hold additional characters, such as accented letters, and differ between extended versions.
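The 95/33 split of the basic 7-bit set can be checked with a short Python sketch. It assumes Python's `str.isprintable()`, which treats the space (code 32) as printable and codes 0 to 31 and 127 as non-printing, matching the ASCII counts above.

```python
# All codes available in 7-bit ASCII: 0 to 127.
ascii_codes = range(2 ** 7)

# Split the codes into printable characters and control codes.
printable = [c for c in ascii_codes if chr(c).isprintable()]
control = [c for c in ascii_codes if not chr(c).isprintable()]

print(len(ascii_codes), len(printable), len(control))  # 128 95 33

# 'A' written as a 7-bit binary code.
print(format(ord("A"), "07b"))  # 1000001

# Moving to 8 bits doubles the number of available codes.
print(2 ** 8)  # 256
```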

Unicode

  • Unicode is another character set, created to include characters from all of the world's writing systems, because ASCII could only represent unaccented English letters, digits and common symbols.
  • Unicode uses more bits per character than ASCII: its encodings use up to 32 bits (4 bytes) per character, and the standard has room for over a million different characters (1,114,112 code points).
  • Unicode can represent a wider range of characters, including those used in non-Western languages, emojis, and other special characters.
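Each Unicode character is identified by a number called its code point, usually written as U+ followed by hexadecimal digits. The minimal Python sketch below lists a few code points and how many bits each needs. The helper `code_point_info` is a hypothetical name, and the characters are written as escapes so the source stays ASCII-only. The highest code point, U+10FFFF, needs 21 bits, which is why encodings store characters in up to 32 bits.

```python
import unicodedata

def code_point_info(ch):
    """Return the U+ label, the code point and the number of bits it needs."""
    cp = ord(ch)
    return f"U+{cp:04X}", cp, cp.bit_length()

# A, e with acute accent, euro sign, a Chinese character, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\u4f60", "\U0001F600"]:
    label, cp, bits = code_point_info(ch)
    print(f"{label} {unicodedata.name(ch)}: code point {cp}, {bits} bits")

# The highest Unicode code point is U+10FFFF.
max_cp = 0x10FFFF
print(max_cp + 1, max_cp.bit_length())  # 1114112 21
```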

Encodings

  • Encoding is the process of converting characters into a sequence of bytes for storage or transmission; decoding is the reverse process.
  • Common encoding systems include UTF-8, UTF-16, and UTF-32. UTF stands for Unicode Transformation Format.
  • UTF-8 is widely used: it can represent every character in the Unicode standard, using 1 to 4 bytes per character, while remaining backward-compatible with ASCII (each ASCII character is stored as the same single byte).
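The minimal Python sketch below compares how many bytes a few characters take in UTF-8, UTF-16 and UTF-32, and checks that ASCII-only text gives identical bytes in ASCII and UTF-8. The helper `byte_sizes` is a hypothetical name. It uses Python's little-endian `utf-16-le` and `utf-32-le` codecs because the plain `utf-16` and `utf-32` codecs add a byte order mark, which would inflate the counts.

```python
# A, e with acute accent, euro sign, and an emoji (written as escapes).
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def byte_sizes(ch):
    """Bytes used by one character in UTF-8, UTF-16 and UTF-32 (no byte order mark)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in SAMPLES:
    u8, u16, u32 = byte_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {u8}, UTF-16 {u16}, UTF-32 {u32} bytes")

# Backward compatibility: ASCII-only text encodes to the same bytes in both.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

UTF-8 keeps ASCII text compact (1 byte per character), while UTF-32 uses a fixed 4 bytes for every character.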

Importance of Character Sets and Encodings

  • Understanding how characters are represented as data is crucial in computing. It helps programs handle text correctly when displaying, storing and transmitting it.
  • Different character sets and encoding schemes ensure that text data is interoperable across different platforms and supports global communication.
  • A sequence of bytes can only be interpreted as the intended text if we know which character set and encoding produced it. Decoding with the correct encoding recovers the original characters; decoding with the wrong one produces garbled text.
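The minimal Python sketch below decodes the same bytes with two different encodings. It assumes the text was saved as UTF-8 and then read back as ISO-8859-1 (Python's `latin-1` codec), a common mismatch chosen for illustration. Because é takes two bytes in UTF-8, the wrong decoding turns it into two unrelated characters.

```python
# "cafe" with an accented e, saved as UTF-8: the é becomes two bytes.
data = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'

right = data.decode("utf-8")     # matches the encoding used to write the data
wrong = data.decode("latin-1")   # ISO-8859-1 reads one character per byte

print(len(right), len(wrong))        # 4 5
print(wrong == "caf\u00c3\u00a9")    # True: the é has become two characters

# Some byte sequences are not valid UTF-8 at all.
try:
    b"caf\xe9".decode("utf-8")   # é stored as a single ISO-8859-1 byte
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```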