Storage of Characters

Character Set

  • A character set is a defined list of the characters that a computer can recognise and represent in binary.
  • Each character in a character set has a unique binary code, allowing it to be represented and stored digitally.
  • Character sets include typed letters, numbers, symbols, and control characters like line feed and carriage return.
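
As a small illustration of the idea that every character has its own code, the sketch below uses Python's built-in ord() and chr() functions. The sample characters are chosen only for illustration; the codes shown are the standard ones for these characters.

```python
# Look up the numeric code of a few characters and show it in binary.
# "\n" is line feed and "\r" is carriage return (control characters).
for ch in ["A", "a", "7", "\n", "\r"]:
    code = ord(ch)                      # character -> integer code
    print(repr(ch), code, format(code, "08b"))

# chr() goes the other way: integer code -> character.
letter = chr(65)                        # "A"
line_feed_code = ord("\n")              # 10
carriage_return_code = ord("\r")        # 13
```

Note that upper-case and lower-case letters have different codes, so "A" and "a" are different characters as far as the computer is concerned.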

ASCII (American Standard Code for Information Interchange)

  • ASCII is an older character set that was once the dominant standard. Each character is represented by 7 bits, which allows for 128 unique characters (2^7 = 128), numbered 0 to 127.
  • ASCII is limited because 128 codes cannot cover the accented letters and non-Latin scripts used by most languages other than English.
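
The 128-character limit can be checked directly. In this minimal sketch, fits_in_ascii is a hypothetical helper written for these notes (it is not a built-in function); it simply tests whether every character code is below 128.

```python
# 7 bits give 2^7 = 128 possible codes, numbered 0 to 127.
ascii_size = 2 ** 7

def fits_in_ascii(text):
    """Return True if every character in text has a code below 128."""
    return all(ord(ch) < ascii_size for ch in text)

english = fits_in_ascii("Hello")   # True: plain English letters are in ASCII
french = fits_in_ascii("café")     # False: "é" has code 233, outside ASCII
```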

Unicode

  • Unicode is a modern and expansive character set designed to cover the characters of virtually all of the world's writing systems, along with symbols and emoji.
  • Unicode characters are stored using an encoding such as UTF-8, UTF-16, or UTF-32. UTF-8 uses one to four 8-bit bytes per character, UTF-16 uses one or two 16-bit units, and UTF-32 always uses 32 bits, so Unicode can represent far more characters than ASCII.
  • UTF-8 is backwards compatible with ASCII: the first 128 Unicode characters are the same as the ASCII characters, and UTF-8 stores each of them as the same single byte.
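
To see how the storage size of a character depends on the encoding, the sketch below encodes a few sample characters with Python's str.encode(). The sample characters are an arbitrary choice, and the "-le" codec names are used only so that no byte order mark is added to the result.

```python
samples = ["A", "é", "€", "😀"]

# Number of bytes each character needs in each encoding.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]       # [1, 2, 3, 4]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]  # [2, 2, 2, 4]
utf32_lengths = [len(ch.encode("utf-32-le")) for ch in samples]  # [4, 4, 4, 4]

# Backwards compatibility: ASCII text produces identical bytes in UTF-8.
same_bytes = "Hello".encode("ascii") == "Hello".encode("utf-8")  # True
```

UTF-8 keeps English text as compact as ASCII while still being able to store every Unicode character, which is one reason it is so widely used.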

Using Characters in Programming

  • In computer programming, characters are stored in variables, usually with a character data type (a single character) or a string data type (a sequence of characters).
  • To manipulate and use these characters, programs use operations such as concatenation (joining characters or strings together), extraction (getting certain characters from a string), and comparison (checking whether one character is equal to, less than, or greater than another, based on their character codes).
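
The three operations can be sketched in Python as follows. The variable names and sample strings are made up for illustration, and other languages use different syntax for the same ideas.

```python
first = "char"
second = "acter"

# Concatenation: join two strings end to end.
word = first + second            # "character"

# Extraction: pick out single characters or a slice of a string.
initial = word[0]                # "c" (positions start at 0)
prefix = word[:4]                # "char"

# Comparison: characters are compared by their character codes.
equal = "A" == "a"               # False: codes 65 and 97 differ
a_before_b = "a" < "b"           # True: 97 is less than 98
upper_before_lower = "Z" < "a"   # True: 90 is less than 97
```

Because comparison uses character codes, every upper-case letter sorts before every lower-case letter, which can give surprising results when sorting mixed-case text.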