Storage of Characters

Character Set

A character set is a list of characters that a computer recognises from their binary representation.
Each character in a character set has a unique binary code, allowing it to be represented and stored digitally.
Character sets include letters, digits, symbols, and control characters such as line feed and carriage return.
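As a small illustration of the idea that every character has its own binary code, the sketch below looks up the code of a few characters and prints it in binary. The helper name char_to_binary and the 8-bit display width are assumptions made for this example only.

```python
def char_to_binary(ch, width=8):
    """Return the character's numeric code as a binary string (hypothetical helper)."""
    code = ord(ch)  # ord() gives the numeric code of a single character
    return format(code, "0{}b".format(width))  # pad with leading zeros to `width` bits

# A letter, a digit, and two control characters (line feed, carriage return)
for ch in ["A", "7", "\n", "\r"]:
    print(repr(ch), ord(ch), char_to_binary(ch))
```

The letter A has code 65, so it is shown as 01000001. Line feed and carriage return have codes 10 and 13, even though they never appear as visible symbols.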

ASCII (American Standard Code for Information Interchange)

ASCII is a character set that was widely used in the past. Each character is represented by 7 bits, which allows for 128 unique characters (2^7 = 128).
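However, ASCII is limited: 128 codes are only enough for unaccented English letters, digits, punctuation, and control characters, so it cannot represent the accented letters and other scripts used in most non-English languages.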
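The 7-bit limit can be checked with a short sketch. The function name fits_in_ascii is an assumption for this example. It simply tests whether every character's code is below 128.

```python
ASCII_BITS = 7
ASCII_SIZE = 2 ** ASCII_BITS  # 2^7 = 128 possible codes, numbered 0 to 127

def fits_in_ascii(text):
    """Return True if every character in text has a code in the ASCII range (hypothetical helper)."""
    return all(ord(ch) < ASCII_SIZE for ch in text)

print(ASCII_SIZE)                    # number of codes available in ASCII
print(fits_in_ascii("Hello"))        # plain English letters
print(fits_in_ascii("caf\u00e9"))    # "café": the accented e is outside ASCII
```

The second word fails the check because the accented letter has code 233, which does not fit in 7 bits.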

Unicode

Unicode is a modern and expansive character set designed to include the characters of almost every writing system in use.
How many bits a Unicode character takes depends on the encoding: UTF-8 uses 8, 16, 24, or 32 bits per character, UTF-16 uses 16 or 32 bits, and UTF-32 always uses 32 bits. This allows for a much larger number of unique characters than ASCII.
UTF-8 is backwards compatible with ASCII: the first 128 Unicode characters have the same codes as in ASCII, and UTF-8 stores each of them as a single byte, so any ASCII text is also valid UTF-8.
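A minimal sketch of how the number of bits varies in UTF-8, and of the ASCII compatibility described above. The helper name utf8_bits is an assumption for this example, and the sample characters are written as escape codes so the snippet does not depend on how the file is saved.

```python
def utf8_bits(ch):
    """Return how many bits one character occupies when encoded as UTF-8 (hypothetical helper)."""
    return len(ch.encode("utf-8")) * 8  # encode() gives bytes; 8 bits per byte

samples = {
    "A": "Latin capital letter A",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}

for ch, name in samples.items():
    print(name, utf8_bits(ch), "bits")

# The first 128 characters are stored as the same bytes in ASCII and UTF-8
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```

The four samples take 8, 16, 24, and 32 bits respectively, and the final comparison is True because plain English text is stored identically in both encodings.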

Using Characters in Programming

In computer programming, characters are stored in variables, usually with a character or string data type.
To manipulate and use these characters, programs use operations such as concatenation (joining characters or strings together), extraction (getting certain characters from a string), and comparison (checking whether one character is equal to, less than, or greater than another, based on their character codes).
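The three operations can be sketched as follows. The variable names and sample strings are chosen only for illustration.

```python
first = "Data"
second = "base"

# Concatenation: join two strings into one
joined = first + second

# Extraction: take part of a string by position (positions start at 0)
prefix = joined[0:4]   # characters at positions 0 to 3
letter = joined[4]     # the single character at position 4

# Comparison: characters are compared by their numeric codes
upper_before_upper = "A" < "B"   # 65 < 66
lower_before_upper = "a" < "B"   # 97 < 66 is not true

print(joined, prefix, letter)
print(upper_before_upper, lower_before_upper)
```

Because comparison uses character codes, every uppercase letter counts as less than every lowercase letter, which is why "a" < "B" is False.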