Characters

Character Sets

A character set is a list of characters that a computer recognises from their binary representations.
These character sets include standard printable characters (letters, numbers, symbols and the space), as well as non-printing control characters (like tab and newline).
Each character in a character set is assigned a unique number, often represented in binary, which identifies that character.
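As a minimal sketch of the idea above, Python's built-in ord() and chr() convert between a character and its assigned number. The helper names code_of and binary_of are made up for this example, and the 8-bit width is just a display choice.

```python
def code_of(ch: str) -> int:
    """Return the number assigned to a single character."""
    return ord(ch)


def binary_of(ch: str, width: int = 8) -> str:
    """Return the character's number as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")


# Printable characters and one control character (tab)
for ch in ["A", "a", "0", " ", "\t"]:
    print(repr(ch), code_of(ch), binary_of(ch))

# chr() goes the other way: from number back to character
print(chr(65))  # A
```

For example, "A" is number 65, which is 01000001 in binary.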

The American Standard Code for Information Interchange (ASCII) is a widely used character set.
ASCII originally used a 7-bit binary code to represent each character. This allowed for 128 characters (2^7) in total.
ASCII is usually described in two forms: the basic 7-bit set (95 printable characters and 33 control codes) and "extended ASCII", a loose name for several different 8-bit sets that keep the original 128 codes and add up to 128 further characters, such as accented letters.
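The counts above can be checked with a short sketch. It also tries to encode an accented letter in basic ASCII and in two 8-bit sets, Latin-1 and code page 437. These two are chosen only as examples of 8-bit extensions; the text does not name a particular one. The helper try_encode is hypothetical.

```python
# Basic ASCII: codes 0-31 and 127 are control codes, 32-126 are printable
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable = [c for c in range(128) if 32 <= c <= 126]
print(len(control_codes), len(printable))  # 33 95


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the set lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode("é", "ascii"))    # None: not in the 7-bit set
print(try_encode("é", "latin-1"))  # b'\xe9'
print(try_encode("é", "cp437"))    # b'\x82'
```

The same letter gets a different 8-bit value in each extended set, which is why the encoding has to be known before such bytes can be read back.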

Unicode is another character set, created to include characters from all languages across the world, as ASCII covers only the unaccented letters, digits, and punctuation used in English.
Unicode assigns each character a number called a code point. Its encodings use up to 32 bits per character, and the standard has room for over a million code points (1,114,112 in total, from U+0000 to U+10FFFF).
This lets Unicode represent a far wider range of characters, including those used in non-Western languages, emojis, and other special characters.
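A small sketch of Unicode numbering, assuming Python 3, where ord() returns a character's Unicode number and sys.maxunicode holds the largest one. The helper code_point and the sample characters are chosen only for illustration.

```python
import sys
import unicodedata


def code_point(ch: str) -> str:
    """Format a character's Unicode number in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# A Latin letter, an accented letter, the euro sign, and an emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(code_point(ch), unicodedata.name(ch), ord(ch).bit_length(), "bits")

# The largest number Unicode allows, and how many numbers that gives
MAX_CODE_POINT = sys.maxunicode
print(hex(MAX_CODE_POINT), MAX_CODE_POINT + 1)  # 0x10ffff 1114112
```

The largest value fits in 21 bits, so a 32-bit unit has room to spare for any single character.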

Encoding is the process of transforming a sequence of characters into a sequence of bytes.
Common encoding systems include UTF-8, UTF-16, and UTF-32. UTF stands for Unicode Transformation Format.
UTF-8 is widely used: it can represent any character in the Unicode standard using one to four bytes, and it is backward-compatible with ASCII because ASCII characters keep their single-byte values.
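To make the difference between the three encodings concrete, the sketch below counts the bytes each one needs for a few sample characters. The helper encoded_sizes is hypothetical, and the "-le" (little-endian) codec variants are used so that no byte order mark is added to the counts.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes each encoding uses for one character."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


# A Latin letter, an accented letter, the euro sign, and an emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Backward compatibility: plain ASCII text has identical bytes in UTF-8
ascii_text = "Hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```

UTF-8 grows from one to four bytes as the character's number gets larger, UTF-16 uses two or four, and UTF-32 always uses four.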

Understanding how data is represented as characters is crucial in computing. It helps to handle text correctly, including proper display, storage, and transmission.
Standardised character sets and encoding schemes make text data interoperable across different platforms and support global communication.
A sequence of bytes only becomes text once a character set and encoding are applied to it. By knowing which set and encoding were used, we can interpret the data as intended; with the wrong one, the same bytes may decode to different characters or fail to decode at all.
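The sketch below illustrates why the encoding has to be known: the same bytes are decoded once with the encoding that produced them and once with a different one. Latin-1 is picked here only as an example of a mismatched choice, and can_decode is a hypothetical helper.

```python
# Encode a word containing one non-ASCII character
data = "caf\u00e9".encode("utf-8")
print(data)  # b'caf\xc3\xa9'

# Correct interpretation: decode with the encoding that was used
as_utf8 = data.decode("utf-8")

# Wrong interpretation: each of the two bytes becomes its own character
as_latin1 = data.decode("latin-1")

print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))


def can_decode(raw: bytes, encoding: str) -> bool:
    """Report whether the bytes are valid in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xE9 byte is a valid Latin-1 character but not valid UTF-8
print(can_decode(b"\xe9", "latin-1"), can_decode(b"\xe9", "utf-8"))  # True False
```

The mismatched decode does not raise an error. It silently produces five characters instead of four, which is how garbled text usually ends up on screen.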