Storage of Characters
Character Set
- A character set is a list of characters that a computer recognises from their binary representation.
- Each character in a character set has a unique binary code, allowing it to be represented and stored digitally.
- Character sets include printable characters such as letters, digits and symbols, as well as non-printing control characters such as line feed and carriage return.
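As a small illustration of characters and their binary codes, the sketch below uses Python (an assumption, since the notes name no language). The built-in `ord()` and `chr()` functions convert between a character and its numeric code. The helper `char_to_bits` is a hypothetical name that shows the code as 7 binary digits. The line feed control character has a code too, even though it does not print as a visible symbol.

```python
def char_to_bits(ch: str) -> str:
    """Hypothetical helper: return a character's code as a 7-bit binary string.

    Only meaningful for codes 0-127 (the ASCII range).
    """
    return format(ord(ch), "07b")


# Print each character, its numeric code and that code in binary.
for ch in ["A", "a", "7", "\n"]:
    print(repr(ch), ord(ch), char_to_bits(ch))

# chr() goes the other way: from a numeric code back to the character.
print(chr(65))
```

Running it should show, for example, that "A" has code 65 (binary 1000001) and that the line feed has code 10 (binary 0001010).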
ASCII (American Standard Code for Information Interchange)
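- ASCII is an early character set that was the standard way of storing English text for many years. Each character is represented by 7 bits, which allows for 128 unique characters (2^7 = 128), with codes 0 to 127.
- However, ASCII is limited: with only 128 codes, it cannot represent accented letters (such as é), non-Latin scripts (such as Chinese or Arabic) or many other symbols used in languages other than English.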
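To show the 128-character limit in practice, here is a minimal Python sketch. The function `fits_in_ascii` is a hypothetical helper that checks whether every character has a code below 128. Python's built-in `"ascii"` codec rejects anything outside that range by raising a `UnicodeEncodeError`.

```python
# 7 bits give 2**7 = 128 possible codes, numbered 0 to 127.
print(2 ** 7)


def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character has an ASCII code (0-127)."""
    return all(ord(ch) < 128 for ch in text)


print(fits_in_ascii("Hello"))  # True
print(fits_in_ascii("café"))   # False, because 'é' has code 233

# Python's ASCII codec refuses characters outside the 128-character set.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    # err.start is the position of the first character that could not be encoded.
    print("Cannot encode character with code", ord(err.object[err.start]))
```

Python also provides `str.isascii()` (Python 3.7 and later), which performs the same check as the helper above.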
Unicode
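- Unicode is a modern, much larger character set designed to include the characters of almost all of the world's writing systems, along with many symbols.
- Unicode gives each character a unique number called a code point. Depending on the encoding used, a character is stored in 8 to 32 bits: UTF-8 uses 8, 16, 24 or 32 bits (1 to 4 bytes), UTF-16 uses 16 or 32 bits, and UTF-32 always uses 32 bits. This allows for far more unique characters than ASCII.
- Unicode's UTF-8 encoding is backwards compatible with ASCII: the first 128 Unicode characters are the same as the ASCII characters, and UTF-8 stores each of them as a single byte with the same value, so any ASCII text is also valid UTF-8.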
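The variable-length storage described above can be checked with a short Python sketch. The helper `encoded_bits` is a hypothetical name. It encodes one character with Python's built-in UTF-8, UTF-16 and UTF-32 codecs and reports how many bits each uses. The `-le` (little-endian) variants are used so that no byte-order mark is added to the count.

```python
def encoded_bits(ch: str) -> tuple[int, int, int]:
    """Hypothetical helper: bits used to store one character in UTF-8, UTF-16, UTF-32."""
    return (
        len(ch.encode("utf-8")) * 8,
        len(ch.encode("utf-16-le")) * 8,  # "-le" variant adds no byte-order mark
        len(ch.encode("utf-32-le")) * 8,
    )


# 'A', 'é', the euro sign and a smiling-face emoji, written as escapes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_bits(ch))

# Backwards compatibility: ASCII text gives identical bytes in ASCII and UTF-8.
text = "Hello, world!"
print(text.encode("ascii") == text.encode("utf-8"))  # True
```

The expected results are (8, 16, 32) for "A", (16, 16, 32) for "é", (24, 16, 32) for the euro sign and (32, 32, 32) for the emoji. This illustrates why UTF-8 is compact for English text, while UTF-32 spends the same 32 bits on every character.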
Using Characters in Programming
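- In computer programming, characters are stored in variables, usually of a character (char) data type for a single character or a string data type for a sequence of characters.
- To manipulate and use these characters, programs use operations such as concatenation (joining strings together), extraction (taking some of the characters from a string, known as a substring) and comparison (checking whether one character or string is equal to, less than or greater than another, which is decided by comparing their character codes).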
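The three operations above can be sketched in Python (an assumption; syntax differs between languages). Python has no separate char type, so a single character is simply a string of length 1. Extraction uses slicing, where `greeting[7:12]` takes the characters at positions 7 to 11. Comparison uses the character codes, so an uppercase letter counts as "less than" any lowercase letter.

```python
first = "Hello"
second = "World"

# Concatenation: join strings together to make a new string.
greeting = first + ", " + second   # "Hello, World"

# Extraction: take part of a string by position (indexes start at 0).
initial = greeting[0]              # "H"
word = greeting[7:12]              # "World"

# Comparison: characters are compared by their character codes.
print(ord("A"), ord("a"))          # 65 97
print("A" < "a")                   # True, because 65 < 97
print("apple" < "banana")          # True: 'a' (97) < 'b' (98)
print("Zebra" < "apple")           # True: 'Z' (90) < 'a' (97)
```

The last line shows why code-based comparison does not always match everyday alphabetical order. Programs often convert both strings to the same case before comparing them.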