Data Representation: Compression

Understanding Compression

  • Compression reduces the size of files to save storage space or to reduce the time taken to send files over a network.
  • It is broadly categorised into two types: lossless compression and lossy compression.

Lossless Compression

  • In lossless compression, the original data can be perfectly recovered when the file is uncompressed.
  • Examples include the ZIP file format (often used for text documents and other general files) and PNG for images.
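  • It is used where the original data must be recovered exactly, for example text documents and program files, where losing even one character could change the meaning or stop a program from working.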
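A lossless round trip can be demonstrated with Python's built-in zlib module, which implements DEFLATE, the same algorithm commonly used inside ZIP files and PNG images. This is a minimal sketch; the sample text is made up and chosen to be repetitive, since repetition is what a lossless compressor exploits.

```python
import zlib

# Hypothetical sample data: repetitive text compresses well losslessly.
original = b"the cat sat on the mat. " * 40

compressed = zlib.compress(original)     # lossless (DEFLATE) compression
restored = zlib.decompress(compressed)   # reverse the compression

print(len(original), len(compressed))    # the compressed copy is much smaller
print(restored == original)              # True: every byte is recovered exactly
```

Whatever the compressed size turns out to be, decompressing always gives back the identical bytes, which is the defining property of lossless compression.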

Lossy Compression

  • In lossy compression, some data from the original file is lost during compression.
  • The original data cannot be perfectly recovered in this case.
  • Examples include MP3 for audio and JPEG for images.
  • Despite the lost data, users often cannot notice any difference in quality. This is because the algorithm removes data that matters least to human perception, such as sounds masked by louder sounds in audio, or fine colour detail in images.
  • Lossy compression is generally used when some loss of quality is an acceptable price for a much smaller file.
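The idea that lossy compression throws data away permanently can be shown with a toy example. This sketch is an assumption for illustration only: it simply reduces 8-bit sample values to 4 bits by discarding the low-order bits. Real formats such as MP3 and JPEG use far more sophisticated, perception-based methods.

```python
def quantise(samples, bits_dropped=4):
    # Keep only the high-order bits of each 8-bit value; the low bits are discarded.
    return [s >> bits_dropped for s in samples]

def dequantise(codes, bits_dropped=4):
    # Scale back up; the discarded detail is filled with zeros and cannot be recovered.
    return [c << bits_dropped for c in codes]

samples = [200, 201, 203, 198, 50, 52]   # hypothetical 8-bit sample values
codes = quantise(samples)                # each value now needs only 4 bits
approx = dequantise(codes)

print(approx)             # [192, 192, 192, 192, 48, 48]
print(approx == samples)  # False: the original values are lost
```

The stored data is half the size, and the reconstructed values are close to the originals but not identical, which is the trade-off lossy compression makes.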

Understanding File Sizes

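  • The size of a file is measured in bytes (1 byte = 8 bits).
  • A kilobyte (KB) is 1000 bytes, a megabyte (MB) is 1000 kilobytes, a gigabyte (GB) is 1000 megabytes, and so on. Some contexts use multiples of 1024 instead; these binary units are properly called kibibytes (KiB), mebibytes (MiB) and gibibytes (GiB).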
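Converting a raw byte count into larger units is repeated division by 1000 (or by 1024 for the binary units). A minimal sketch; the function name to_units is hypothetical.

```python
def to_units(num_bytes, base=1000):
    # base=1000 gives decimal units (KB, MB, ...); base=1024 gives binary units (KiB, MiB, ...).
    if base == 1000:
        units = ["bytes", "KB", "MB", "GB", "TB"]
    else:
        units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value is below the next step up, or we run out of units.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base

print(to_units(3_500_000))        # 3.50 MB
print(to_units(3_500_000, 1024))  # 3.34 MiB
```

The same number of bytes gives a slightly smaller figure in binary units, which is why the two conventions are easy to confuse.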

Encoding and Compression

  • Encoding is the process of converting data from one form to another. It’s used in both types of compression.
  • A widely used encoding technique in compression is Huffman coding. It is a lossless method that gives shorter binary codes to characters that appear more often in the data and longer codes to rarer ones, so the total number of bits is reduced.
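Huffman coding can be sketched in Python by repeatedly merging the two least frequent groups of characters, prefixing "0" to one group's codes and "1" to the other's. This is a minimal sketch rather than a production implementation; the function names are hypothetical, and the exact codes depend on how ties are broken, although the total encoded length is the same for any valid Huffman table.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman code table (character -> bit string) for text."""
    freq = Counter(text)
    if len(freq) == 1:
        # Edge case: a single distinct character still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique numbers so heapq never has to compare dicts
    # Each heap entry: (frequency, tiebreak, {char: code built so far})
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups into one.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:  # codes are prefix-free, so the first match is correct
            out.append(lookup[current])
            current = ""
    return "".join(out)

text = "ABRACADABRA"
codes = huffman_codes(text)
bits = encode(text, codes)
print(len(bits), "bits vs", 8 * len(text), "bits at 8 bits per character")  # 23 bits vs 88 bits
print(decode(bits, codes) == text)  # True: Huffman coding is lossless
```

The most frequent character, A, receives the shortest code, which is where the saving comes from.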

Compression Ratio

  • The compression ratio measures how much a file is reduced in size by compression. It is commonly calculated as the original size divided by the compressed size.
  • A higher compression ratio means a greater reduction in file size.
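A worked example, assuming the "original size ÷ compressed size" definition (some sources use the reciprocal, in which case a smaller number means better compression). The function names are hypothetical.

```python
def compression_ratio(original_size, compressed_size):
    # Original size divided by compressed size, e.g. 4.0 means "4:1".
    return original_size / compressed_size

def space_saving(original_size, compressed_size):
    # Fraction of the original size removed by compression.
    return 1 - compressed_size / original_size

print(compression_ratio(2000, 500))  # 4.0, i.e. 4:1
print(space_saving(2000, 500))       # 0.75, i.e. the file is 75% smaller
```

Both sizes must be in the same unit (for example both in KB) for the ratio to make sense.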

Importance of Compression

  • Compression is important because it reduces the storage space needed for files and the time taken to transfer them over a network.
  • The choice between lossy and lossless compression depends on the situation: whether it is more important to keep the original quality or to reduce the file size.
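The effect of compression on transfer time can be estimated as file size in bits divided by network speed in bits per second. This sketch uses hypothetical figures and ignores real-world factors such as protocol overhead and varying connection speed.

```python
def transfer_time_seconds(size_bytes, bandwidth_bits_per_second):
    # Network speeds are usually quoted in bits per second, so convert bytes to bits first.
    return size_bytes * 8 / bandwidth_bits_per_second

original = 40_000_000    # hypothetical 40 MB file
compressed = 10_000_000  # the same file after hypothetical 4:1 compression
speed = 8_000_000        # hypothetical 8 Mbps connection

print(transfer_time_seconds(original, speed))    # 40.0
print(transfer_time_seconds(compressed, speed))  # 10.0
```

With a 4:1 compression ratio, the transfer takes a quarter of the time on the same connection.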