From: 3blue1brown

Error correction algorithms are mathematical techniques that allow data to be stored or transmitted in a way that is resilient to errors [00:00:32]. This means that even if the underlying bits of data are affected by noise or physical damage, the original information can often be recovered perfectly [00:00:21].

A common example of error correction in digital data storage is a scratched CD or DVD [00:00:03]. Although a scratch physically alters the 1s and 0s on the disk, the stored file can still play back correctly unless the damage is severe [00:00:10].

Basic Principles

Any file, whether video, sound, text, or an image, is fundamentally a sequence of 1s and 0s [00:00:42]. The core idea behind error correction schemes is that in the vast space of all possible messages, only a specific subset is considered valid [00:03:13]. If an altered message is received, the system aims to correct it back to the nearest valid version [00:03:28].

Simple Redundancy

A straightforward, but inefficient, strategy for error correction is to store three copies of each bit [00:00:50]. A machine reading the file could then compare these copies and apply a “best 2 out of 3” rule if there’s a discrepancy [00:00:57]. However, this method uses two-thirds of the storage space for redundancy and offers no strong guarantee against multiple bit flips [00:01:07].
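The triple-redundancy scheme above can be sketched in a few lines of Python; the helper names here are illustrative, not from the video:

```python
def encode_triple(bits):
    """Store three copies of each bit."""
    return [b for b in bits for _ in range(3)]

def decode_triple(stored):
    """Recover each bit by a 'best 2 out of 3' majority vote."""
    out = []
    for i in range(0, len(stored), 3):
        copies = stored[i:i + 3]
        out.append(1 if sum(copies) >= 2 else 0)
    return out

message = [1, 0, 1, 1]
stored = encode_triple(message)   # 12 stored bits for 4 message bits
stored[4] ^= 1                    # one corrupted copy...
assert decode_triple(stored) == message  # ...is outvoted
```

Note that two flips within the same triple would outvote the surviving copy, which is the weakness the text points out.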

Efficient Error Correction

The more interesting challenge is to correct errors while minimizing the amount of space given up for redundancy [00:01:17]. For instance, some methods allow storing data in 256-bit blocks where only 9 bits are used for redundancy, leaving 247 bits for meaningful data [00:01:24]. With this approach, if any single bit is flipped, a machine can identify the error’s precise location and correct it [00:01:40]. If two bits are flipped, the machine can detect that errors occurred, but it won’t be able to fix them [00:01:55].
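The arithmetic behind those figures: for a block whose size is a power of two, an extended Hamming code gives up log2(block size) bits to locate an error plus one overall parity bit. A quick check of the numbers quoted above:

```python
from math import log2

def redundancy_bits(block_size):
    """Redundancy in an extended Hamming code on a power-of-two
    block: log2(block_size) locating bits plus one overall parity bit."""
    n = int(log2(block_size))
    assert 2 ** n == block_size, "block size must be a power of two"
    return n + 1

# 256-bit block: 9 redundancy bits, 247 message bits, as quoted.
assert redundancy_bits(256) == 9
assert 256 - redundancy_bits(256) == 247
```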

Methods that enable such error correction are known as error correction codes [00:02:07].

Hamming Codes

Hamming codes represent one of the earliest examples of efficient error correction [00:02:22]. They were invented in the 1940s by Richard Hamming, who sought to overcome frequent program failures caused by misread bits on a punch card computer at Bell Labs [00:03:46]. While not as widely used as modern codes like the Reed-Solomon algorithm, Hamming codes offer a foundational understanding of the principles involved [00:02:59].

Parity Checks as Building Blocks

A crucial concept leading to Hamming codes is the parity check [00:05:47]. A parity check involves a single “parity bit” that a sender tunes to ensure the total number of 1s in a message (or a subset of bits) is an even number [00:06:04].

  • Detection: If any bit in the message flips (0 to 1 or 1 to 0), the total count of 1s changes from even to odd [00:06:37]. A receiver seeing an odd number of 1s can definitively know an error occurred, even without knowing its location [00:06:47].
  • Limitations: While effective for detecting any odd number of errors (1, 3, 5, etc.), a parity check cannot detect an even number of errors (like 2 or 4), as the parity would remain even [00:07:21]. Also, it doesn’t pinpoint the error location [00:06:55].
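Both the detection behavior and the even-error blind spot described above can be demonstrated directly (function names here are illustrative):

```python
def parity(bits):
    """0 if the count of 1s is even, 1 if it is odd."""
    return sum(bits) % 2

def add_parity_bit(message):
    """Sender appends one bit tuned so the total count of 1s is even."""
    return message + [parity(message)]

block = add_parity_bit([1, 0, 1, 1, 0])
assert parity(block) == 0   # clean block: even parity

block[2] ^= 1               # one flipped bit...
assert parity(block) == 1   # ...is detected (but not located)

block[4] ^= 1               # a second flip...
assert parity(block) == 0   # ...restores even parity and goes unseen
```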

Locating Errors with Multiple Parity Checks

Hamming’s key insight was that applying multiple parity checks to carefully selected subsets of bits can not only detect an error but also pinpoint its exact location [00:08:32]. This functions like a game of “20 questions,” where yes/no queries halve the space of possibilities [00:08:46].

For a 16-bit block (positions 0-15), 4 positions are reserved as redundancy bits, typically powers of 2 (1, 2, 4, 8) [00:04:25]. These bits control the parity of specific subsets of data bits.

  • Example (16-bit block):
    • Check 1: Parity check on the odd-numbered columns of the grid (using position 1 as parity bit) [00:08:54].
    • Check 2: Parity check on the right half of the grid (using position 2 as parity bit) [00:09:50].
    • Check 3: Parity check on the odd rows (using position 4 as parity bit) [00:11:13].
    • Check 4: Parity check on the bottom two rows (using position 8 as parity bit) [00:11:26].

By combining the results of these checks, a single bit error can be precisely located within the block [00:11:37]. For instance, if errors are detected in the odd columns and the right half, the error must be in the last column [00:10:35].
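The four checks above have a clean binary description: check i covers every position whose index has a 1 in bit i of its binary expansion (in a 4×4 grid laid out row by row, these are exactly the odd columns, the right half, the odd rows, and the bottom half). Reading the failing checks as a 4-bit number then names the error position. A sketch under that indexing assumption:

```python
def syndrome(block):
    """Run the four subset parity checks on a 16-bit block.
    Returns the error position for a single flipped bit,
    or 0 if all four checks pass."""
    s = 0
    for i in range(4):
        covered = [block[p] for p in range(16) if p & (1 << i)]
        if sum(covered) % 2 == 1:  # this check sees odd parity
            s |= 1 << i
    return s

block = [0] * 16        # an all-zeros block passes every check
assert syndrome(block) == 0
block[13] ^= 1          # flip position 13 (binary 1101)
assert syndrome(block) == 13  # checks 0, 2, and 3 fail: 1+4+8
```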

The “No Error” Condition and Extended Hamming Codes

Four parity checks yield 16 possible outcomes, enough to pinpoint 1 of 16 positions. However, a 17th outcome is needed to indicate “no error” [00:14:17].

A simple solution is to discard position 0, making it a 15-bit block where 11 bits carry the message and 4 are for redundancy. If all four parity checks show even parities, it unambiguously means no error [00:14:33]. This is known as a 15-11 Hamming code [00:14:53].

An extended Hamming code utilizes position 0 as an additional parity bit for the entire block [00:15:08]. This allows for the detection of two-bit errors, even if they cannot be corrected [00:15:11]. If a single bit error occurs, the overall block parity becomes odd (detected by bit 0), and the four error-correcting checks also locate it. If two errors occur, the two flips cancel and the overall block parity is even again, yet the four error-correcting checks still report an anomaly; this mismatch signals that at least two errors happened [00:15:34].

Implementation Overview

When implementing a Hamming code:

  1. Sender: Divides the message into chunks (e.g., 11-bit chunks for a 16-bit block) [00:16:29].
  2. Sender: Places message bits into all positions not reserved for error correction (i.e., every position other than 0, 1, 2, 4, and 8) [00:16:54].
  3. Sender: Calculates and sets the parity bits (at positions 1, 2, 4, 8) for their respective subsets to ensure even parity [00:17:05].
  4. Sender: For an extended Hamming code, calculates and sets the parity bit at position 0 to ensure the overall block has even parity [00:17:31].
  5. Receiver: Checks the parity of the same subsets of bits that the sender used to set the parity bits (at positions 1, 2, 4, 8) [00:18:21].
  6. Receiver: The combination of results from these checks (which ones are odd/even) precisely indicates the position of a single bit error, if any [00:18:38].
  7. Receiver: Checks the overall parity of the block (using bit 0). If it’s even but other parity checks indicate an error, it suggests multiple errors [00:18:51].
  8. Receiver: Corrects the identified bit by flipping it [00:19:01].
  9. Receiver: Extracts the original message bits, excluding the redundancy bits [00:19:05].
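The nine steps above can be sketched for a single extended 16-bit block as follows; the function names and the test message are illustrative, not from the video:

```python
PARITY_POSITIONS = {0, 1, 2, 4, 8}   # redundancy slots in the block

def encode(message11):
    """Sender: steps 1-4 for one 11-bit chunk."""
    assert len(message11) == 11
    block = [0] * 16
    # Step 2: message bits fill every non-reserved position.
    data_positions = [p for p in range(16) if p not in PARITY_POSITIONS]
    for pos, bit in zip(data_positions, message11):
        block[pos] = bit
    # Step 3: set bits 1, 2, 4, 8 so each subset check comes out even.
    for i in range(4):
        covered = [block[p] for p in range(16)
                   if p & (1 << i) and p != 1 << i]
        block[1 << i] = sum(covered) % 2
    # Step 4 (extended code): bit 0 makes the whole block's parity even.
    block[0] = sum(block[1:]) % 2
    return block

def decode(block):
    """Receiver: steps 5-9. Returns the 11 message bits,
    or None when at least two errors are detected."""
    # Steps 5-6: failing subset checks, read as binary, name the error.
    error_pos = 0
    for i in range(4):
        if sum(block[p] for p in range(16) if p & (1 << i)) % 2:
            error_pos |= 1 << i
    # Step 7: overall parity separates one error from two.
    overall_odd = sum(block) % 2 == 1
    if error_pos and not overall_odd:
        return None                  # detected, but not correctable
    block = block[:]
    if error_pos:
        block[error_pos] ^= 1        # step 8: flip the located bit
    elif overall_odd:
        block[0] ^= 1                # the flip hit parity bit 0 itself
    # Step 9: strip the redundancy positions back out.
    return [block[p] for p in range(16) if p not in PARITY_POSITIONS]

message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
block = encode(message)
block[6] ^= 1                        # single flip: corrected
assert decode(block) == message
block[11] ^= 1                       # second flip: only detected
assert decode(block) is None
```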

The elegance of this algorithm allows for the core logic to be condensed into a single operation, making it efficient for machines to implement [00:19:21]. The systematic scaling of Hamming codes means that for a block of 256 bits, only eight yes/no questions (corresponding to eight parity bits) are needed to binary search for and pinpoint an error location [00:13:02].
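One well-known way to realize that single operation (popularized by the follow-up to the video): XOR together the indices of every position holding a 1. Because each parity subset is exactly the set of positions sharing one binary digit, this one expression evaluates all the checks at once, and the result is the error location (or 0 when every check passes). A sketch:

```python
from functools import reduce
from operator import xor

def error_location(block):
    """XOR the indices of all 1 bits; for a Hamming-coded block this
    collapses every subset parity check into a single reduction."""
    return reduce(xor, (p for p, bit in enumerate(block) if bit), 0)

block = [0] * 16        # a valid block: all checks pass
assert error_location(block) == 0
block[9] = 1            # a single flipped bit...
assert error_location(block) == 9  # ...names its own position
```

The same expression scales directly: on a 256-bit block it performs the eight binary-search questions in one pass.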