Coding Theory and Error Correction in Everyday Use
Yesterday, I found an old scratched Pearl Jam CD that I had (the classic debut Ten album) and decided to put it in my CD player on my drive home to Alabama. Thankfully, the music showed no signs of corruption from years of small scratches. Of course when I first bought the CD, circa 1991, I thought its ability to play music was magic. Here’s a cool picture that I found of Pearl Jam — a definite band favorite of mine.

At any rate, this time around, it dawned upon me of how engineering works even in the harshest conditions. That is, whether we are playing music, or sending messages into deep space, coding theory and error correction is at the heart of designing the system.
For example, how can we transmit a message that is robust to transmision errors such as CD scratches? For now, we will assume our message consists of 1′s and 0′s. The first step is called encoding. In addition to just your message, encoding a message is done by appending what is known as parity bits in a unique fashion. This results in a code word. So why do this?
Well simply said, just because we would like to transmit a particular message over a channel, we might observe an entirely different message and code word at the receiver. For those of you not familar with systems design, the transmitter can be considered a CD in our case and the reciever is your speakers. The “channel” is the whole process of reading the data from the cd and interpreting it as music. Now you can see that errors may be caused from scratches, dirt, and ware of your CD. These parity bits that are discussed help and enable us to correct the underlying message and decode the correct data.
Lets dive right into an extremely popular coding scheme that is used, the 7-4 Hamming Code. Here, our message is of length 4 bits (d1d2d3d4) with 3 appending parity bits (p1p2p3), totally a code word of 7 bits (d1d2d3d4p1p2p3). Note, in reality, transmitted messages are much longer than our example since they are describing parts of data, i.e. track 5 of Pearl Jam “Black” on the Ten Album. Let’s say that our message 1101. How should we encode this? Looking at the following figure should help us.

The 7-4 Hamming scheme encodes a message by making sure that if we input our data message into the above diagram at the specified locations, then the parity bits are added to ensure the sum of each circle is even. Here I should note that we are using modulo 2 arithmetic for addition (e.g., 1+1 = 0, 1+0 = 1, …). Doing so, we can then see the above message of 1101 is encoded as 1101100. Hopefully this wasn’t bad! Now, what happens if we transmitted this code word, but due to extraneous noise caused from our system, the message received is shown below.

We see here that our observed code word is 1111100. The term “observed” here is important from a conceptual point of view. That is, we have no idea of what the input code word is nor any of the system noise (we can only assume a general model for our system at best). So how can we decode our message at the receiver with no knowledge of what the input is, but only that it was encoded via a 7-4 Hamming scheme?
One simple way is to put the code word in the above ven diagram, and see where the sum of each circle is not equal. Doing so, one sees that the bottom two circles are not even in their sum. But… should we flip the appropriate parity bits to be 1 rather than 0 or flip the third message bit from a 1 to a 0? Either is a valid code word from a decoding point of view. This means one can “correct” an error of the code word, but this code word may not be the underlying transmitted word. This is known as a “decoding error.” From this, the 7-4 Hamming code can be shown to correct at most 1 single bit error without resulting in a “decoding error.” Unfortunately, there is no way around this and is dependent on the coding scheme that is used. For the 7-4 Hamming code, if we have two or more bit errors in the transmission, the decoder may result in an error (but have a valid code). Because the decoder itself is beyond the scope of this post, I’ll refer you to this nice Wikipedia article. Also, given a coding scheme, different decoders may be used. For example, if you use a convolution code (which can be found on my project’s page soon), one can use what is known as a Trellis Map to efficiently decode your message. After we pass the information through the decoder, the correct code word will indeed be our transmitted code word. While I have simplified this example and shown the use of the above ven diagram, more complex schemes will need a more powerful way of decoding involving parity check matrices (see wiki article).
Lastly, I will note that this coding scheme can only correct a single bit flip as opposed to several bits in a row, which is called a “burst error.” So for our CD scratch example, where a scratch can be seen as several bits corrupted in a sequence, we will need to use other coding techniques in combination with the Hamming Code.
If you are interested, I have provided additional details on achieving a more efficient code via convolution codes. This will soon be found on my projects page with MATLAB code for you to test out. I have also written a short techncal summary that can be found on resume and publications page under “Technical Reports. Here’s a quick link.