Learn It!
Digital video is now the dominant type of internet traffic, whether sampled and quantized from an analog source (like a video chat) or generated directly in digital form (like a streamed video game). But raw video requires high data rates for streaming and large file sizes for storage. Data compression is the process of reducing the data rate (usually measured in bits per second) or file size (usually measured in bytes). Lossless compression techniques preserve the data exactly and achieve compression savings by representing predictable patterns in the data using small numbers of bits. Lossy compression sacrifices the exactness of the original data to achieve additional savings. We demonstrate these techniques using an image compression example.

Figure 1

Figure 2

Figure 3

The amount of compression achieved is typically described in two equivalent ways: data compression ratio (DCR) and percentage savings.

\begin{align}

\text{Data Compression Ratio (DCR)} &= \frac{\text{original file size}}{\text{compressed file size}} \text{ or } \frac{\text{original data rate}}{\text{compressed data rate}}\\

\text{Percentage Savings} &= \left( 1 - \frac{1}{\text{DCR}}\right) \times 100\%

\end{align}

Image | Format | File Size (kB) | DCR | Percentage Savings |
---|---|---|---|---|
Fig. 1 | uncompressed bitmap | 225 | $\frac{225}{225}=1$ | $\left( 1 - \frac{1}{1}\right) \times 100\% = 0\%$ |
Fig. 2 | lossless PNG | 61.1 | $\frac{225}{61.1} \approx 3.68$ | $\left( 1 - \frac{1}{3.68}\right) \times 100\% \approx 72.8\%$ |
Fig. 3 | lossy JPEG | 11.6 | $\frac{225}{11.6} \approx 19.4$ | $\left( 1 - \frac{1}{19.4}\right) \times 100\% \approx 94.8\%$ |
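The two formulas above can be checked directly in code. Here is a minimal Python sketch (the function name `compression_stats` is our own) applied to the file sizes from the table:

```python
# Compute data compression ratio (DCR) and percentage savings
# from a pair of file sizes (or data rates) in the same units.
def compression_stats(original_size, compressed_size):
    """Return (DCR, percentage savings) for original vs. compressed size."""
    dcr = original_size / compressed_size
    savings = (1 - 1 / dcr) * 100
    return dcr, savings

# Lossless PNG (Fig. 2): 225 kB compressed to 61.1 kB
dcr_png, savings_png = compression_stats(225, 61.1)

# Lossy JPEG (Fig. 3): 225 kB compressed to 11.6 kB
dcr_jpeg, savings_jpeg = compression_stats(225, 11.6)

print(f"PNG:  DCR = {dcr_png:.2f}, savings = {savings_png:.1f}%")
print(f"JPEG: DCR = {dcr_jpeg:.2f}, savings = {savings_jpeg:.1f}%")
```

Running this reproduces the DCR and savings values shown in the table (about 3.68 and 72.8% for the PNG, and about 19.4 and 94.8% for the JPEG).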

We now know that lossless compression can reduce the number of bits used to represent data while preserving the data exactly. But there is a limit, called entropy, to how few bits are needed. Entropy is the amount of information actually present in the data. If you try to compress the data below its entropy, you are guaranteed to lose some of the information. Watch the short video below to understand entropy conceptually and to see how its formula is obtained. (There is an error in this video at 1:26. The question "Is it B?" that appears on screen should be "Is it C?")

The entropy $H$ of data depends on the probabilities $p_1, p_2, \ldots, p_n$ of the values the data can take.

\begin{equation}

\begin{aligned}

H=\sum_{i=1}^n p_i \log_2 \left( \frac{1}{p_i} \right) \text{ bits}

\end{aligned} \label{INF-HFM}

\end{equation}

Notice that the entropy formula is the weighted average of the $\log_2(1/p_i)$ terms, each weighted by the probability $p_i$ of the respective value indexed by $i$.

Figure 4

Suppose you close your eyes and pick one of the Skittles in Fig. 4 at random. To calculate the entropy of the flavor of this Skittle, compute the probability $p_i$ of selecting the flavor $i$ as: the number of Skittles of flavor $i$ divided by the total number of Skittles.

Table 2

Flavor $i$ | Number of Skittles of Flavor $i$ | Probability $p_i$ |
---|---|---|
lemon | 18 | $\frac{18}{105} \approx 0.171$ |
lime | 14 | $\frac{14}{105} \approx 0.133$ |
orange | 28 | $\frac{28}{105} \approx 0.267$ |
strawberry | 26 | $\frac{26}{105} \approx 0.248$ |
grape | 19 | $\frac{19}{105} \approx 0.181$ |

Then use these five values of $p_i$ to evaluate the entropy $H$.

\begin{align}

H &= 0.171 \log_2 \left( \frac{1}{0.171} \right) + 0.133 \log_2 \left( \frac{1}{0.133} \right) + \cdots + 0.181 \log_2 \left( \frac{1}{0.181} \right)\\

&\approx 2.28 \text{ bits} \label{INF-HCL}

\end{align}

In other words, the amount of information represented by the flavor of the randomly chosen Skittle is about 2.28 bits. If you repeated the selection many times (each time returning the chosen Skittle to the pile), you could not losslessly compress the sequence of selected flavors at fewer than 2.28 bits per selection on average.
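The entropy calculation above can be reproduced in a few lines of Python (a sketch; the variable names are our own), working directly from the Skittle counts in Fig. 4 rather than the rounded probabilities:

```python
from math import log2

# Skittle counts by flavor from Fig. 4 (105 Skittles total)
counts = {"lemon": 18, "lime": 14, "orange": 28, "strawberry": 26, "grape": 19}
total = sum(counts.values())  # 105

# Entropy H = sum over flavors of p_i * log2(1/p_i),
# where p_i = count_i / total, so 1/p_i = total / count_i
H = sum((n / total) * log2(total / n) for n in counts.values())
print(f"H = {H:.2f} bits")  # about 2.28 bits
```

Using the exact counts instead of the rounded $p_i$ values gives $H \approx 2.28$ bits, matching the hand calculation.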

A code is a mapping from symbols (such as Skittle flavors) to sequences of bits (called code words). A Huffman code is designed to perform lossless compression with an average code word length close to (but not less than) the entropy. Watch the short video below to learn how to construct a Huffman code by arranging the symbols in a tree structure based on how frequently they occur.

To begin constructing a Huffman tree, each symbol is associated with a node labeled with its frequency, which is either its probability or a count of the number of times it has occurred. Then the process alternates between two steps: SORT and LINK.

In the SORT step, the nodes are sorted in order of decreasing frequency.

In the LINK step, the two least frequent nodes are connected with branches to a parent node labeled with their combined frequency. In this way, two nodes are merged into one for the next SORT step. During the LINK step, we also label the branches with code bits 0 and 1. For consistency, we assign a code bit 1 to the branch leading to the more frequent node and a code bit 0 to the branch leading to the less frequent node.

The SORT and LINK steps repeat until all of the original nodes are connected to a single tree structure, as shown in the example below based on the Skittle flavor frequencies in Fig. 4.

Step | Huffman Tree Building | Explanation |
---|---|---|
SORT | | The flavor nodes are sorted in order of decreasing number of Skittles. |
LINK | | The two nodes with the smallest numbers of Skittles (lemon and lime) are connected to a parent node (lemon/lime) labeled with their combined number of Skittles, 32. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon) and a code bit 0 to the branch leading to the lower number of Skittles (lime). |
SORT | | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
LINK | | The two nodes with the smallest numbers of Skittles (strawberry and grape) are connected to a parent node (strawberry/grape) labeled with their combined number of Skittles, 45. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (strawberry) and a code bit 0 to the branch leading to the lower number of Skittles (grape). |
SORT | | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
LINK | | The two nodes with the smallest numbers of Skittles (lemon/lime and orange) are connected to a parent node (lemon/lime/orange) labeled with their combined number of Skittles, 60. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon/lime) and a code bit 0 to the branch leading to the lower number of Skittles (orange). |
SORT | | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
LINK | | The two remaining nodes (lemon/lime/orange and strawberry/grape) are connected to a parent node (all flavors) labeled with their combined number of Skittles, 105. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon/lime/orange) and a code bit 0 to the branch leading to the lower number of Skittles (strawberry/grape). |
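The SORT and LINK steps can be sketched in Python with a priority queue: a heap keeps the nodes ordered by frequency, so each LINK step pops the two least frequent nodes. This is one possible implementation (the function name `build_huffman` is our own), following the convention above of assigning bit 1 to the more frequent branch:

```python
import heapq
from itertools import count

def build_huffman(freqs):
    """Build a Huffman code from a dict of symbol -> frequency.

    Returns a dict mapping each symbol to its code word (a bit string).
    """
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry: (frequency, tiebreaker, {symbol: partial code word})
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)  # SORT: the heap keeps nodes ordered by frequency
    while len(heap) > 1:
        # LINK: pop the two least frequent nodes ...
        f_lo, _, lo = heapq.heappop(heap)
        f_hi, _, hi = heapq.heappop(heap)
        # ... and prepend bit 1 on the more frequent branch, bit 0 on the other
        merged = {sym: "1" + word for sym, word in hi.items()}
        merged.update({sym: "0" + word for sym, word in lo.items()})
        heapq.heappush(heap, (f_lo + f_hi, next(tiebreak), merged))
    return heap[0][2]

code = build_huffman({"lemon": 18, "lime": 14, "orange": 28,
                      "strawberry": 26, "grape": 19})
print(code)
```

For the Skittle counts this reproduces the code words derived above (lemon 111, lime 110, orange 10, strawberry 01, grape 00). Note that when frequencies tie, different but equally valid Huffman codes can result.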

Each symbol is now mapped to a code word by reading off the code bits on the branches that lead from the root node (at the top of the tree) down to the symbol. The code words and their lengths $L_i$ for the Skittle flavor example are shown below.

Table 5

Flavor $i$ | Code Word | Code Word Length $L_i$ (bits) |
---|---|---|
lemon | 111 | 3 |
lime | 110 | 3 |
orange | 10 | 2 |
strawberry | 01 | 2 |
grape | 00 | 2 |

Notice that more frequent flavors (orange, strawberry and grape) have shorter code words than less frequent ones (lemon and lime). Therefore, compared to using code words of equal length, fewer bits are used to represent a sequence of flavors on average. In fact, the average code word length $L_{\text{avg}}$ is the weighted average of the code word lengths $L_i$, each weighted by the probability $p_i$ from Table 2.

\begin{align}

L_{\text{avg}} &= \sum_{i=1}^n p_i L_i \\

&= (0.171)(3) + (0.133)(3) + (0.267)(2) + (0.248)(2) + (0.181)(2)\\

&\approx 2.30 \text{ bits}

\end{align}

Observe that the average code word length $L_{\text{avg}} \approx 2.30 \text{ bits}$ is close to (but not less than) the entropy $H \approx 2.28 \text{ bits}$ from $\eqref{INF-HCL}$. This result is true in general for lossless compression:

\begin{equation}

\begin{aligned}

L_{\text{avg}} \geq H

\end{aligned} \label{INF-INQ}

\end{equation}
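As a sanity check on this inequality, here is a short Python sketch (variable names are our own) that computes both $H$ and $L_{\text{avg}}$ from the exact Skittle counts:

```python
from math import log2

# Skittle counts (Fig. 4) and Huffman code word lengths (Table 5)
counts = {"lemon": 18, "lime": 14, "orange": 28, "strawberry": 26, "grape": 19}
lengths = {"lemon": 3, "lime": 3, "orange": 2, "strawberry": 2, "grape": 2}
total = sum(counts.values())

# Entropy: weighted average of log2(1/p_i)
H = sum((n / total) * log2(total / n) for n in counts.values())
# Average code word length: weighted average of L_i
L_avg = sum((counts[sym] / total) * lengths[sym] for sym in counts)

print(f"H = {H:.2f} bits, L_avg = {L_avg:.2f} bits")
assert L_avg >= H  # lossless compression cannot beat the entropy
```

Running this gives $H \approx 2.28$ bits and $L_{\text{avg}} \approx 2.30$ bits, confirming that the Huffman code comes within about 0.03 bits of the entropy limit.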

Decoding Huffman-coded data is straightforward. Consider an unknown flavor sequence that has been compressed using the Huffman code in Table 5. To break up the bit sequence into code words, read the bit sequence from left to right until you see a code word. Then continue in the same way until you see the next code word, and so on.

Figure 5

Notice that there is no ambiguity in decoding. There is no other way to break up the bit sequence $110101000$ into code words because the first code word $110$ has prefixes $1$ and $11$, which are not themselves code words, and so on for the subsequent code words. In fact, no code word in Table 5 is a prefix of another code word. Such a code is called a prefix-free code.

All Huffman codes are prefix-free due to their tree structure. Recall that each code word is obtained by traversing the branches from the root node (at the top of the tree) down to the respective symbol. There are no symbols located at intermediate positions along these paths, so there are no code words that are prefixes of other code words. Therefore, Huffman decoding is guaranteed to be unambiguous.
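The left-to-right decoding procedure can be sketched in Python. Because the code is prefix-free, the first code word matched while scanning is always the correct one (the function name `huffman_decode` is our own):

```python
# Decode a Huffman-coded bit string by greedy left-to-right prefix matching,
# using the Skittle-flavor code from Table 5.
code = {"lemon": "111", "lime": "110", "orange": "10",
        "strawberry": "01", "grape": "00"}
decode_map = {word: flavor for flavor, word in code.items()}

def huffman_decode(bits):
    """Split a bit string into code words, scanning left to right."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in decode_map:  # prefix-free: the first match is correct
            symbols.append(decode_map[current])
            current = ""
    if current:
        raise ValueError("bit sequence ends in the middle of a code word")
    return symbols

print(huffman_decode("110101000"))  # ['lime', 'orange', 'orange', 'grape']
```

Applied to the bit sequence $110101000$ from the example above, the decoder splits it as 110 | 10 | 10 | 00, recovering lime, orange, orange, grape.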

Explore More!