Information and Compression
Fall 2018

ECE 110

Course Notes

Learn It!



Digital video is now the dominant type of internet traffic, whether sampled and quantized from an analog source (like a video chat) or generated directly in digital form (like a streamed video game). But raw video requires high data rates for streaming and large file sizes for storage. Data compression is the process of reducing the data rate (usually measured in bits per second) or file size (usually measured in bytes). Lossless compression techniques preserve the data exactly and achieve compression savings by representing predictable patterns in the data using small numbers of bits. Lossy compression sacrifices the exactness of the original data to achieve additional savings. We demonstrate these techniques using an image compression example.

Fig. 1: Uncompressed image. This uncompressed bitmap image contains $320\times240$ pixels. The color of each pixel is represented by 24 bits (8 bits for each RGB color channel). Given that 8 bits are 1 byte and that $2^{10}$ bytes are 1 kilobyte (kB), you can calculate that the file size is $225 \text{ kB}$. You can also verify this fact by downloading the image and checking its file properties. Image source.
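For example, the file-size arithmetic in the caption can be checked with a few lines of Python (a minimal sketch; the numbers are the ones given above):

```python
width, height = 320, 240        # pixels
bits_per_pixel = 24             # 8 bits per RGB color channel

total_bits = width * height * bits_per_pixel   # 1,843,200 bits
total_bytes = total_bits / 8                   # 230,400 bytes
size_kB = total_bytes / 2**10                  # divide by 1024 bytes per kB

print(size_kB)   # 225.0
```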


Fig. 2: Losslessly compressed image. This image is the one in Fig. 1 compressed to a size of $61.1 \text{ kB}$ in the lossless PNG format. The compression is lossless because every pixel in this image is exactly the same color as the corresponding pixel in Fig. 1. The file size savings come from representing predictable patterns with small numbers of bits. For example, the bottom-left corner is a flat color and the top-right corner contains a gentle color gradient, both of which are common structures in digital images.


Fig. 3: Lossy compressed image. This image is the one in Fig. 1 compressed to a size of $11.6 \text{ kB}$ in the lossy JPEG format. The compression is lossy because the individual pixel colors in Fig. 1 are not preserved exactly. You can see that the detail is blurrier than in Fig. 1 or Fig. 2. You can also see ringing artifacts surrounding the bright parts of the image.

The amount of compression achieved is typically described in two equivalent ways: data compression ratio (DCR) and percentage savings.
\begin{align}
\text{Data Compression Ratio (DCR)} &= \frac{\text{original file size}}{\text{compressed file size}} \text{ or } \frac{\text{original data rate}}{\text{compressed data rate}}\\
\text{Percentage Savings} &= \left( 1 - \frac{1}{\text{DCR}}\right) \times 100\%
\end{align}

| Image | Format | File Size (kB) | DCR | Percentage Savings |
|---|---|---|---|---|
| Fig. 1 | uncompressed bitmap | 225 | $\frac{225}{225}=1$ | $\left( 1 - \frac{1}{1}\right) \times 100\% = 0\%$ |
| Fig. 2 | lossless PNG | 61.1 | $\frac{225}{61.1} \approx 3.68$ | $\left( 1 - \frac{1}{3.68}\right) \times 100\% \approx 72.8\%$ |
| Fig. 3 | lossy JPEG | 11.6 | $\frac{225}{11.6} \approx 19.4$ | $\left( 1 - \frac{1}{19.4}\right) \times 100\% \approx 94.8\%$ |

Table 1: DCR and percentage savings for the images in Fig. 1 to Fig. 3.
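The entries in Table 1 can be reproduced with a few lines of Python (a minimal sketch using the file sizes reported above; the function names are ours):

```python
def dcr(original_kb, compressed_kb):
    """Data compression ratio: original file size divided by compressed file size."""
    return original_kb / compressed_kb

def percentage_savings(ratio):
    """Percentage savings corresponding to a given DCR."""
    return (1 - 1 / ratio) * 100

# File sizes in kB from Figs. 1-3
for fmt, size_kb in [("uncompressed bitmap", 225), ("lossless PNG", 61.1), ("lossy JPEG", 11.6)]:
    ratio = dcr(225, size_kb)
    print(f"{fmt}: DCR = {ratio:.2f}, savings = {percentage_savings(ratio):.1f}%")
```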


We now know that lossless compression can reduce the number of bits used to represent data while preserving it exactly. But there is a limit, called the entropy, to how few bits are needed. Entropy is the amount of information actually present in the data. If you try to compress the data below its entropy, you are guaranteed to lose some of the information. Watch the short video below to understand entropy conceptually and see how its formula is obtained. (There is an error in this video at 1:26. The question "Is it B?" that appears on screen should be "Is it C?")



The entropy $H$ of data depends on the probabilities $p_1, p_2, \ldots, p_n$ of the values the data can take.
\begin{equation}
\begin{aligned}
H=\sum_{i=1}^n p_i \log_2 \left( \frac{1}{p_i} \right) \text{ bits}
\end{aligned} \label{INF-HFM}
\end{equation}
Notice that the entropy formula is the weighted average of the $\log_2(1/p_i)$ terms, each weighted by the probability $p_i$ of the respective value indexed by $i$.

Fig. 4: Skittles. This collection of candy consists of 105 Skittles: 18 lemon-flavored, 14 lime-flavored, 28 orange-flavored, 26 strawberry-flavored and 19 grape-flavored. Image source.

Suppose you close your eyes and pick one of the Skittles in Fig. 4 at random. To calculate the entropy of the flavor of this Skittle, compute the probability $p_i$ of selecting flavor $i$ as the number of Skittles of flavor $i$ divided by the total number of Skittles.

| Flavor $i$ | Number of Skittles of Flavor $i$ | Probability $p_i$ |
|---|---|---|
| lemon | 18 | $\frac{18}{105} \approx 0.171$ |
| lime | 14 | $\frac{14}{105} \approx 0.133$ |
| orange | 28 | $\frac{28}{105} \approx 0.267$ |
| strawberry | 26 | $\frac{26}{105} \approx 0.248$ |
| grape | 19 | $\frac{19}{105} \approx 0.181$ |

Table 2: Calculation of $p_i$ for Skittle flavors.

Then use these five values of $p_i$ to evaluate the entropy $H$.
\begin{align}
H &= 0.171 \log_2 \left( \frac{1}{0.171} \right) + 0.133 \log_2 \left( \frac{1}{0.133} \right) + \cdots + 0.181 \log_2 \left( \frac{1}{0.181} \right)\\
&\approx 2.28 \text{ bits} \label{INF-HCL}
\end{align}
In other words, the amount of information represented by the flavor of the randomly chosen Skittle is 2.28 bits. If you repeated the Skittle selection many times (each time returning the chosen one back to the pile), you could not losslessly compress the sequence of selected flavors at less than 2.28 bits per selection.
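This calculation can be reproduced with a few lines of Python (a minimal sketch using the Skittle counts from Fig. 4 and the entropy formula $\eqref{INF-HFM}$):

```python
from math import log2

# Skittle counts by flavor (Fig. 4)
counts = {"lemon": 18, "lime": 14, "orange": 28, "strawberry": 26, "grape": 19}
total = sum(counts.values())   # 105

# Entropy: weighted average of log2(1/p_i), weighted by p_i
H = sum((n / total) * log2(total / n) for n in counts.values())
print(round(H, 2))   # 2.28 (bits)
```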


A code is a mapping from symbols (such as Skittle flavors) to sequences of bits (called code words). A Huffman code is designed to perform lossless compression with an average code word length close to (but not less than) the entropy. Watch the short video below to learn how to construct a Huffman code by arranging the symbols in a tree structure based on how frequently they occur.



To begin constructing a Huffman tree, each symbol is associated with a node labeled with its frequency, which is either its probability or a count of the number of times it has occurred. Then the process alternates between two steps: SORT and LINK.


In the SORT step, the nodes are sorted in order of decreasing frequency.


In the LINK step, the two least frequent nodes are connected with branches to a parent node labeled with their combined frequency. In this way, two nodes are merged into one for the next SORT step. During the LINK step, we also label the branches with code bits 0 and 1. For consistency, we assign a code bit 1 to the branch leading to the more frequent node and a code bit 0 to the branch leading to the less frequent node.



The SORT and LINK steps repeat until all of the original nodes are connected to a single tree structure, as shown in the example below based on the Skittle flavor frequencies in Fig. 4.

| Step | Explanation |
|---|---|
| SORT | The flavor nodes are sorted in order of decreasing number of Skittles. |
| LINK | The two nodes with the fewest Skittles (lemon and lime) are connected to a parent node (lemon/lime) labeled with their combined number of Skittles, 32. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon) and a code bit 0 to the branch leading to the lower number of Skittles (lime). |
| SORT | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
| LINK | The two nodes with the fewest Skittles (strawberry and grape) are connected to a parent node (strawberry/grape) labeled with their combined number of Skittles, 45. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (strawberry) and a code bit 0 to the branch leading to the lower number of Skittles (grape). |
| SORT | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
| LINK | The two nodes with the fewest Skittles (lemon/lime and orange) are connected to a parent node (lemon/lime/orange) labeled with their combined number of Skittles, 60. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon/lime) and a code bit 0 to the branch leading to the lower number of Skittles (orange). |
| SORT | The remaining nodes (including parent nodes) are sorted in order of decreasing number of Skittles. |
| LINK | The two remaining nodes (lemon/lime/orange and strawberry/grape) are connected to a parent node (all flavors) labeled with their combined number of Skittles, 105. From the parent node, we assign a code bit 1 to the branch leading to the higher number of Skittles (lemon/lime/orange) and a code bit 0 to the branch leading to the lower number of Skittles (strawberry/grape). |

Table 4: Building a Huffman tree for Skittle flavors.

Each symbol is now mapped to a code word by reading off the code bits on the branches that lead from the root node (at the top of the tree) down to the symbol. The code words and their lengths $L_i$ for the Skittle flavor example are shown below.

| Flavor $i$ | Code Word | Code Word Length $L_i$ (bits) |
|---|---|---|
| lemon | 111 | 3 |
| lime | 110 | 3 |
| orange | 10 | 2 |
| strawberry | 01 | 2 |
| grape | 00 | 2 |

Table 5: Huffman code words and their lengths for Skittle flavors.
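The SORT and LINK procedure can also be written as a short Python sketch (a minimal sketch; the function name huffman_code is ours, and when two nodes have equal frequencies the tie-breaking order is arbitrary, so an equally good but different set of code words could result. For the counts in Fig. 4 there are no ties, and the sketch reproduces Table 5 exactly):

```python
def huffman_code(counts):
    """Build a Huffman code by alternating SORT and LINK steps.
    Convention from the notes: bit 1 goes to the more frequent branch,
    bit 0 to the less frequent branch."""
    # Each node pairs a frequency with the (partial) code words of the symbols under it.
    nodes = [(freq, {symbol: ""}) for symbol, freq in counts.items()]
    while len(nodes) > 1:
        # SORT: order the nodes by decreasing frequency.
        nodes.sort(key=lambda node: node[0], reverse=True)
        # LINK: merge the two least frequent nodes into a parent node,
        # prepending bit 0 (less frequent) or bit 1 (more frequent) to their code words.
        freq0, codes0 = nodes.pop()    # least frequent node -> bit 0
        freq1, codes1 = nodes.pop()    # second least frequent node -> bit 1
        merged = {symbol: "1" + word for symbol, word in codes1.items()}
        merged.update({symbol: "0" + word for symbol, word in codes0.items()})
        nodes.append((freq0 + freq1, merged))
    return nodes[0][1]

counts = {"lemon": 18, "lime": 14, "orange": 28, "strawberry": 26, "grape": 19}
print(huffman_code(counts))
# {'lemon': '111', 'lime': '110', 'orange': '10', 'strawberry': '01', 'grape': '00'}
```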

Notice that more frequent flavors (orange, strawberry and grape) have shorter code words than less frequent ones (lemon and lime). Therefore, compared to using code words of equal length, fewer bits are used to represent a sequence of flavors on average. In fact, the average code word length $L_{\text{avg}}$ is the weighted average of the code word lengths $L_i$, each weighted by the probability $p_i$ from Table 2.
\begin{align}
L_{\text{avg}} &= \sum_{i=1}^n p_i L_i \\
&= (0.171)(3) + (0.133)(3) + (0.267)(2) + (0.248)(2) + (0.181)(2)\\
&\approx 2.30 \text{ bits}
\end{align}
Observe that the average code word length $L_{\text{avg}} \approx 2.30 \text{ bits}$ is close to (but not less than) the entropy $H \approx 2.28 \text{ bits}$ from $\eqref{INF-HCL}$. This result is true in general for lossless compression:
\begin{equation}
\begin{aligned}
L_{\text{avg}} \geq H
\end{aligned} \label{INF-INQ}
\end{equation}
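As a quick numerical check of this inequality, the sketch below (using the counts from Fig. 4 and the code word lengths from Table 5) computes both quantities:

```python
from math import log2

counts = {"lemon": 18, "lime": 14, "orange": 28, "strawberry": 26, "grape": 19}
lengths = {"lemon": 3, "lime": 3, "orange": 2, "strawberry": 2, "grape": 2}   # Table 5
total = sum(counts.values())

# Average code word length and entropy, both weighted by the probabilities p_i
L_avg = sum((n / total) * lengths[flavor] for flavor, n in counts.items())
H = sum((n / total) * log2(total / n) for n in counts.values())

print(round(L_avg, 2), round(H, 2))   # 2.3 2.28 (bits)
assert L_avg >= H   # the average code word length cannot be below the entropy
```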


Decoding Huffman-coded data is straightforward. Consider an unknown flavor sequence that has been compressed using the Huffman code in Table 5. To break up the bit sequence into code words, read the bit sequence from left to right until you see a code word. Then continue in the same way until you see the next code word, and so on.

Fig. 5: Huffman decoding. To decode $110101000$ using Table 5, read the bits as follows: the first bit is $1$ (not a code word); the first and second bits are $11$ (also not a code word); the first to third bits are $110$ (which is the code word for the flavor lime). Continuing this way breaks up the bit sequence into code words $110,10,10,00$, which map to the flavor sequence: lime, orange, orange, grape.

Notice that there is no ambiguity in decoding. There is no other way to break up the bit sequence $110101000$ into code words because the first code word $110$ has prefixes $1$ and $11$, which are not themselves code words, and so on for the subsequent code words. In fact, no code word in Table 5 is a prefix of another code word. Such a code is called a prefix-free code.

All Huffman codes are prefix-free due to their tree structure. Recall that each code word is obtained by traversing the branches from the root node (at the top of the tree) down to the respective symbol. There are no symbols located at intermediate positions along these paths, so there are no code words that are prefixes of other code words. Therefore, Huffman decoding is guaranteed to be unambiguous.
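The decoding procedure illustrated in Fig. 5 can also be written as a short Python sketch (a minimal illustration using the code from Table 5; the function name huffman_decode is ours):

```python
# Huffman code from Table 5 (prefix-free: no code word is a prefix of another)
code = {"lemon": "111", "lime": "110", "orange": "10", "strawberry": "01", "grape": "00"}
decode_table = {word: flavor for flavor, word in code.items()}

def huffman_decode(bits):
    """Read the bits left to right, emitting a symbol each time a code word is matched."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in decode_table:        # a complete code word has been read
            symbols.append(decode_table[current])
            current = ""                   # start reading the next code word
    return symbols

print(huffman_decode("110101000"))   # ['lime', 'orange', 'orange', 'grape']
```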

Explore More!