Brotli’s compression techniques

Brotli is a compression algorithm that web browsers and other software use to transmit and save data in a smaller format, saving space, bandwidth, and processing power. As it reads through text, it can either pass it through uncompressed, save it as a reference to past text, or save it as a reference to text in its dictionary.

To see how these three techniques are combined into one algorithm, we can highlight our input text based on what technique Brotli uses for each part. The highlighting will update as you edit the input text.

Uncompressed

For new or rare words, Brotli has to simply pass the text along as-is. These are also called “literals.”

winter

Is compressed as:

w, i, n, t, e, r

Lookback Copy

Most text repeats itself, so if Brotli sees words or phrases multiple times, it just remembers the first instance and handles the rest as references back to the first.

Snow had fallen, snow on snow.

Is compressed as:

Snow had fallen, s(←14) on (←4).

Dictionary Reference

Brotli keeps a list of words and phrases that are common in website code. When it sees a word from its list/dictionary, it just remembers that word’s index instead of the actual word. You can page through the dictionary’s contents here.

In the bleak mid-winter,

Is compressed as:

[#4869] bleak mid-[#2154],

The other big Brotli technique is “Huffman coding,” which is ubiquitous in compression but harder to describe briefly. For more details, see RFC 7932.

This was inspired by reading through Mark Adler’s gorgeous Brotli decoder. Thanks also for Daniel Horn’s Brotli encoder and Tim Perry’s WebAssembly shims.