# 2.1 LZ77+Huffman Compression Algorithm Details

The overall compression algorithm for the Huffman [IEEE-MRC] variant can handle an arbitrary amount of data. Data is processed in 64k blocks, and the encoded results are stored in-order. After the final block, the end-of-file (EOF) symbol is encoded. Each 64k block is run through three stages, which are performed in this order:

Perform

**LZ77**([UASDC]) compression to generate an intermediate compressed buffer.Construct canonical

**Huffman codes**.Process the intermediate

**LZ77**data, and re-encode it in a Huffman-based bit stream.

The algorithm cannot start Huffman encoding until it has
computed the **Huffman codes**, and it cannot compute the **Huffman codes**
until it knows the frequency of each symbol in the **Huffman alphabet**. To
compute these frequencies, the algorithm first performs the **LZ77** phase.
For efficiency, the algorithm SHOULD store the **LZ77** output so that the
final phase does not have to recompute it.

The final compression format consists of two parts:

The first 256 bytes indicate the bit length of each of the 512

**Huffman symbol**s (see prefix code).The remainder of the data is a sequence of

**Huffman symbol**s, along with match lengths and distances.

The **Huffman alphabet** consists of 512 symbols, each
with a numeric value in the range 0-511. The symbols 0-255 represent literal
values that correspond to raw byte values as opposed to matches. The symbols
256-511 represent **matches** or **references** indicating that the next
several bytes are the same as some bytes that previously occurred in the data.
Each match consists of two encoded integers: a length and a distance. When the
decoding method encounters a match symbol, the original data is reconstructed
by copying <length> bytes from the position in its previously
decompressed data of <[decompression cursor] – [match distance]>.