Joint Event Space: Byte Alphabet Factorization

fig-20260131_2-01 | The 256-byte space factors into ES membership × within-ES identity

Flat Space vs Factored Space

[Figure: the byte alphabet as a flat space vs a factored space. Flat: |E| = 256, every byte treated as an atomic event. Factored: E = ES × within-ES, with the ES coordinate ranging over 5 events and the within-ES coordinate over that ES's members. Drawn as a grid, each row is one ES and the columns are its members: Digits (0-9, 10 members), Vowels (a e i o u, 5), Whitespace (space, newline, tab, ..., 4), Punct (. , ! ? : ; " ' ( ) [ ] { } ..., 32), Other (b c d f g ..., 205). Total: 10 + 5 + 4 + 32 + 205 = 256.]

The 256-byte alphabet viewed as flat space (top-left) vs factored into ES × within-ES (bottom-left, right)

Key Insight: The RNN learns to predict at two levels:
  1. Which ES? After "th", predict Vowels ES (not Digits, not Punct)
  2. Which member? Within Vowels, predict 'e' over 'a', 'i', 'o', 'u'

This factorization explains 59.3% of the model's compression: the ES-level prediction alone accounts for 1.37 of the 2.31 bits/char total.
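The split follows the chain rule: the cost of a byte is the cost of naming its ES plus the cost of naming the member within that ES. A minimal sketch of that accounting (the function and array names here are illustrative, not the hutter.c API):

    #include <math.h>

    /* Split one prediction's cost into ES-level and within-ES bits via
     *   -log2 P(o) = -log2 P(ES(o)) - log2 P(o | ES(o)).
     * p[256] holds the model's next-byte probabilities, es[256] maps each
     * byte to its ES index (0..4), o is the byte that actually occurred. */
    void split_bits(const double p[256], const int es[256], int o,
                    double *es_bits, double *within_bits)
    {
        double p_es = 0.0;                 /* P(ES(o)) = sum of p over o's ES */
        for (int b = 0; b < 256; b++)
            if (es[b] == es[o]) p_es += p[b];
        *es_bits     = -log2(p_es);        /* cost of naming the ES            */
        *within_bits = -log2(p[o] / p_es); /* cost of the member inside the ES */
    }

Averaged over enwik9, the first term is the 1.37 bits/char quoted above and the second is the remaining 0.94 bits/char.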

Input × Output: The Full Joint Space

[Figure: the full joint space of input byte (prev) × output byte (next), |I × O| = 256 × 256 = 65,536 joint events. The ES-level view collapses it to a 5 × 5 = 25 grid of ES-pairs for coarse prediction; a single cell such as Vowel → Other covers 5 × 205 = 1,025 joint events.]

The I×O joint space (65K events) reduces to 25 ES-pairs for coarse prediction

The standard learning function ω₀ from the CMP paper records joint events (i, o), where i is the input (previous) byte and o is the output (next) byte. This gives a 256×256 contingency table.
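A minimal sketch of that flat table, assuming we stream over the corpus one byte at a time (the names are illustrative, not ω₀'s actual implementation):

    #include <stddef.h>
    #include <stdint.h>

    static uint64_t joint[256][256];   /* joint[i][o] = count of (prev = i, next = o) */

    /* Record every adjacent (input, output) byte pair in the buffer. */
    void count_flat(const unsigned char *buf, size_t n)
    {
        for (size_t t = 1; t < n; t++)
            joint[buf[t - 1]][buf[t]]++;
    }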

By factoring through ESs, we get a hierarchical joint space: a coarse 5 × 5 table of ES-pairs, refined by within-ES detail only where it matters (see the counting sketch after the list below).

Compression benefit: Instead of learning 65,536 probabilities, we learn:
  • 25 ES-transition probabilities (coarse structure)
  • ~50-100 significant within-ES patterns (fine detail)
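A sketch of the factored counterpart, assuming an es[256] lookup table that maps each byte to its ES index (it can be filled with the es_of() sketch under Reproduction below); again the names are illustrative, not the hutter.c API:

    #include <stddef.h>
    #include <stdint.h>

    static uint64_t es_pair[5][5];    /* coarse: ES(prev) → ES(next), 25 cells */
    static uint64_t within[5][256];   /* fine: ES(prev) → specific next byte   */

    void count_factored(const unsigned char *buf, size_t n, const int es[256])
    {
        for (size_t t = 1; t < n; t++) {
            es_pair[es[buf[t - 1]]][es[buf[t]]]++;   /* 5 × 5 coarse structure */
            within[es[buf[t - 1]]][buf[t]]++;        /* within-ES detail       */
        }
    }

The 25-cell table carries the coarse structure; the within table is where the ~50-100 significant fine-grained patterns live.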

Reproduction

Model: Elman RNN 256-128-256, trained on enwik9

ES definitions: ./hutter es (Digits, Vowels, Whitespace, Punct, Other)
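The five ESs can be sketched as a classifier over the byte value. The exact membership here is an assumption about what ./hutter es reports (in particular, treating space, newline, tab, and carriage return as the 4 Whitespace bytes, and uppercase vowels as Other), not code lifted from hutter.c:

    #include <ctype.h>
    #include <string.h>

    enum { ES_DIGIT, ES_VOWEL, ES_WSPACE, ES_PUNCT, ES_OTHER };

    /* Assumed membership; sizes 10 + 5 + 4 + 32 + 205 = 256. */
    int es_of(unsigned char b)
    {
        if (b >= '0' && b <= '9')                  return ES_DIGIT;   /*  10 */
        if (b && strchr("aeiou", b))               return ES_VOWEL;   /*   5 */
        if (b == ' ' || b == '\n' || b == '\t' || b == '\r')
                                                   return ES_WSPACE;  /*   4 */
        if (b < 128 && ispunct(b))                 return ES_PUNCT;   /*  32 */
        return ES_OTHER;                           /* everything else: 205 */
    }

Filling an int es[256] table by calling es_of(b) for each byte value gives the lookup used in the sketches above.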

Result: 5 ESs explain 59.3% of compression (1.37 of 2.31 bits/char)

Source: #es_eval, #method blocks in hutter.c