Product Pattern: Lifting Patterns to Joint Spaces
fig-20260131_2-02 | How patterns in factor spaces combine
Patterns in Factor Spaces
When the event space E factors as E = E₁ × E₂, patterns can operate independently on each factor:
- p ∈ P₁: a pattern on E₁, the ES space (which character class follows which)
- q ∈ P₂: a pattern on E₂, within an ES (which specific byte within the class)
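A minimal sketch of the factorization, assuming a hypothetical 5-class partition of byte values (lowercase, uppercase, digits, whitespace, everything else); the actual ESs used in this analysis may differ. Each byte maps to a pair (ES, within-ES index), that is, an E₁ coordinate and an E₂ coordinate, and the map is invertible. Because the classes have different sizes, the range of the within-ES index depends on the class; reading E strictly as E₁ × E₂ amounts to padding every class to a common size.

```python
# Hypothetical 5-way partition of the 256 byte values into ES classes.
# Illustrative only: the ESs used in the analysis are not reproduced here.
ES_NAMES = ["lower", "upper", "digit", "space", "other"]


def es_of(b: int) -> str:
    """E1 coordinate: the (hypothetical) character class of byte b."""
    if 97 <= b <= 122:
        return "lower"
    if 65 <= b <= 90:
        return "upper"
    if 48 <= b <= 57:
        return "digit"
    if b in b" \t\n\r\x0b\x0c":
        return "space"
    return "other"


# Members of each class in ascending byte order; a byte's position in its
# class list is its within-ES index (the E2 coordinate).
ES_MEMBERS = {name: [b for b in range(256) if es_of(b) == name] for name in ES_NAMES}


def factor(b: int) -> tuple[str, int]:
    """Map a byte to its (ES, within-ES index) pair."""
    es = es_of(b)
    return es, ES_MEMBERS[es].index(b)


def unfactor(es: str, idx: int) -> int:
    """Inverse of factor(): rebuild the byte from its two coordinates."""
    return ES_MEMBERS[es][idx]


if __name__ == "__main__":
    for ch in "the":
        print(ch, factor(ord(ch)))
```

Under this toy partition, "t", "h" and "e" all land in the `lower` class with within-ES indices 19, 7 and 4, so the ES coordinate stays fixed while the within-ES coordinate carries the distinction.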
The Product Pattern p ⊗ q
The product pattern p ⊗ q operates on joint events, mapping (e₁, e₂) → (e₁', e₂'), where p takes e₁ → e₁' and q takes e₂ → e₂'.
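One way to make the product concrete, assuming a pattern is modelled as a set of transitions (source → target) on its own factor: p ⊗ q then pairs every transition of p with every transition of q. This relational model and the names `tensor`, `p` and `q` are illustrative assumptions, not definitions taken from the CMP paper.

```python
from itertools import product


def tensor(p: set, q: set) -> set:
    """Product pattern p ⊗ q on joint events.

    p holds transitions e1 -> e1' on E1 and q holds transitions e2 -> e2'
    on E2; every pairing yields the joint transition (e1, e2) -> (e1', e2').
    """
    return {((e1, e2), (f1, f2)) for (e1, f1), (e2, f2) in product(p, q)}


if __name__ == "__main__":
    # Toy patterns with placeholder labels: ES-level and within-ES transitions.
    p = {("A", "A"), ("A", "B")}
    q = {(0, 1), (1, 2), (2, 0)}
    pq = tensor(p, q)
    print(len(pq))  # 6 == len(p) * len(q)
```

Because every pairing is kept, |p ⊗ q| = |p| × |q|, which is the counting behind the "products give all combinations" figure.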
Why this matters for compression:
Instead of learning 65,536 (256 × 256) separate byte→byte patterns, we learn:
- ~25 ES→ES patterns (coarse structure)
- ~200 within-ES patterns (fine detail)
- Products give all combinations: 25 × 200 = 5,000 effective patterns
Since 65,536 / 5,000 ≈ 13, this is roughly a 13× compression of the pattern space itself.
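The counts above can be checked directly. The ES-level figure is consistent with 5 ESs giving 5 × 5 = 25 class-to-class transitions; the within-ES count of 200 is the approximate value quoted above.

```python
N_BYTES = 256
N_ES = 5

byte_patterns = N_BYTES * N_BYTES            # 65,536 byte -> byte transitions
es_patterns = N_ES * N_ES                    # 25 ES -> ES transitions
within_es_patterns = 200                     # approximate count quoted above
effective = es_patterns * within_es_patterns # 5,000 product patterns

print(f"{byte_patterns:,} / {effective:,} = {byte_patterns / effective:.1f}x")
```

This prints `65,536 / 5,000 = 13.1x`.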
Concrete Example: "the" Prediction
Product patterns decompose the "th"→"e" prediction into ES-level and within-ES components
The product pattern framework shows how the RNN's predictions decompose:
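P(byte_next | context) = P(ES_next | context) × P(byte_next | ES_next, context)
This factorization is exact because each byte belongs to exactly one ES, so byte_next determines ES_next.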
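A sketch of how both factors can be read off a full next-byte distribution: sum the byte probabilities within each ES to get P(ES_next | context), then divide each byte's probability by its ES total to condition on the ES. The distribution and ES assignment below are made-up toy values for the context "th", not outputs of the trained RNN.

```python
from collections import defaultdict


def decompose(p_byte: dict[str, float], es_of: dict[str, str]):
    """Split P(next byte | context) into P(ES_next | context) and P(byte | ES_next, context)."""
    p_es: dict[str, float] = defaultdict(float)
    for b, p in p_byte.items():
        p_es[es_of[b]] += p  # marginalize over the bytes of each ES
    p_within = {
        # condition on the ES (guard against an ES with zero total mass)
        b: (p / p_es[es_of[b]] if p_es[es_of[b]] > 0 else 0.0)
        for b, p in p_byte.items()
    }
    return dict(p_es), p_within


if __name__ == "__main__":
    # Toy next-byte distribution after "th" (illustrative numbers only).
    p_byte = {"e": 0.63, "a": 0.135, "i": 0.09, "o": 0.045, " ": 0.06, ",": 0.04}
    es_of = {"e": "lower", "a": "lower", "i": "lower", "o": "lower",
             " ": "space", ",": "other"}
    p_es, p_within = decompose(p_byte, es_of)
    # The two factors multiply back to the original probability.
    for b, p in p_byte.items():
        assert abs(p_es[es_of[b]] * p_within[b] - p) < 1e-12
    print(p_es["lower"], p_within["e"])
```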
Our 5 ESs capture the first factor; the remaining 41% of the compression comes from the second factor (within-ES prediction conditioned on context).
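In code-length terms the product becomes a sum: the bits for the next byte, −log₂ of its probability, split exactly into −log₂ P(ES_next | context) plus −log₂ P(byte | ES_next, context). A rough sketch of attributing bits to the two factors, summed over many prediction steps, is below; whether this is how the 41% figure was computed is an assumption, and the probabilities are toy values rather than enwik9 measurements.

```python
import math


def split_bits(p_es: float, p_within: float) -> tuple[float, float]:
    """Code length of one byte, split into ES-level and within-ES bits.

    p_es is P(ES_next | context); p_within is P(byte | ES_next, context).
    The two parts sum to -log2(p_es * p_within).
    """
    return -math.log2(p_es), -math.log2(p_within)


def within_es_share(steps: list[tuple[float, float]]) -> float:
    """Fraction of all bits that falls on the within-ES factor over many steps."""
    coarse = fine = 0.0
    for p_es, p_within in steps:
        c, f = split_bits(p_es, p_within)
        coarse += c
        fine += f
    total = coarse + fine
    return fine / total if total > 0 else 0.0


if __name__ == "__main__":
    # Toy probabilities for "th" -> "e" (illustrative only).
    c, f = split_bits(0.9, 0.7)
    print(f"ES-level {c:.3f} bits + within-ES {f:.3f} bits = {c + f:.3f} bits")
```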
Reproduction
Model: Elman RNN 256-128-256 (256 inputs, 128 hidden units, 256 outputs)
Data: enwik9, "th" bigram analysis
Commands: ./hutter predict, ./hutter es
Theory: CMP paper §3.2 (factored event spaces)
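For orientation, a minimal pure-Python sketch of one forward step of an Elman RNN with the 256-128-256 shape: a byte in, 128 recurrent hidden units, and a distribution over 256 next bytes out. The one-hot input encoding, tanh hidden activation, softmax output and random weights are assumptions for illustration; this is not the trained model and does not reproduce what `./hutter` computes.

```python
import math
import random

# Shape from the reproduction notes; everything else here is assumed.
N_IN, N_HID, N_OUT = 256, 128, 256

rng = random.Random(0)


def init(rows: int, cols: int, scale: float = 0.1) -> list[list[float]]:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


W_xh = init(N_HID, N_IN)   # input -> hidden
W_hh = init(N_HID, N_HID)  # hidden -> hidden (the Elman context loop)
W_hy = init(N_OUT, N_HID)  # hidden -> output
b_h = [0.0] * N_HID
b_y = [0.0] * N_OUT


def step(byte: int, h_prev: list[float]) -> tuple[list[float], list[float]]:
    """One time step: returns (new hidden state, P(next byte | context))."""
    # One-hot input: W_xh @ x reduces to selecting column `byte`.
    h = [
        math.tanh(W_xh[i][byte] + sum(w * hp for w, hp in zip(W_hh[i], h_prev)) + b_h[i])
        for i in range(N_HID)
    ]
    logits = [sum(w * hi for w, hi in zip(W_hy[k], h)) + b_y[k] for k in range(N_OUT)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]


if __name__ == "__main__":
    h = [0.0] * N_HID
    for ch in b"th":  # iterating a bytes object yields ints
        h, probs = step(ch, h)
    print(f"P('e' | 'th') under random weights: {probs[ord('e')]:.4f}")
```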