Product Pattern: Lifting Patterns to Joint Spaces

fig-20260131_2-02 | How patterns in factor spaces combine

Patterns in Factor Spaces

ES-level pattern p: Vowel → Other, strength 0.73 ("after a vowel, predict Other"; consonants follow vowels). Within-ES pattern q: e → a, strength 0.15 ("within vowels, e → a", as in the "ea" bigram). The patterns are independent: p operates at the ES level, q operates within an ES, and no interaction between them is needed.

When the event space E factors as E = E₁ × E₂, patterns can operate independently on each factor:

The Product Pattern p ⊗ q

Factor patterns:

  • p: Vowel → Other, strength 0.73 (ES level)
  • q: 'h' → 't', strength 0.31 (within Other × Other)

Their product p ⊗ q activates on the Vowel → Other block of the joint space, with 'h' → 't' inside it:

  strength(p ⊗ q) = strength(p) × strength(q) = 0.73 × 0.31 ≈ 0.23

Product pattern formula. Given a factored space E = E₁ × E₂ with p: e₁ → e₁' in P₁ and q: e₂ → e₂' in P₂, the product pattern is

  p ⊗ q: (e₁, e₂) → (e₁', e₂'),   s(p ⊗ q) = s(p) · s(q)

(in log space the strengths add).

The product pattern p ⊗ q operates on joint events (e₁, e₂) → (e₁', e₂')
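
A minimal sketch of the construction in Python (the Pattern class and its field names are illustrative assumptions, not the hutter tool's internal representation):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Pattern:
        src: object        # source event (an ES label, or a byte within an ES)
        dst: object        # predicted event
        strength: float    # pattern strength in [0, 1]

    def product(p: Pattern, q: Pattern) -> Pattern:
        """Lift an ES-level pattern p and a within-ES pattern q to the joint
        space E = E1 x E2: the joint event is the pair, the strengths multiply."""
        return Pattern(src=(p.src, q.src),
                       dst=(p.dst, q.dst),
                       strength=p.strength * q.strength)

    # Values from the figure: p: Vowel -> Other (0.73), q: 'h' -> 't' (0.31)
    p = Pattern("Vowel", "Other", 0.73)
    q = Pattern("h", "t", 0.31)
    print(product(p, q).strength)   # 0.2263, i.e. ~0.23

Working in log space turns the multiplication into addition, which is what the formula's "log-space: add strengths" note refers to.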

Why this matters for compression:

Instead of learning 65,536 separate byte→byte patterns, we learn:

  • ~25 ES→ES patterns (coarse structure)
  • ~200 within-ES patterns (fine detail)
  • Products give all combinations: 25 × 200 = 5,000 effective patterns

This is a 13× compression of the pattern space itself!
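
A quick sanity check of the counting above (the 25 and 200 are the approximate figures quoted in the list, not measured values):

    es_count = 5
    es_level = es_count * es_count        # ~25 ES -> ES patterns
    within_es = 200                       # ~200 within-ES patterns (approximate)
    learned = es_level + within_es        # 225 patterns stored explicitly
    effective = es_level * within_es      # 5,000 byte-level combinations via products
    naive = 256 * 256                     # 65,536 raw byte -> byte patterns
    print(learned, effective, naive / effective)   # 225 5000 ~13.1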

Concrete Example: "the" Prediction

Context: "th" → predict next t Other h Other ? Vowel Pattern decomposition: p₁: Other → Vowel "th" strongly predicts vowel (0.95) p₂: within Vowel, prefer 'e' "the" >> "tha", "thi", "tho", "thu" (0.82) Combined prediction: p₁ ⊗ p₂: "th" → "e" P("e" | "th") = 0.95 × 0.82 = 0.78 -log₂(0.78) = 0.36 bits vs naive: -log₂(1/256) = 8 bits

Product patterns decompose "th"→"e" prediction into ES-level and within-ES components
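
The same arithmetic as a short sketch; the strengths 0.95 and 0.82 are the figure's values, and treating them directly as conditional probabilities is a simplification:

    import math

    p_vowel_given_th = 0.95                 # p1: "th" -> Vowel (ES level)
    p_e_given_vowel = 0.82                  # p2: within Vowel, prefer 'e'
    p_e_given_th = p_vowel_given_th * p_e_given_vowel   # ~0.78

    bits = -math.log2(p_e_given_th)         # ~0.36 bits to encode 'e'
    naive_bits = -math.log2(1 / 256)        # 8 bits under a uniform byte model
    print(round(p_e_given_th, 2), round(bits, 2), naive_bits)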

The product pattern framework shows how the RNN's predictions decompose:

P(next | context) = P(ES_next | context) × P(byte | ES_next, context)

Our 5 ESs capture the first factor. The remaining 41% of compression comes from the second factor (within-ES prediction conditioned on context).
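
Read as code, the decomposition looks roughly like the sketch below; the distributions are hypothetical placeholders standing in for the RNN's actual softmax outputs, not values produced by ./hutter:

    def predict_byte(p_es_given_ctx, p_byte_given_es_ctx):
        """Combine P(ES | context) with P(byte | ES, context) into P(byte | context)."""
        p_byte = {}
        for es, p_es in p_es_given_ctx.items():
            for byte, p_within in p_byte_given_es_ctx[es].items():
                p_byte[byte] = p_byte.get(byte, 0.0) + p_es * p_within
        return p_byte

    # Hypothetical numbers for context "th":
    p_es = {"Vowel": 0.95, "Other": 0.05}
    p_within = {
        "Vowel": {"e": 0.82, "a": 0.07, "i": 0.06, "o": 0.04, "u": 0.01},
        "Other": {"r": 0.6, "w": 0.4},
    }
    print(predict_byte(p_es, p_within)["e"])   # 0.95 * 0.82 ~ 0.78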

Reproduction

Model: Elman RNN 256-128-256

Data: enwik9, "th" bigram analysis

Commands: ./hutter predict, ./hutter es

Theory: CMP paper §3.2 (factored event spaces)