Side-by-side comparison of how arithmetic coding (AC) and a recurrent neural network (RNN) carry context through time.
**Arithmetic Coding**
- State: interval [low, high) ⊂ [0, 1)
- Update: narrow the interval by p(symbol)
- Width shrinks: width *= p(symbol)
- Bits used: -log₂(width)
- Precision limit: 32-64 bits
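A minimal sketch of this loop in Python, assuming a fixed toy model (the `cdf` table and message below are hypothetical placeholders; a real coder renormalizes with 32-64 bit integer arithmetic to respect the precision limit):

```python
import math

def ac_encode_width(message, model):
    """Narrow [low, high) by p(symbol) at each step; return the final width.

    model(sym) returns (c_lo, c_hi), the symbol's slice of [0, 1) under
    the predictive distribution, so c_hi - c_lo = p(sym).
    """
    low, high = 0.0, 1.0
    for sym in message:
        c_lo, c_hi = model(sym)
        width = high - low
        low, high = low + width * c_lo, low + width * c_hi
    return high - low  # = product of p(sym) over the whole message

# Hypothetical model: p('a') = 0.75, p('b') = 0.25
cdf = {"a": (0.0, 0.75), "b": (0.75, 1.0)}
w = ac_encode_width("aab", cdf.get)
print(f"final width = {w:.6f}, bits used = {-math.log2(w):.3f}")
# 0.75 * 0.75 * 0.25 = 0.140625  ->  -log2(0.140625) ≈ 2.830 bits
```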
**RNN**
- State: h ∈ ℝ¹²⁸ (each component kept in [-1, 1] by tanh)
- Update: h' = tanh(W·x + U·h + b)
- Capacity shrinks: ‖h‖ changes
- Bits used: -log₂ p(output)
- Precision limit: float32 = 24 mantissa bits (23 stored + 1 implicit)
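A matching single-step sketch with NumPy, under assumed shapes (V = 256 for a byte vocabulary, H = 128 to match ℝ¹²⁸) and random placeholder weights; the softmax readout `W_out` is an assumption added to turn the hidden state into a next-symbol distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 256, 128                       # byte vocabulary, hidden size (h ∈ ℝ¹²⁸)
W = rng.normal(0.0, 0.1, (H, V))      # input weights (placeholder init)
U = rng.normal(0.0, 0.1, (H, H))      # recurrent weights
b = np.zeros(H)
W_out = rng.normal(0.0, 0.1, (V, H))  # assumed readout to next-symbol logits

def rnn_step(h, x):
    """h' = tanh(W·x + U·h + b); tanh keeps every component in [-1, 1]."""
    return np.tanh(W @ x + U @ h + b)

def surprisal_bits(h, sym):
    """Bits charged for the observed symbol: -log₂ p(sym | h)."""
    logits = W_out @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax over the vocabulary
    return -np.log2(p[sym])

h = np.zeros(H)
x = np.zeros(V)
x[65] = 1.0                           # one-hot input for byte 65 ('A')
h = rnn_step(h, x)
print(f"surprisal of next byte 66: {surprisal_bits(h, 66):.3f} bits")
```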
| | Arithmetic Coding | RNN |
|---|---|---|
| State | [low, high) ⊂ [0, 1) | h ∈ ℝ¹²⁸ |
| Width / Capacity | width *= p(symbol) | ‖h‖ changes |
| Bits accumulated | -log₂(width) | Σ -log₂ p(output) |
| Entropy (uncertainty) | H(p) = -Σ p·log₂ p over next symbols | same, from the softmax output |
| Surprisal (this symbol) | -log₂ p(symbol) | -log₂ p(output) |
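The last two rows are related as expectation to sample: entropy is the average surprisal under the model's own predictive distribution. A small sketch with a hypothetical three-symbol distribution:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # hypothetical predictive distribution

# Entropy: the expected surprisal over the whole distribution (= 1.5 bits here)
H = -sum(q * math.log2(q) for q in p.values())

# Surprisal: the bits actually charged for one observed symbol
for sym, q in p.items():
    print(f"observe {sym!r}: surprisal = {-math.log2(q):.2f} bits")
print(f"entropy (expected surprisal) = {H:.2f} bits")
```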
Each prediction step is a Bayesian update. The surprisal shown above is the log-luck:

Λ = -log₂ p(symbol) = log₂ λ, where λ = 1/p is the "luck"

The cumulative surprisal (643.7 bits) equals the compressed message length, up to a few bits of termination overhead in the coder.
Unification (Q = λ): The quotient Q = |prior| / |posterior| equals the luck λ = 1/p: each update shrinks the interval width by exactly the factor p(symbol), so |posterior| = p·|prior| and Q = 1/p. All four views (arithmetic coding's interval, the RNN's hidden state, the Bayesian prior→posterior, and the luck quotient) describe the same operation: narrowing possibilities by observing symbols.
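A sketch that ties the views together numerically, with hypothetical per-step probabilities: the width quotient Q equals the luck λ at every step, and the cumulative surprisal equals -log₂ of the final width:

```python
import math

probs = [0.75, 0.75, 0.25]          # hypothetical p(symbol) at each step

width, total_bits = 1.0, 0.0
for p in probs:
    prior = width
    width *= p                      # posterior: interval after the symbol
    Q = prior / width               # |prior| / |posterior|
    lam = 1.0 / p                   # luck λ
    surprisal = -math.log2(p)       # Λ = log₂ λ
    total_bits += surprisal
    print(f"Q = {Q:.3f}  λ = {lam:.3f}  Λ = {surprisal:.3f} bits")

# Cumulative surprisal is exactly the bits needed for the final interval.
assert math.isclose(total_bits, -math.log2(width))
print(f"Σ Λ = -log₂(final width) = {total_bits:.3f} bits")
```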