
Arithmetic Coding ↔ RNN Memory

A side-by-side comparison of how arithmetic coding (AC) and an RNN hidden state carry context through time.

Arithmetic Coding

State: [low, high) ⊂ [0, 1)
Update: narrow interval by p(symbol)
Width shrinks: width *= p
Bits used: -log₂(width)
Precision limit: 32-64 bits
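
A minimal sketch of that update in Python, assuming a fixed three-symbol distribution (illustrative, not the page's data) and omitting the renormalization a real coder needs:

```python
import math

# Illustrative fixed distribution over three symbols (an assumption, not the page's model).
probs = {"a": 0.5, "b": 0.3, "c": 0.2}

def cumulative_slices(probs):
    """Assign each symbol its [cum_low, cum_high) slice of [0, 1)."""
    slices, acc = {}, 0.0
    for sym, p in probs.items():
        slices[sym] = (acc, acc + p)
        acc += p
    return slices

def encode_step(low, high, symbol, slices):
    """Narrow [low, high) to the symbol's slice: width shrinks by p(symbol)."""
    width = high - low
    c_low, c_high = slices[symbol]
    return low + width * c_low, low + width * c_high

slices = cumulative_slices(probs)
low, high = 0.0, 1.0
for sym in "abac":
    low, high = encode_step(low, high, sym, slices)
    width = high - low
    print(f"{sym}: width = {width:.6f}, bits so far = {-math.log2(width):.3f}")
```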

RNN Hidden State

State: h ∈ ℝ¹²⁸ (within [-1,1] via tanh)
Update: h' = tanh(W·x + U·h + b)
Capacity shrinks: ||h|| changes
Bits used: -log₂ p(output)
Precision limit: float32 ≈ 24 significand bits (23 stored + 1 implicit)
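
A matching sketch of the recurrence h' = tanh(W·x + U·h + b), with illustrative sizes (128 hidden units, byte-valued one-hot inputs) and random untrained weights standing in for a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 128, 256                      # illustrative sizes, not the page's model

W = rng.normal(0.0, 0.1, (HIDDEN, VOCAB))     # input-to-hidden weights
U = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))    # hidden-to-hidden (recurrent) weights
b = np.zeros(HIDDEN)

def rnn_step(h, x):
    """One update: h' = tanh(W·x + U·h + b), so every component stays in [-1, 1]."""
    return np.tanh(W @ x + U @ h + b)

h = np.zeros(HIDDEN)
for byte in b"abac":
    x = np.zeros(VOCAB)
    x[byte] = 1.0                             # one-hot encode the input symbol
    h = rnn_step(h, x)
    print(f"{chr(byte)}: ||h|| = {np.linalg.norm(h):.3f}")
```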
Example run: 100 steps, avg entropy 7.0 bits/symbol, avg surprisal 6.4 bits/symbol, 643.7 total AC bits.

[Per-step detail panel: for a selected character, shows AC state [low, high) and RNN ‖h‖, interval width vs. hidden-state capacity, bits accumulated so far, entropy of the shared prediction, and the surprisal of that symbol.]

[Charts: AC interval [low, high) over time; RNN hidden state ‖h‖ over time; AC cumulative bits = -log₂(interval width); entropy & surprisal of the shared prediction]

Key Insight: Both systems narrow down the space of possibilities over time. AC does it explicitly (the interval shrinks); the RNN does it implicitly (the hidden state encodes context). Both hit precision limits: AC after ~32-64 bits without renormalization, the RNN after ~24 bits × 128 dims ≈ 3000 bits (though not all of that is usable, since the dimensions are correlated).
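
One way to see the AC side of that limit (an illustrative experiment, not the page's simulation): keep narrowing the interval toward 1.0 with p = 0.5 and watch the naive float64 width hit zero once roughly a significand's worth of bits has accumulated. Real coders avoid this by renormalizing, i.e. emitting settled leading bits and rescaling the interval.

```python
# Without renormalization, a naive float64 interval collapses once the
# accumulated bits exceed the ~53-bit double-precision significand.
p = 0.5                        # every symbol carries exactly 1 bit
low, high = 0.0, 1.0
for step in range(1, 200):
    width = high - low
    low = high - width * p     # always take the upper half: low creeps toward 1.0
    if high - low == 0.0:
        print(f"width collapsed to 0 after {step} steps (~{step} bits accumulated)")
        break
```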

Bayesian Interpretation (Q = λ)

Each prediction step is a Bayesian update. The surprisal shown above is the log-luck:

Λ = -log₂ p(symbol) = log₂ λ    where λ = 1/p is the "luck"

The cumulative surprisal (643.7 bits) equals the compressed message length.
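
The identity holds because per-symbol surprisals add while interval widths multiply: Σ(-log₂ p) = -log₂(Π p) = -log₂(final width). A small check using the same assumed toy distribution as above:

```python
import math

probs = {"a": 0.5, "b": 0.3, "c": 0.2}    # illustrative distribution, as before
message = "abacab"

width, total_surprisal = 1.0, 0.0
for sym in message:
    p = probs[sym]
    width *= p                            # AC view: interval shrinks by p
    total_surprisal += -math.log2(p)      # Bayes view: log-luck Λ = -log₂ p

print(f"sum of surprisals = {total_surprisal:.4f} bits")
print(f"-log2(width)      = {-math.log2(width):.4f} bits")   # same number
```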

Unification (Q = λ): The quotient Q = |prior|/|posterior| equals the luck λ = 1/p.

  • Bayes: luck λ = 1/p, log-luck Λ = -log p
  • Thermo: microstates shrink by factor λ
  • AC interval: width shrinks by factor p (= 1/λ)
  • RNN: h encodes accumulated context, outputs p

All four views describe the same operation: narrowing possibilities by observing symbols.
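
A worked numeric instance, with assumed counts (8 equiprobable microstates consistent with the prior, 2 still consistent after the observation), showing the four bookkeeping views give the same number:

```python
import math

prior_states, posterior_states = 8, 2       # illustrative microstate counts

p      = posterior_states / prior_states    # probability assigned to the observation: 0.25
Q      = prior_states / posterior_states    # quotient |prior| / |posterior|          : 4
lam    = 1 / p                              # luck λ                                  : 4
Lambda = -math.log2(p)                      # log-luck / surprisal                    : 2 bits
shrink = p                                  # AC interval width factor                : 0.25

print(Q == lam, Lambda == math.log2(lam), shrink == 1 / lam)   # True True True
```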
