H = 2^32: The f32 State Space

Experiment: h32 — 2026-02-11
"The mantissa is for training, not inference. The Boolean dynamics works better."

What This Experiment Shows

Each of the 128 neurons holds a 32-bit floating-point number, giving a theoretical state space of 2^4096. But how many of those bits actually matter? This experiment sets H = 2^32 and treats the IEEE 754 decomposition (sign/exponent/mantissa) as just one of many possible factorizations of the 32-bit space.

The answer: the sign bit is the computation, the exponent is routing, and the mantissa is noise. The effective state is 128 bits — one sign per neuron. Removing the mantissa makes the model better.

"The tanh activation and f32 mantissa exist for training (gradient flow through the saturation gate); at inference, the Boolean function encoded in the weight signs and magnitudes is the entire computation."
— h32.tex
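The IEEE 754 decomposition referenced above can be inspected directly. A minimal stdlib-only sketch; the field layout is the standard float32 encoding, not anything specific to this model:

```python
import struct

def f32_fields(x: float) -> tuple:
    """Split a float32 into its IEEE 754 (sign, biased exponent, mantissa) fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31               # bit 31
    exponent = (bits >> 23) & 0xFF  # bits 23-30, biased by 127
    mantissa = bits & 0x7FFFFF      # bits 0-22
    return sign, exponent, mantissa

# A saturated neuron at exactly 1.0: E = 127, empty mantissa.
print(f32_fields(1.0))     # (0, 127, 0)
# An unsaturated neuron in [0.5, 1) has E = 126; a negative one sets the sign bit.
print(f32_fields(-0.855))  # (1, 126, ...)
```

This is why a saturated state carries exactly one bit: once |h| pins at 1.0, the exponent and mantissa fields are constant and only the sign varies.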

The Numbers at a Glance

496        effective state bits at t = 42 (not 4,096)
300:52:1   per-bit leverage (sign : exponent : mantissa)
31.6       mean sign changes per step (about 25% of neurons)
−0.139     bpc improvement from zeroing the mantissa

Two Factorizations of 32 Bits

Every f32 value has 32 individually addressable bits. There are two ways to interpret them:

Hardware Factorization (IEEE 754)

Roles assigned by the encoding: bit 31 is the sign, bits 23–30 the biased exponent, bits 0–22 the mantissa.

Dynamical Factorization (Measured)

Roles assigned by measurement: group bits by how much flipping them perturbs the model's output, regardless of their hardware role.

Flip each bit and measure KL divergence on the output:

Bit range   Channel           Mean KL (bits)   Dynamical role
0–4         mantissa (low)    < 10^-6          dead
5–14        mantissa (mid)    < 10^-4          dormant
15–22       mantissa (high)   0.00044/bit      active memory
23–29       exponent          0.0080/bit       importance
31          sign              0.046            topology

Per-Bit KL Leverage (log scale)

The 300:52:1 hierarchy: per bit, the sign is 300× more important than the mantissa, and the exponent 52×. At single-step leverage the hardware factorization aligns with the dynamical one: sign is topology, exponent is importance, mantissa is noise.
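The probe behind this table amounts to XOR-ing one bit of one neuron's f32 encoding and comparing output distributions before and after. A sketch of the flip itself (the KL measurement over the model's output distribution is omitted):

```python
import struct

def flip_bit(x: float, k: int) -> float:
    """Return x with bit k of its float32 encoding flipped (k = 31 is the sign)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits ^ (1 << k)))[0]

print(flip_bit(1.0, 31))  # -1.0: a sign flip leaves the magnitude untouched
print(flip_bit(0.855, 0)) # LSB mantissa flip: change of ~1e-7, invisible downstream
```

Per the table, the same mechanical operation has leverage spanning five orders of magnitude depending on which field k lands in.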

State Snapshot: t = 42

112     saturated neurons (E = 127, |h| = 1.0 exactly)
16      unsaturated neurons (E = 126, |h| ∈ [0.5, 1))
17      unique f32 bit patterns
68/60   positive / negative signs

Effective State Decomposition

Saturated neurons carry exactly 1 bit each (the sign). Unsaturated neurons carry ~24 bits each (1 sign + 23 mantissa, exponent fixed at 126). Total: 112 × 1 + 16 × 24 = 496 bits — not 4,096 and not 128.

But for prediction, it's just 128 bits. The sign-only model achieves 99.7% of the compression gap, meaning the analog bits contribute only 0.007 bpc × 1023 = 7 bits total across all positions. The effective state for prediction is one sign bit per neuron.
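The decomposition arithmetic from the snapshot is direct:

```python
# Effective state bits at t = 42, from the snapshot above.
saturated, unsaturated = 112, 16
bits_per_saturated = 1     # sign only: exponent and mantissa are pinned at +-1.0
bits_per_unsaturated = 24  # 1 sign + 23 mantissa bits; exponent fixed at 126
effective = saturated * bits_per_saturated + unsaturated * bits_per_unsaturated
print(effective)  # 496, versus 128 * 32 = 4096 raw bits
```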

Mantissa Ablation: Dynamics vs Readout

Five ways to run the RNN, varying where the mantissa is used:

Mode                 bpc     Δ from full   Description
full f32             5.721   0             baseline
sign readout only    5.728   +0.007        full dynamics, snap h → ±1 for Wy
sign dynamics        5.690   −0.031        snap h → ±1 after each step
zero-mant readout    5.637   −0.084        full dynamics, zero mantissa for Wy
zero-mant dynamics   5.582   −0.139        zero mantissa after each step

Mantissa Ablation: Lower is Better

The mantissa is noise: Every variant that removes the mantissa from dynamics outperforms full f32. Zero-mantissa dynamics is 0.139 bpc better. Sign-only dynamics (a pure 128-bit Boolean automaton) is 0.031 bpc better. The mantissa actively degrades both prediction and dynamics.
A different trajectory, a better result: Sign-only dynamics has 52.2% sign agreement with full f32 after 100 steps — barely above chance. It's on a completely different trajectory through state space. Yet it compresses better. The weights encode a good Boolean function that is obscured by mantissa noise.
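The two hidden-state maps behind these ablations can be sketched with stdlib bit tricks. Applying the map after every step gives the "dynamics" variants; applying it only before the output projection gives the "readout" variants:

```python
import math
import struct

def zero_mantissa(x: float) -> float:
    """Clear the 23 mantissa bits of a float32, keeping sign and exponent."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFF800000))[0]

def snap_sign(x: float) -> float:
    """Replace a hidden value with its sign: h -> +-1."""
    return math.copysign(1.0, x)

# An unsaturated neuron with E = 126 collapses to exactly 0.5 (or -0.5):
print(zero_mantissa(0.855))  # 0.5
print(snap_sign(-0.855))     # -1.0
```

Note that zeroing the mantissa keeps the exponent channel (the "open gates") alive, which may be why zero-mant dynamics beats sign dynamics in the table above.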

Mantissa Bit Sweep

Quantize the mantissa to 0–23 bits during dynamics:

bpc vs Mantissa Precision

Every level except 8 bits improves on full f32 (dashed line). Fewer bits → better bpc. The mantissa is not a graded resource — it is interference.

Mantissa bits   bpc     Δ from full
0 (zero-mant)   5.582   −0.139
1               5.592   −0.129
2               5.696   −0.025
4               5.636   −0.085
8               5.740   +0.019
12              5.621   −0.100
16              5.682   −0.039
20              5.614   −0.107
23 (full)       5.721   0
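The sweep keeps only the top n bits of the mantissa. A sketch using truncation toward zero; the experiment's exact rounding choice is not stated, and other roundings are possible:

```python
import struct

def quantize_mantissa(x: float, n_bits: int) -> float:
    """Keep the top n_bits of the 23-bit float32 mantissa, zeroing the rest."""
    assert 0 <= n_bits <= 23
    mask = (0xFFFFFFFF << (23 - n_bits)) & 0xFFFFFFFF  # sign+exponent always kept
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

for n in (0, 4, 23):
    print(n, quantize_mantissa(0.855, n))  # n = 0 reproduces zero_mantissa: 0.5
```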

Bit Propagation Through Time

Flip one bit of unsaturated neuron h8 (|h| = 0.855) at t = 42 and measure downstream KL:

KL Divergence After Bit Flip

Sign bit amplifies, mantissa decays: Flipping the sign bit causes 0.125 bits of KL at t=43, growing to 0.729 by t=46, a roughly 6× amplification. It is a global perturbation: flipping sign(h8) flips the sign of every Wh[8, ·] contribution, touching all 128 neurons at the next step. The MSB mantissa flip causes 0.0004 bits of KL at t=43, then decays. The LSB mantissa flip has zero effect at all future times.

State Statistics Across All Positions

Metric                                  Value
Mean saturated neurons (|h| ≥ 0.999)    123.0 / 128
Mean unsaturated                        5.0 / 128
Mean sign changes per step              31.6
Mean bpc (full f32)                     5.721
Mean bpc (sign-only readout)            5.728
Mean bpc (zero-mantissa readout)        5.637
"The sign bits carry the long-range memory (which patterns have been observed). The mantissa bits carry the short-range precision (how recently, how strongly). The sign bits change rarely. The mantissa bits change continuously."
— h32.tex

The Revised Picture

The sat-rnn is a 128-bit Boolean automaton with a 2-value exponent channel:

128 signs         the computation        Long-range memory: which patterns have been seen. 99.7% of compression.
~5 exponents      the routing            Which neurons are unsaturated ("open gates"). Changes every step.
23×128 mantissa   noise (for training)   Enables gradient flow during BPTT; harmful at inference. Removing it improves bpc by 0.14.
"The weight matrices encode the Boolean transition function σ_{t+1} = f(σ_t, x_t) where σ is the sign vector and x is the input byte. The mantissa is the price paid for differentiable training of a Boolean function."
— h32.tex
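Because tanh preserves sign, the sign-dynamics variant reduces to a pure threshold update. A stdlib-only sketch with hypothetical weight names Wh (H×H), Wx (H×256, one column per input byte), and bias b; the real model's shapes and wiring may differ:

```python
import math
import random

def boolean_step(sigma, x_byte, Wh, Wx, b):
    """One sign-only step: sigma' = sign(Wh @ sigma + Wx[:, x_byte] + b).
    sigma is a vector of +-1; tanh magnitudes are discarded entirely."""
    H = len(sigma)
    nxt = []
    for i in range(H):
        pre = sum(Wh[i][j] * sigma[j] for j in range(H)) + Wx[i][x_byte] + b[i]
        nxt.append(math.copysign(1.0, pre))
    return nxt

# Toy demo with random weights (H = 8 here; the model in the text uses H = 128).
random.seed(0)
H = 8
Wh = [[random.gauss(0, 1) for _ in range(H)] for _ in range(H)]
Wx = [[random.gauss(0, 1) for _ in range(256)] for _ in range(H)]
b = [0.0] * H
sigma = boolean_step([1.0] * H, 65, Wh, Wx, b)  # feed byte 'A'
print(sigma)  # a new vector of +-1.0
```

In this reading the trained f32 weights are just a lookup structure for a 128-bit Boolean automaton, and the mantissa exists only so that BPTT has a gradient to follow.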

Tool: experiments in h32.tex · Model: sat_model.bin from archive/20260209 · Data: first 1024 bytes of enwik9