We ran the sat-rnn (128 hidden units, 0.079 bpc) forward and backward in both f32 and MPFR-256 exact arithmetic. The forward pass is well-behaved. The backward pass loses mantissa agreement in a single step and sign agreement within about ten. Yet pattern rankings converge to perfect agreement (ρ = +1.00 from d = 11 onward).
Hidden state h_42 after 42 timesteps, exact arithmetic vs. f32. The forward pass is remarkably stable.
| Metric | Value |
|---|---|
| Sign agreement | 128 / 128 |
| Exponent agreement | 124 / 128 |
| Mean mantissa bits | 21.5 / 23 |
| Max absolute error | 4.74 × 10⁻¹ |
| Mean absolute error | 8.30 × 10⁻³ |
| Near-zero neurons | 0 |
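
The agreement metrics in this table can be sketched as a field-by-field comparison of IEEE 754 binary32 bit patterns. This is a minimal sketch, not the source's program: the function name `bit_agreement` is hypothetical, the reference is assumed to have been rounded to f32 first, and "mantissa bits" is assumed to mean the number of leading mantissa bits that match when sign and exponent already agree.

```python
import numpy as np

def bit_agreement(x_f32, x_ref):
    """Field-by-field agreement between f32 values and a reference rounded to f32."""
    a = np.ascontiguousarray(x_f32, dtype=np.float32).view(np.uint32)
    b = np.ascontiguousarray(x_ref, dtype=np.float32).view(np.uint32)
    sign_ok = (a >> 31) == (b >> 31)
    exp_ok = ((a >> 23) & 0xFF) == ((b >> 23) & 0xFF)
    # XOR of the 23 mantissa bits: the highest set bit marks the first disagreement.
    diff = (a ^ b) & 0x7FFFFF
    bitlen = np.zeros(diff.shape, dtype=np.int64)
    nz = diff > 0
    bitlen[nz] = np.floor(np.log2(diff[nz].astype(np.float64))).astype(np.int64) + 1
    # Assumption: mantissa bits only count when sign and exponent already agree.
    mant_bits = np.where(sign_ok & exp_ok, 23 - bitlen, 0)
    return {
        "sign": int(sign_ok.sum()),
        "exponent": int(exp_ok.sum()),
        "mean_mantissa_bits": float(mant_bits.mean()),
    }

# Toy input: identical value, value off by one ulp, value with flipped sign.
one = np.float32(1.0)
ref = np.array([one, one, one], dtype=np.float32)
x = np.array([one, np.nextafter(one, np.float32(2.0)), -one], dtype=np.float32)
stats = bit_agreement(x, ref)
print(stats)
```

Applied to a 128-element hidden state, the same function would return counts out of 128 and a mean out of 23, matching the layout of the table.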
The gradient g_42 = ∂ log P(y) / ∂h is where f32 error first becomes significant.
| Metric | Value |
|---|---|
| Sign agreement | 128 / 128 |
| Exponent agreement | 111 / 128 |
| Mean mantissa bits | 4.8 / 23 |
| ‖g_f32‖ | 1.878 |
| ‖g_exact‖ | 1.886 |
| Relative error | 3.7% |
The table below traces the backward Jacobian from d = 0 to d = 42. Mantissa agreement dies at d = 1, sign agreement is gone by d = 10, and the error ratio ‖Δ‖/‖g_ex‖ locks onto 3.44.
| d | Sign | Exp | Mant bits | ‖g_f32‖ | ‖g_ex‖ | ‖Δ‖/‖g_ex‖ | Phase |
|---|---|---|---|---|---|---|---|
| 0 | 128 | 111 | 4.8 | 1.88 | 1.89 | 0.037 | Agreement |
| 1 | 90 | 23 | 0.7 | 4.73 | 7.02 | 0.80 | Transition |
| 2 | 105 | 30 | 1.1 | 8.82 | 15.5 | 0.58 | Transition |
| 3 | 128 | 1 | 0.8 | 39.0 | 90.1 | 0.57 | Transition |
| 5 | 59 | 30 | 0.6 | 51.6 | 37.1 | 1.66 | Transition |
| 7 | 19 | 21 | 0.2 | 116 | 54.1 | 3.07 | Decorrelation |
| 10 | 1 | 3 | 0.0 | 596 | 248 | 3.41 | Decorrelation |
| 15 | 0 | 0 | 0.0 | 8.7×10⁵ | 3.6×10⁵ | 3.43 | Decorrelation |
| 20 | 0 | 0 | 0.0 | 3.6×10⁶ | 1.5×10⁶ | 3.44 | Decorrelation |
| 30 | 0 | 0 | 0.0 | 1.9×10³ | 7.9×10² | 3.44 | Decorrelation |
| 42 | 0 | 0 | 0.0 | 2.0×10⁵ | 8.5×10⁴ | 3.41 | Decorrelation |
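
One reading of the late rows, offered as an inference from the table and not a claim made in the source: sign agreement of 0 / 128 together with a norm ratio near 2.4 is what an f32 gradient roughly anti-parallel to the exact one would produce, and in that case the relative error is 1 + 2.44 = 3.44. The sketch below checks that arithmetic on synthetic vectors; `compare_gradients` and the toy data are hypothetical.

```python
import numpy as np

def compare_gradients(g_f32, g_exact):
    """Norm ratio, relative error, cosine and sign agreement of two gradients."""
    g_f32 = np.asarray(g_f32, dtype=np.float64)
    g_exact = np.asarray(g_exact, dtype=np.float64)
    n_f32 = np.linalg.norm(g_f32)
    n_ex = np.linalg.norm(g_exact)
    return {
        "norm_ratio": float(n_f32 / n_ex),
        "rel_error": float(np.linalg.norm(g_f32 - g_exact) / n_ex),
        "cosine": float(g_f32 @ g_exact / (n_f32 * n_ex)),
        "sign_agree": int(np.sum(np.sign(g_f32) == np.sign(g_exact))),
    }

rng = np.random.default_rng(0)
g_exact = rng.standard_normal(128)   # synthetic stand-in, not model data
g_f32 = -2.44 * g_exact              # assumption: anti-parallel and 2.44x larger
m = compare_gradients(g_f32, g_exact)
print(m)
```

If the real gradients were only partly anti-aligned, the relative error would fall below 1 + norm ratio, so the near-exact match in the table is what makes this reading plausible.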
We flip each bit in each neuron at t = 42 and measure the KL divergence of the output distribution. The hierarchy is stark: sign : exponent : mantissa = 300 : 52 : 1 per bit, a ratio consistent with averaging the mantissa figure over all 23 bits.
| Bits | Channel | Mean KL (bits) | Per bit | Flips > 0.1 bpc |
|---|---|---|---|---|
| 0–4 | mantissa (low) | < 10⁻⁶ | < 10⁻⁷ | 0 |
| 5–14 | mantissa (mid) | < 10⁻⁴ | < 10⁻⁵ | 0 |
| 15–22 | mantissa (high) | 0.0035 | 0.00044 | 61 |
| 23–30 | exponent | 0.064 | 0.0080 | ~90 |
| 31 | sign | 0.046 | 0.046 | 110 |
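
The bit-flip procedure can be sketched as follows. Everything here is a stand-in: the hidden state is random, `W_out` is a hypothetical output layer, and the helper names are invented for illustration. Only the mechanics (flip one bit of one f32 value, recompute the output distribution, measure KL in bits) follow the description above.

```python
import numpy as np

def flip_bit(x, bit):
    """Return float32 x with one bit flipped (0 = lowest mantissa bit, 31 = sign)."""
    u = np.array([x], dtype=np.float32).view(np.uint32)
    u ^= np.uint32(1 << bit)
    return u.view(np.float32)[0]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_bits(p, q):
    """KL(p || q) in bits; q is floored to avoid log(0) after extreme flips."""
    q = np.maximum(q, 1e-300)
    return float(np.sum(p * (np.log2(p) - np.log2(q))))

rng = np.random.default_rng(0)
h = np.tanh(rng.standard_normal(128)).astype(np.float32)  # stand-in hidden state
W_out = rng.standard_normal((256, 128)) / np.sqrt(128)    # hypothetical output layer
p = softmax(W_out @ h.astype(np.float64))

def mean_kl_for_bit(bit):
    """Mean KL over all neurons when the given bit is flipped in one neuron at a time."""
    kls = []
    for i in range(h.size):
        h2 = h.copy()
        h2[i] = flip_bit(h[i], bit)
        kls.append(kl_bits(p, softmax(W_out @ h2.astype(np.float64))))
    return float(np.mean(kls))

kl_low, kl_high, kl_sign = (mean_kl_for_bit(b) for b in (0, 22, 31))
print(kl_low, kl_high, kl_sign)
```

Flipping high exponent bits can push a value to around 10³⁷, which is why the KL helper floors `q`; the source does not say how its own program handles that case.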
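The 3.44 ratio at t = 42 is a transient. Running the same analysis across all positions reveals three regimes at the sampled positions: agreement (through t = 35), transition (t = 45), and decorrelation (t ≥ 60).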
| t | h mant | g_0 mant | ratio d=0 | ratio d=10 | cos(g_f32, g_ex) | ‖g_f32‖/‖g_ex‖ | Phase |
|---|---|---|---|---|---|---|---|
| 10 | 22.9 | 22.5 | 0.00 | 0.01 | +1.000 | 0.99 | Agreement |
| 20 | 23.0 | 22.2 | 0.00 | 0.00 | +1.000 | 1.00 | Agreement |
| 35 | 22.0 | 10.2 | 0.00 | 0.04 | +1.000 | 0.96 | Agreement |
| 45 | 21.7 | 5.7 | 0.02 | 1.08 | −0.694 | 0.11 | Transition |
| 60 | 18.0 | 1.9 | 0.32 | 1.00 | −0.082 | 0.03 | Decorrelation |
| 100 | 6.1 | 1.6 | 0.42 | 1.00 | +0.050 | 0.00 | Decorrelation |
| 200 | 10.4 | 1.1 | 0.56 | 1.00 | −0.229 | 0.00 | Decorrelation |
| 500 | 14.0 | 2.1 | 0.29 | 1.00 | −0.131 | 0.00 | Decorrelation |
| 1000 | 14.5 | 2.2 | 0.21 | 0.99 | +0.197 | 0.04 | Decorrelation |
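The punchline: even though gradient magnitudes diverge by 2.44×, the ranking of which patterns matter converges to perfect agreement. The rank correlation ρ is only +0.52 at d = 1 but reaches +0.99 at d = 10 and +1.00 from d = 11 onward.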
| d | ρ | Agree | f32-only | exact-only | Mean ratio |
|---|---|---|---|---|---|
| 1 | +0.52 | 126 | 0 | 2 | 0.69 |
| 2 | +0.74 | 128 | 0 | 0 | 0.61 |
| 5 | +0.50 | 128 | 0 | 0 | 1.30 |
| 8 | +0.91 | 128 | 0 | 0 | 2.53 |
| 10 | +0.99 | 128 | 0 | 0 | 2.40 |
| 11 | +1.00 | 128 | 0 | 0 | 2.44 |
| 15 | +1.00 | 128 | 0 | 0 | 2.43 |
| 20 | +1.00 | 128 | 0 | 0 | 2.44 |
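
A uniform rescaling cannot change a ranking, which is why a steady magnitude ratio of 2.44 is compatible with ρ = +1.00. The sketch below assumes ρ is a Spearman rank correlation, which the source does not state, and uses synthetic scores; `spearman_rho` is a hypothetical helper without tie correction.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (no tie correction; intended for distinct values)."""
    rx = np.argsort(np.argsort(x)).astype(np.float64)
    ry = np.argsort(np.argsort(y)).astype(np.float64)
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
score_exact = rng.random(128)        # synthetic per-pattern magnitudes
score_f32 = 2.44 * score_exact       # same ordering, inflated by 2.44x

rho_scaled = spearman_rho(score_f32, score_exact)
rho_reversed = spearman_rho(-score_exact, score_exact)
print(rho_scaled, rho_reversed)
```

The reversed case is included only as a sanity check that the helper can report disagreement as well as agreement.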
We ran the full forward pass in f32 and MPFR-256 over all 1023 positions. Overall, f32 costs 0.071 bpc relative to exact arithmetic.
| Phase | bpc (f32) | bpc (exact) | Diff | n |
|---|---|---|---|---|
| t < 50 | 6.612 | 6.642 | −0.031 | 50 |
| 50–200 | 6.805 | 6.652 | +0.153 | 150 |
| 200–500 | 5.469 | 5.351 | +0.118 | 300 |
| 500+ | 5.469 | 5.439 | +0.031 | 523 |
| Overall | 5.721 | 5.650 | +0.071 | 1023 |
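
As a consistency check, the overall row can be recovered as the position-weighted mean of the per-phase rows. The snippet uses only the numbers printed in the table; the variable names are illustrative.

```python
# (phase, bpc_f32, bpc_exact, n) copied from the table above
phases = [
    ("t < 50",  6.612, 6.642, 50),
    ("50-200",  6.805, 6.652, 150),
    ("200-500", 5.469, 5.351, 300),
    ("500+",    5.469, 5.439, 523),
]

n_total = sum(n for *_, n in phases)
overall_f32 = sum(f * n for _, f, _, n in phases) / n_total
overall_exact = sum(e * n for _, _, e, n in phases) / n_total
cost = overall_f32 - overall_exact

print(n_total, round(overall_f32, 3), round(overall_exact, 3), round(cost, 3))
```

The per-phase differences in the table are computed from unrounded values, so the first and last rows differ from a naive subtraction of the printed figures by 0.001.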
How did the pre-experiment predictions hold up?
| Prediction | Expected | Actual |
|---|---|---|
| P1: Sign agreement | 128/128 | 128/128 ✓ |
| P1: Exponent agreement | 128/128 | 124/128 (close) |
| P1: Mantissa bits | ~18/23 | 21.5/23 (better) |
| P2: Gradient sign | 128/128 | 128/128 ✓ |
| P2: Gradient mantissa | 15–18 bits | 4.8 bits (much worse) |
| P3: Gates killed | 60–80 | 114 (more) |
| P4: Sign dies at d ≈ | 10–15 | 7 (faster) |
| P4: Mantissa decay | logarithmic | instant (0.7 at d=1) |
| P4: Error ratio | growing | 3.44 (transient), 1.0 (steady) |
Paper: q1-exact-results.pdf (source)
Programs: q1-exact.pdf (protocol), p1, p2, p3, p5, p6, lyapunov, lyapunov2, bit_sample
Related: Protocol B, Protocol C, Q1 Sparsity, Implementation Notes
Model: sat_model.bin (128 hidden, 0.079 bpc). Data: first 1024 bytes of enwik9.