The sat-rnn is a small recurrent neural network (128 hidden neurons, trained on 1024 bytes of English Wikipedia) that achieves 0.079 bits per character. It represents each neuron's activation as a 32-bit float (f32), so its hidden state nominally spans 2^4096 possible values per time step (128 neurons × 32 bits).
The central question: does the network actually use all that precision?
The answer is no. The network's computation is almost entirely determined by the sign bits of its 128 neurons — a 128-bit binary string. The floating-point mantissa (the fractional part that gives f32 its fine precision) contributes essentially nothing to inference. In fact, removing the mantissa makes the model better.
The "margin" measures how far each neuron's pre-activation is from the tanh threshold. A margin of 60.5 means the average neuron is so deeply saturated that its sign is determined with a safety factor of roughly 10^6 over the maximum possible mantissa perturbation. This is not approximate — the Boolean function is exact for 98.9% of all neuron-steps.
At each position in the input, we compare the full f32 model's prediction against a sign-only model (where every neuron value is replaced by ±1). If the sign carries all the information, these should produce similar predictions.
Most positions show near-zero BPC for both models. The occasional spikes (positions where the model is uncertain) can differ between the f32 and sign-only runs, and at those positions sign-only is often the better of the two. The spikes correspond to hard-to-predict bytes.
The margin, averaged across neurons at each position, hovers around 1.2-1.5. Even the low-margin positions have only a handful of "tiny" neurons (|z| < 0.1) out of 128.
To test whether the mantissa matters, we ran the model in five configurations. The results are striking: three of the four reduced-precision variants outperform the full f32 model, and the fourth (sign-only readout) trails it by only 0.007 BPC.
| Configuration | BPC | vs Full f32 | Description |
|---|---|---|---|
| Zero-mantissa dynamics | 5.582 | -0.139 | Run dynamics with sign+exponent only, read out with sign+exponent |
| Zero-mantissa readout | 5.637 | -0.084 | Run full dynamics, but read out with sign+exponent only |
| Sign-only dynamics | 5.690 | -0.031 | Run dynamics with ±1 neurons, read out with ±1 |
| Full f32 | 5.721 | — | Standard 32-bit floating point (the trained model) |
| Sign-only readout | 5.728 | +0.007 | Run full dynamics, but replace h with sgn(h) at readout |
Not all bits in a 32-bit float are created equal. We measured how much each bit contributes to the model's predictions by computing the KL divergence when that bit is randomized.
The sign bit carries 300 times more information per bit than the exponent, and the exponent carries 52 times more than the mantissa. Mantissa bits 0-4 contribute less than 10^-6 KL — effectively zero.
If the model is a Boolean automaton, we can ask: when one neuron flips its sign, which other neurons change? This gives us the influence graph — the wiring diagram of the Boolean function.
| Edge | Influence | Description |
|---|---|---|
| h112 → h73 | 0.423 | Strongest non-self edge |
| h16 → h102 | 0.420 | |
| h8 → h8 | 0.408 | Self-loop (h8 maintains its own state) |
| h52 → h27 | 0.370 | |
| h112 → h59 | 0.358 | h112 is a hub (high out-degree) |
| h50 → h76 | 0.358 | |
| h8 → h52 | 0.352 | h8 drives h52 |
| h76 → h59 | 0.349 | |
| h61 → h26 | 0.343 | |
| h8 → h90 | 0.334 | h8 drives h90 |
A Boolean automaton might settle into fixed points or short cycles. This one does not: we ran 50 random initial states for 100 steps each with a fixed input character, and none settled into a fixed point or cycle.
This is consistent with sign-vector uniqueness: all 1023 positions in the data produce distinct sign vectors, so the state entropy is maximal at 10.0 bits (log2(1023)).
We can trace backward through the Boolean dynamics to see how information flows. At position t=42 (input character '/'), the top neurons and their source chains are:
| Neuron | Δbpc | Sign | z (pre-act) | Top W_h Source | Chain (depth 3) |
|---|---|---|---|---|---|
| h56 | +0.0056 | -1 | -5.06 | h50 (+1.08) | h56 ← h50 ← h17 |
| h8 | +0.0020 | +1 | +0.64 | h8 (-1.25) | h8 ← h8 ← h8 (self-loop) |
| h68 | +0.0011 | +1 | +2.58 | h90 (-0.76) | h68 ← h90 ← h8 |
| h52 | +0.0008 | -1 | -1.06 | h8 (+1.28) | h52 ← h8 ← h8 |
| h99 | +0.0005 | +1 | +2.38 | h68 (-0.99) | h99 ← h68 ← h90 |
h8 appears repeatedly — it is a hub neuron with a strong self-connection (W_h[8,8] = -1.25), functioning as an oscillator. Multiple chains route through h8.
This result has implications for how we understand neural networks in general.
Papers: boolean-automaton.pdf (6 pages) • h32.pdf (H = 2^32) • q1-exact-results.pdf (f32 vs exact)
Programs: q1_boolean.c • q1_margins.c • q1_bool_attractor.c • q1_bit_sample.c
Related experiments: Neuron Knockout • Saturation Dynamics • Offset Analysis • Per-Prediction Justifications