The sat-rnn has 128 hidden neurons and 82,304 trainable parameters. But how many of those neurons actually contribute to the model's predictions? To find out, we systematically knock out neurons (zeroing their W_y readout column) and measure the effect on BPC.
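The knockout itself is just a column-zeroing on the readout matrix. A minimal sketch with array shapes taken from the text (128 hidden neurons, 256 output bytes); the function name is illustrative, and the actual sweep harness lives in q3_neurons.c:

```c
#include <assert.h>
#include <math.h>

enum { HID = 128, VOCAB = 256 };  /* sat-rnn sizes from the text */

/* Zero neuron j's column of the readout matrix W_y (VOCAB x HID) and
 * return the L1 mass of the removed column. After this call the output
 * logits no longer see neuron j; re-evaluating BPC on the corpus then
 * gives the knockout cost dBPC. (Illustrative sketch; the real harness
 * is q3_neurons.c.) */
double knockout_neuron(double Wy[VOCAB][HID], int j) {
    double mass = 0.0;
    for (int r = 0; r < VOCAB; r++) {
        mass += fabs(Wy[r][j]);
        Wy[r][j] = 0.0;
    }
    return mass;
}
```

Restoring the saved column after each evaluation turns this into the full one-at-a-time sweep.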
The result is dramatic: the vast majority of neurons contribute little — or worse, add noise. Keeping only the top 15-30 neurons already recovers most of the compression gap, and pruning the least useful neurons costs nothing at all.
Each bar in the knockout sweep shows how much BPC increases when a single neuron's readout column is zeroed. The unablated baseline achieves 0.0792 BPC.
h8 is by far the most important neuron. Removing it costs +0.035 bpc — nearly half the total compression. The importance drops steeply: h56 and h68 contribute about +0.025 each, and by the 10th neuron (h50) the individual effect is only +0.013.
Each neuron's W_y column tells us which output bytes it promotes (positive weights) and demotes (negative weights).
| Neuron | ΔBPC | ‖W_y‖ (col.) | Promotes | Demotes |
|---|---|---|---|---|
| h8 | +0.0348 | 6.10 | '/' 'i' ' ' 'c' 'd' | 'a' 'e' 'o' 'p' ':' |
| h56 | +0.0250 | 4.43 | ' ' 'y' 's' 'd' 'l' | 'a' 'c' 'n' 'p' 'e' |
| h68 | +0.0245 | 5.83 | 'm' 'i' 'e' 'n' 'p' | ' ' 'k' '>' 'h' '.' |
| h99 | +0.0198 | 5.02 | '>' 'a' 'r' 'd' 'U' | '/' 'i' 'e' 't' ' ' |
| h15 | +0.0196 | 5.19 | 's' '/' 'g' 'l' '"' | 'k' ' ' 'h' 't' '.' |
| h52 | +0.0175 | 5.83 | 'e' 'i' 'a' '.' 'p' | 'k' '<' '/' 'n' '"' |
| h76 | +0.0172 | 4.56 | 's' 'l' 'm' 'd' 'c' | ' ' 'e' 'n' '<' '/' |
| h90 | +0.0154 | 4.69 | 'a' 'k' '"' 'h' '=' | 'M' 'm' '.' 'p' 's' |
| h73 | +0.0149 | 4.32 | 'a' '"' ' ' 'n' '>' | 'r' 't' 'k' 'M' 'l' |
| h50 | +0.0127 | 3.29 | 'e' '"' 'k' '<' '0' | 'a' 's' ' ' 'c' '>' |
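Mechanically, the Promotes/Demotes columns come from ranking the 256 entries of a neuron's readout column by weight. A minimal sketch that extracts only the single strongest byte on each side (names illustrative; the full top-five extraction is presumably what q3_decode_neurons.c does):

```c
#include <assert.h>

enum { NBYTES = 256 };  /* one readout weight per output byte */

/* Read a neuron's role off its W_y column: the byte with the largest
 * positive weight is what it promotes hardest, the most negative is
 * what it demotes hardest. Illustrative single-byte version of the
 * table's top-five lists. */
void strongest_bytes(const double col[NBYTES], int *promoted, int *demoted) {
    int hi = 0, lo = 0;
    for (int r = 1; r < NBYTES; r++) {
        if (col[r] > col[hi]) hi = r;
        if (col[r] < col[lo]) lo = r;
    }
    *promoted = hi;
    *demoted  = lo;
}
```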
What happens when we remove neurons one by one, in order of importance? The BPC increases smoothly, revealing which neurons carry the model's compression ability.
Removing the top 10 neurons (h8, h56, h68, h99, h15, h52, h76, h90, h73, h50) raises BPC from 0.08 to 0.93 — destroying most of the model's predictive power. After ~30 neurons, the model is barely compressing at all.
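The cumulative sweep differs from the single-neuron knockouts only in that ablated columns stay zeroed. A sketch, assuming the same VOCAB x HID readout layout described in the text (names illustrative):

```c
#include <assert.h>

enum { N_HID = 128, N_OUT = 256 };

/* Cumulative ablation: zero readout columns in a fixed importance order
 * and leave them zeroed, so step k evaluates the model with its k most
 * important neurons gone. `order` would be the knockout ranking
 * (h8, h56, h68, ...). */
void remove_in_order(double Wy[N_OUT][N_HID], const int *order, int k) {
    for (int i = 0; i < k; i++)
        for (int r = 0; r < N_OUT; r++)
            Wy[r][order[i]] = 0.0;
}
```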
Now the surprise. Instead of removing neurons, we keep only the top k neurons and zero out all others. How many neurons do we need to match (or beat) the full model?
The compression gap is the difference between the marginal (no-model) baseline BPC and the full model's BPC. At k=15 neurons, we capture 77.2% of the gap; at k=30, 92.1%. The curve saturates near k=80-100 and even dips slightly below the k=128 value at k=120 — the full readout is mildly suboptimal.
| Neurons Kept | BPC | % Gap |
|---|---|---|
| 1 (h8 only) | 5.928 | 26.2% |
| 5 | 4.030 | 50.1% |
| 10 | 2.845 | 65.1% |
| 15 | 1.886 | 77.2% |
| 20 | 1.412 | 83.2% |
| 30 | 0.707 | 92.1% |
| 50 | 0.284 | 97.4% |
| 80 | 0.115 | 99.5% |
| 120 | 0.079 | 100.0% |
| 128 (full) | 0.079 | 100.0% |
At the reported precision, k=120 matches the full model: the last 8 neurons contribute nothing measurable to compression.
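The "% Gap" column can be reproduced from the BPC column alone. One assumption is needed: that the marginal (no-model) baseline is log2(256) = 8 BPC for raw bytes — not stated explicitly in the text, but it matches every row of the table.

```c
#include <assert.h>
#include <math.h>

/* Percent of the compression gap captured by a pruned model.
 * Assumes the marginal (no-model) baseline is log2(256) = 8 BPC,
 * which reproduces the table's percentages from its BPC column. */
double gap_pct(double bpc_pruned, double bpc_full) {
    const double marginal = 8.0;
    return 100.0 * (marginal - bpc_pruned) / (marginal - bpc_full);
}
```

For example, gap_pct(5.928, 0.0792) evaluates to about 26.2, matching the k=1 row.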
Beyond pruning readout neurons, we can also prune the recurrent weights W_h. The combined effect is striking: the best pruned model uses only 26k of the 82k parameters yet achieves better BPC than the full model.
| Configuration | W_y kept | W_h kept | BPC | Total Params |
|---|---|---|---|---|
| Full model | 100% | 100% | 0.0792 | 82,304 |
| Top 30 neurons | 23.4% | 100% | 0.707 | ~57k |
| Top 20 neurons | 15.6% | 100% | 1.412 | ~38k |
| Top 15 neurons | 11.7% | 100% | 1.886 | ~37k |
| Redux (k=15, W_h pruned) | 11.7% | 0.1% | 4.868 | ~37k |
W_h entries are tiny: median 0.047, 90th percentile 0.186, max 1.393. But these small values are NOT negligible — they determine the Boolean transition function via their signs and the cumulative effect of many small contributions.
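The quoted W_h figures are plain order statistics over the absolute entries. A sketch (function name illustrative; n would be 128 x 128 = 16384 for the sat-rnn's recurrent matrix):

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

static int cmp_dbl(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);  /* ascending order for qsort */
}

/* Median, 90th percentile, and max of |w[i]| — the kind of magnitude
 * statistics quoted for W_h. Illustrative helper, not the original
 * analysis code. */
void abs_stats(const double *w, size_t n, double *med, double *p90, double *mx) {
    double *m = malloc(n * sizeof *m);
    for (size_t i = 0; i < n; i++) m[i] = fabs(w[i]);
    qsort(m, n, sizeof *m, cmp_dbl);
    *med = m[n / 2];
    *p90 = m[(size_t)(0.9 * (double)(n - 1))];
    *mx  = m[n - 1];
    free(m);
}
```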
Papers: q234-results.pdf • synthesis.pdf • narrative.pdf
Programs: q3_neurons.c • q5_redux.c • q3_decode_neurons.c • q3_neuron_roles.c
Related experiments: Boolean Automaton • Saturation Dynamics • Offset Analysis • Per-Prediction Justifications