This experiment answers three of the seven questions from the total-interpretation program. Together they paint a picture of an RNN that uses deep memory, concentrates prediction in a single neuron, and maintains fully volatile Boolean dynamics.
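Method: For each test position t and each depth d = 1…30, flip the most significant bit of the input byte at position t−d (XOR with 128), re-run the RNN forward, and measure two quantities at position t: the number of hidden units whose sign changes, and the KL divergence (in bits) of the output distribution. Results are averaged over 13 test positions.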
| Depth d | Mean sign changes | Mean output KL (bits) | Visual |
|---|---|---|---|
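The perturbation procedure can be sketched as follows. This is a minimal illustration, not the experiment's actual code: it assumes a vanilla tanh RNN with one-hot byte inputs, and the names `Wx`, `Wh`, `bh`, `by`, `run_rnn`, and `perturb_effect` are hypothetical (only `Wy` appears in the text). The KL direction (original relative to perturbed) is also an assumption.

```python
import numpy as np

def run_rnn(x, Wx, Wh, bh, Wy, by):
    """Run an assumed tanh RNN over byte sequence x.

    Returns hidden states (T, N) and log2-probabilities (T, V)."""
    h = np.zeros(Wh.shape[0])
    hs, logps = [], []
    for byte in x:
        # A one-hot input simply selects one column of Wx.
        h = np.tanh(Wx[:, byte] + Wh @ h + bh)
        z = Wy @ h + by
        z = z - z.max()                          # stabilise the softmax
        logp = z - np.log(np.sum(np.exp(z)))     # natural-log probabilities
        hs.append(h)
        logps.append(logp / np.log(2))           # convert nats to bits
    return np.array(hs), np.array(logps)

def perturb_effect(x, t, d, params):
    """Flip the top bit of x[t-d]; return (sign changes, KL in bits) at position t."""
    x_pert = x.copy()
    x_pert[t - d] ^= 128                         # XOR with 128 flips bit 7
    h0, lp0 = run_rnn(x[: t + 1], *params)
    h1, lp1 = run_rnn(x_pert[: t + 1], *params)
    sign_changes = int(np.sum(np.sign(h0[t]) != np.sign(h1[t])))
    p0 = 2.0 ** lp0[t]
    kl_bits = float(np.sum(p0 * (lp0[t] - lp1[t])))  # KL(original || perturbed)
    return sign_changes, kl_bits

# Usage with random weights (illustrative only; not the trained model).
rng = np.random.default_rng(0)
N, V = 16, 256
params = (rng.normal(0, 0.5, (N, V)), rng.normal(0, 0.5, (N, N)), np.zeros(N),
          rng.normal(0, 0.5, (V, N)), np.zeros(V))
x = rng.integers(0, 256, size=40)
print(perturb_effect(x, t=30, d=3, params=params))
```

Averaging the two returned values over the chosen test positions for each d would give one table row per depth.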
Method: For each neuron j, zero out the j-th column of Wy and measure the bpc change. This is a "readout knockout" — the neuron still participates in dynamics but cannot contribute to prediction.
| Neuron | Δbpc | ‖Wy‖ column norm | Mean \|hj\| |
|---|---|---|---|
| h28 | +0.030 | 84.8 | 0.987 |
| h105 | +0.025 | 30.1 | 0.985 |
| h54 | +0.023 | 66.6 | 0.990 |
| h17 | +0.021 | 49.2 | 0.995 |
| h49 | +0.020 | 55.2 | 0.991 |
| h10 | +0.019 | 51.6 | 0.987 |
| h97 | +0.018 | 81.1 | 0.983 |
| h3 | +0.018 | 2.5 | 0.991 |
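A readout knockout of this kind can be sketched in a few lines. The sketch assumes a softmax readout `Wy @ h + by` over 256 byte values and that the hidden states have already been recorded in a matrix `H`; the names `H`, `by`, `targets`, `bpc`, and `knockout_deltas` are hypothetical.

```python
import numpy as np

def bpc(H, Wy, by, targets):
    """Mean bits per character of a softmax readout over hidden states H (T x N)."""
    z = H @ Wy.T + by                               # (T, V) logits
    z = z - z.max(axis=1, keepdims=True)            # stabilise the softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(targets)), targets].mean() / np.log(2))

def knockout_deltas(H, Wy, by, targets):
    """Change in bpc from zeroing each column of Wy in turn.

    H is reused unchanged, so every neuron still takes part in the dynamics."""
    base = bpc(H, Wy, by, targets)
    deltas = np.empty(Wy.shape[1])
    for j in range(Wy.shape[1]):
        W = Wy.copy()
        W[:, j] = 0.0                               # remove neuron j from the readout only
        deltas[j] = bpc(H, W, by, targets) - base
    return deltas
```

Because the hidden states do not depend on `Wy`, they only need to be computed once; each knockout is a single matrix product.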
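Keep only the top-k neurons by knockout importance and zero the remaining columns of Wy. Lower bpc is better, so a value above 100% in the third column means the pruned readout outperforms the full one: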
| Neurons kept k | bpc | % of compression gap | Note |
|---|---|---|---|
| 1 | 4.974 | 99.7% | h28 alone |
| 6 | 4.966 | 100.0% | |
| 10 | 4.948 | 100.5% | |
| 15 | 4.903 | 102.0% | Best region |
| 20 | 4.882 | 102.7% | Peak performance |
| 30 | 4.857 | 103.6% | Still improving |
| 128 (full) | 4.965 | 100.0% | 113 neurons add noise |
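The percentages in this table appear consistent with a baseline of 8 bpc (a uniform distribution over bytes). That baseline is inferred from the numbers, not stated in the text, so treat it as an assumption. The sketch below shows the pruning step and the gap arithmetic; `keep_top_k` and `gap_percent` are hypothetical names.

```python
import numpy as np

def keep_top_k(Wy, importance, k):
    """Return a copy of Wy with all but the k most important columns zeroed."""
    keep = np.argsort(importance)[::-1][:k]         # indices of the k largest scores
    W = np.zeros_like(Wy)
    W[:, keep] = Wy[:, keep]
    return W

def gap_percent(bpc_k, bpc_full, baseline=8.0):
    """Share of the (baseline - full) compression gap achieved, in percent.

    baseline=8.0 is an assumption (uniform over 256 byte values)."""
    return 100.0 * (baseline - bpc_k) / (baseline - bpc_full)

# Compare against three rows of the table (full readout: 4.965 bpc).
for k, b in [(1, 4.974), (15, 4.903), (30, 4.857)]:
    print(k, round(gap_percent(b, 4.965), 1))
```

Under this assumption the formula matches the tabulated percentages to within the rounding of the bpc values.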
Method: Track each neuron's sign across all 520 positions. Count sign flips, measure dwell times (the number of consecutive steps a neuron keeps the same sign), and identify co-flip pairs.
| Neuron | Flips | Mean \|h\| | Min \|h\| | % sat | Dwell mode |
|---|---|---|---|---|---|
| h54 | 234 | 0.990 | 0.049 | 92.3% | 1 |
| h37 | 206 | 0.986 | 0.089 | 91.0% | 1 |
| h47 | 201 | 0.982 | 0.035 | 90.2% | 1 |
| h110 | 197 | 0.990 | 0.191 | 93.1% | 1 |
| h3 | 161 | 0.991 | 0.004 | 95.8% | 1 |
| h75 | 153 | 0.996 | 0.508 | 95.6% | 4 |
| h40 | 150 | 0.996 | 0.542 | 97.3% | 4 |
Mean dwell time: 3.3 steps. 95% of dwells ≤ 10 steps. The Boolean state is rapidly mixing.
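Flip counts and dwell times can be computed directly from a matrix of recorded hidden states. The following is a sketch under stated assumptions: `H` is a hypothetical (T, N) array of hidden activations, and the first and last run of each neuron are counted as dwells even though they are cut off by the sequence boundaries. The source does not say how boundary runs were handled.

```python
import numpy as np

def flip_stats(H):
    """H: (T, N) hidden states. Return per-neuron flip counts and all dwell lengths."""
    s = np.sign(H)
    flipped = s[1:] != s[:-1]                       # (T-1, N): True where the sign changed
    flips = flipped.sum(axis=0)
    dwells = []
    for j in range(H.shape[1]):
        starts = np.flatnonzero(flipped[:, j]) + 1  # positions where a new run begins
        bounds = np.concatenate(([0], starts, [H.shape[0]]))
        dwells.extend(np.diff(bounds))              # run lengths, boundary runs included
    return flips, np.array(dwells)

# Usage: one neuron with runs of length 2, 3, and 1.
H_demo = np.array([[0.9], [0.8], [-0.7], [-0.9], [-0.6], [0.5]])
flips, dwells = flip_stats(H_demo)
print(flips, dwells.mean(), np.percentile(dwells, 95))
```

The mean and the 95th percentile of `dwells` correspond to the two summary statistics quoted above.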
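Neurons that flip at the same position form a co-flip graph. For a pair of neurons, the Jaccard similarity is the number of co-flips divided by the number of positions where at least one of the two flips. A value above 0.5 means the pair flips together more often than apart: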
| Pair | Co-flips | Jaccard | Individual flips |
|---|---|---|---|
| h17, h109 | 100 | 0.510 | 148, 148 |
| h37, h54 | 100 | 0.294 | 206, 234 |
| h30, h31 | 98 | 0.508 | 144, 147 |
| h40, h46 | 98 | 0.476 | 150, 154 |
| h46, h116 | 98 | 0.490 | 154, 144 |
| h1, h58 | 96 | 0.508 | 141, 144 |
| h30, h34 | 95 | 0.487 | 144, 146 |
| h36, h86 | 93 | 0.449 | 145, 155 |
These co-flip groups likely correspond to feature detectors: a context change (e.g., entering/leaving an XML tag) causes a coordinated sign flip across a group of neurons that encode the same feature.
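The co-flip counts and Jaccard values in the table can be obtained from a boolean flip matrix with one matrix product. This is a minimal sketch: `flipped` is a hypothetical (T−1, N) array that is True wherever a neuron's sign differs from the previous step, and `coflip_jaccard` is an illustrative name.

```python
import numpy as np

def coflip_jaccard(flipped):
    """flipped: (T-1, N) boolean flip indicators.

    Returns the (N, N) co-flip count matrix and the (N, N) Jaccard matrix."""
    F = flipped.astype(np.int64)
    co = F.T @ F                                    # co[i, j] = positions where both flip
    n = np.diag(co)                                 # individual flip counts
    union = n[:, None] + n[None, :] - co            # positions where either flips
    with np.errstate(divide="ignore", invalid="ignore"):
        jac = np.where(union > 0, co / union, 0.0)  # define Jaccard as 0 if neither flips
    return co, jac

# Usage: two neurons that co-flip at 2 of the 3 positions where either flips.
demo = np.array([[1, 1], [1, 0], [0, 0], [1, 1]], dtype=bool)
co, jac = coflip_jaccard(demo)
print(co[0, 1], jac[0, 1])
```

The same formula reproduces the table from its own columns: for h17 and h109, 100 / (148 + 148 − 100) ≈ 0.510. Thresholding `jac` (for example at 0.5) gives the edges of the co-flip graph.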