Q2-Q4: Offsets, Neurons, and Saturation

Three Questions, One Automaton

This experiment answers three of the seven questions from the total-interpretation program. Together they paint a picture of an RNN that uses deep memory, concentrates prediction in a single neuron, and maintains fully volatile Boolean dynamics.

Q2 dominant perturbation depth: d = 25
Q3 compression from 1 neuron (h28): 99.7%
Q4 mean dwell time: 3.3 steps
Q4 neurons that are volatile: 128/128


Q2: The RNN Uses Deep Offsets

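Method: For each of 13 test positions t and each depth d = 1 to 30, flip the input byte at t−d (XOR with 128, which toggles its high bit), re-run the RNN forward, and at position t count the hidden-unit sign changes and measure the KL divergence of the output distribution. Results are averaged over the 13 test positions.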
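The depth-perturbation procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the experiment's code: it uses a toy tanh RNN with random weights (hypothetical `make_toy_rnn`, 8 hidden units, no biases) in place of the trained model, and it takes the KL as KL(original ‖ perturbed) in bits, a direction the text does not specify.

```python
import math
import random

def make_toy_rnn(hidden=8, vocab=256, seed=0):
    """Hypothetical random weights standing in for the trained model (no biases)."""
    rng = random.Random(seed)
    Wx = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]  # row per input byte
    Wh = [[rng.gauss(0, 1 / math.sqrt(hidden)) for _ in range(hidden)] for _ in range(hidden)]
    Wy = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]  # row per output byte
    return Wx, Wh, Wy

def run(seq, Wx, Wh):
    """Forward pass h_t = tanh(Wx[x_t] + Wh h_{t-1}); returns every hidden state."""
    h = [0.0] * len(Wh)
    states = []
    for x in seq:
        # The right-hand side reads the previous h; it is rebound afterwards.
        h = [math.tanh(Wx[x][i] + sum(w * hk for w, hk in zip(Wh[i], h)))
             for i in range(len(h))]
        states.append(h)
    return states

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(h, Wy):
    """Output distribution over bytes: softmax(Wy h)."""
    return softmax([sum(w * hk for w, hk in zip(row, h)) for row in Wy])

def kl_bits(p, q):
    """KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perturb_at_depth(seq, t, d, Wx, Wh, Wy):
    """Flip byte t-d (XOR 128), re-run, and compare hidden signs and outputs at t."""
    flipped = list(seq)          # copy so the caller's sequence is untouched
    flipped[t - d] ^= 128
    h_orig = run(seq[: t + 1], Wx, Wh)[t]
    h_pert = run(flipped[: t + 1], Wx, Wh)[t]
    sign_changes = sum((a > 0) != (b > 0) for a, b in zip(h_orig, h_pert))
    kl = kl_bits(predict(h_orig, Wy), predict(h_pert, Wy))
    return sign_changes, kl

def depth_profile(seq, positions, depths, Wx, Wh, Wy):
    """Mean sign changes and mean KL over the test positions, for each depth."""
    profile = {}
    for d in depths:
        results = [perturb_at_depth(seq, t, d, Wx, Wh, Wy) for t in positions]
        profile[d] = (sum(r[0] for r in results) / len(results),
                      sum(r[1] for r in results) / len(results))
    return profile
```

For example, `depth_profile(seq, range(40, 79, 3), range(1, 31), *make_toy_rnn())` averages over 13 test positions, mirroring the setup above; random toy weights will not reproduce the reported profile.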

Depth Perturbation Profile: mean sign changes and mean output KL (bits) by depth d

Deep memory: Perturbing the input 25 steps back causes more sign changes (49.5) and a larger prediction shift (0.809 bits KL) than perturbing it 1 step back (8.1 changes, 0.296 bits). The MI-greedy offsets [1, 3, 8, 20] capture only 9.4% of the total sign-change signal across depths 1 to 30.
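The "share of the sign-change signal" captured by a set of offsets can be read as a ratio of sums over the depth profile. A minimal sketch, assuming the total is the sum of mean sign changes over every depth measured; the helper name `offset_share` is hypothetical.

```python
def offset_share(sign_changes_by_depth, offsets):
    """Fraction of the summed sign-change signal that falls on the given offsets.

    sign_changes_by_depth maps depth d -> mean sign changes at that depth.
    """
    total = sum(sign_changes_by_depth.values())
    captured = sum(sign_changes_by_depth[d] for d in offsets)
    return captured / total
```

With a measured profile for d = 1 to 30, `offset_share(profile, [1, 3, 8, 20])` corresponds to the 9.4% figure and `offset_share(profile, [1, 7])` to the 3.3% figure below.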

"The RNN maintains information about inputs 20–30 steps in the past, consistent with BPTT-50 training."

— q234-results.tex

Readout ≠ dynamics: The factor-map found 52/128 neurons dominated by offsets 1 and 7, yet those two offsets together account for only 3.3% of the sign-change signal. The factor-map measures readout sensitivity, whereas this experiment measures dynamical sensitivity: the readout depends on recent inputs, while the dynamics integrate over a long history.

Q3: Which Neurons Carry the Signal?

Method: For each neuron j, zero the j-th column of W_y and measure the change in bits per character (bpc). This is a "readout knockout": the neuron still participates in the dynamics through W_h but can no longer contribute directly to the prediction.
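Because zeroing a column of W_y leaves the recurrence untouched, a readout knockout can reuse the hidden states from a single forward pass. The sketch below assumes W_y is stored as a list of rows indexed by output byte (so column j holds neuron j's weights) with an optional output bias; the function names are illustrative, not taken from the experiment.

```python
import math

def bpc(states, targets, Wy, bias=None):
    """Mean bits per character of softmax(Wy h_t + b) on the target bytes."""
    total = 0.0
    for h, y in zip(states, targets):
        logits = [sum(w * hk for w, hk in zip(row, h)) for row in Wy]
        if bias is not None:
            logits = [z + b for z, b in zip(logits, bias)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += (log_z - logits[y]) / math.log(2)  # -log2 p(y)
    return total / len(targets)

def knockout_column(Wy, j):
    """Copy of Wy with neuron j's readout column set to zero."""
    return [[0.0 if k == j else w for k, w in enumerate(row)] for row in Wy]

def readout_knockout(states, targets, Wy, bias=None):
    """Delta-bpc per neuron. Hidden states are reused as-is, since the
    knocked-out neuron still drives the dynamics through W_h."""
    base = bpc(states, targets, Wy, bias)
    n = len(Wy[0])
    return [bpc(states, targets, knockout_column(Wy, j), bias) - base
            for j in range(n)]
```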

Top Neurons by Knockout Importance

Neuron	Δbpc	||W_y|| norm	Mean |h_j|
h28	+0.030	84.8	0.987
h105	+0.025	30.1	0.985
h54	+0.023	66.6	0.990
h17	+0.021	49.2	0.995
h49	+0.020	55.2	0.991
h10	+0.019	51.6	0.987
h97	+0.018	81.1	0.983
h3	+0.018	2.5	0.991

h3 is important despite a tiny W_y: h3 has ||W_y|| = 2.5, more than 10× smaller than any other neuron in the top 8 (the next smallest is h105 at 30.1). Its importance comes from dynamics: through W_h, it influences other neurons' signs.

Minimal Subset Analysis

Keep only the top-k neurons by knockout importance and zero the remaining columns of W_y:

Compression vs Number of Neurons

Neurons kept k	bpc	% of compression gap	Note
1	4.974	99.7%	h28 alone
6	4.966	100.0%
10	4.948	100.5%
15	4.903	102.0%	Best region
20	4.882	102.7%	Improving
30	4.857	103.6%	Still improving
128 (full)	4.965	100.0%	113 neurons add noise
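The "% of compression gap" column is consistent with measuring the gap from an 8-bpc uniform baseline over 256 byte values to the full model's 4.965 bpc: (8 − bpc_k) / (8 − 4.965) reproduces every listed percentage to within 0.1 percentage point. That baseline is inferred from the numbers, not stated in the text. The sketch also shows the top-k masking of W_y; all names are hypothetical.

```python
UNIFORM_BPC = 8.0  # assumption: log2(256), a uniform model over byte values

def gap_percent(bpc_k, bpc_full, baseline=UNIFORM_BPC):
    """Percent of the baseline-to-full-model improvement achieved with k neurons."""
    return 100.0 * (baseline - bpc_k) / (baseline - bpc_full)

def keep_top_k(Wy, importance, k):
    """Zero every column of Wy except the k neurons with the largest knockout delta-bpc.

    Wy is a list of rows indexed by output byte; column j belongs to neuron j.
    """
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    keep = set(ranked[:k])
    return [[w if j in keep else 0.0 for j, w in enumerate(row)] for row in Wy]
```

Values above 100% simply mean the subset beats the full model's bpc.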

The full model is suboptimal: keeping 15–30 neurons achieves better bpc than keeping all 128. The remaining 98–113 neurons contribute net-negative signal through W_y; they add noise to the prediction. No individual neuron's mantissa contributes more than 0.002 bpc.

What Do Top Neurons Predict?

h28 (letters vs symbols)
Promotes: q, h, f, r
Demotes: *, #, /, [

h54 (punctuation context)
Promotes: , , ], e, h
Demotes: f, 9, n

h97 (digits/letters vs symbols)
Promotes: 6, q, 9, e
Demotes: *, #, {
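The text does not say how the promoted and demoted characters were obtained. One plausible reading, sketched here purely as an assumption: rank the entries of neuron j's column of W_y, so that with h_j ≈ +1 the largest weights raise those bytes' logits (promote) and the most negative ones lower them (demote); with h_j ≈ −1 the two lists swap. The function name is hypothetical.

```python
def promotes_demotes(Wy, j, n=4):
    """Bytes whose logits rise most / fall most when h_j = +1.

    Wy is a list of rows indexed by output byte; Wy[byte][j] is neuron j's weight.
    If h_j saturates at -1 instead, the promoted and demoted lists swap roles.
    """
    col = sorted((row[j], byte) for byte, row in enumerate(Wy))
    demotes = [chr(b) for _, b in col[:n]]
    promotes = [chr(b) for _, b in reversed(col[-n:])]
    return promotes, demotes
```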

Q4: All Neurons Are Volatile

Method: Track each neuron's sign across all 520 positions. Count sign flips, measure dwell times (steps between consecutive flips), and identify co-flip pairs (neurons that flip at the same position).
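The columns of the Most Volatile Neurons table can be computed from each neuron's hidden-state trace. A minimal sketch under stated assumptions: dwell times are measured only between consecutive flips (the runs before the first and after the last flip are dropped), a value of exactly 0 counts as negative, and saturation uses the |h| > 0.999 threshold quoted later in this section. The function names are illustrative.

```python
from collections import Counter

SAT = 0.999  # saturation threshold |h| > 0.999, as quoted in this section

def sign_flips(trace):
    """Positions t where sign(h_t) differs from sign(h_{t-1}); 0 counts as negative."""
    return [t for t in range(1, len(trace)) if (trace[t] > 0) != (trace[t - 1] > 0)]

def dwell_times(trace):
    """Steps between consecutive sign flips of one neuron."""
    flips = sign_flips(trace)
    return [b - a for a, b in zip(flips, flips[1:])]

def neuron_stats(trace):
    """Flip count, |h| summary, percent saturated, and most common dwell time."""
    mags = [abs(v) for v in trace]
    dwells = dwell_times(trace)
    return {
        "flips": len(sign_flips(trace)),
        "mean_abs": sum(mags) / len(mags),
        "min_abs": min(mags),
        "pct_sat": 100.0 * sum(m > SAT for m in mags) / len(mags),
        "dwell_mode": Counter(dwells).most_common(1)[0][0] if dwells else None,
    }
```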

Most Volatile Neurons

Neuron	Flips	Mean |h|	Min |h|	% saturated	Dwell mode (steps)
h54	234	0.990	0.049	92.3%	1
h37	206	0.986	0.089	91.0%	1
h47	201	0.982	0.035	90.2%	1
h110	197	0.990	0.191	93.1%	1
h3	161	0.991	0.004	95.8%	1
h75	153	0.996	0.508	95.6%	4
h40	150	0.996	0.542	97.3%	4

All 128 neurons are volatile: every neuron flips sign more than 50 times in 520 positions, and even the least volatile flips about 100 times. There are no frozen and no settled neurons. This contradicts the static picture of "112 settled + 16 active" neurons.

"At any single time step, ~123 neurons are saturated (|h| > 0.999) and ~5 are unsaturated. But the identity of the unsaturated neurons changes every step. Over 520 positions, every neuron passes through the unsaturated regime many times."

— q234-results.tex

Dwell Time Distribution

Steps Between Consecutive Sign Flips

Mean dwell time: 3.3 steps. 95% of dwells are ≤ 10 steps. The Boolean state mixes rapidly.
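The pooled dwell statistics (mean dwell and the share of dwells ≤ 10 steps) can be sketched by concatenating every neuron's between-flip intervals. This assumes a flip is a change of sign between consecutive positions and a dwell is the gap between consecutive flips of the same neuron; the helper names are hypothetical.

```python
def pooled_dwells(traces):
    """All neurons' dwell times (steps between consecutive sign flips), concatenated."""
    out = []
    for trace in traces:
        flips = [t for t in range(1, len(trace)) if (trace[t] > 0) != (trace[t - 1] > 0)]
        out.extend(b - a for a, b in zip(flips, flips[1:]))
    return out

def dwell_summary(dwells, cutoff=10):
    """Mean dwell time and fraction of dwells no longer than `cutoff` steps."""
    mean = sum(dwells) / len(dwells)
    frac = sum(d <= cutoff for d in dwells) / len(dwells)
    return mean, frac
```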

Co-Flip Structure

Neurons that flip at the same position form a co-flip graph. High Jaccard similarity (> 0.5) means the pair flips together more often than apart:

Pair	Co-flips	Jaccard	Individual flips
h17, h109	100	0.510	148, 148
h37, h54	100	0.294	206, 234
h30, h31	98	0.508	144, 147
h40, h46	98	0.476	150, 154
h46, h116	98	0.490	154, 144
h1, h58	96	0.508	141, 144
h30, h34	95	0.487	144, 146
h36, h86	93	0.449	145, 155
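The co-flip counts and Jaccard values can be computed from each neuron's set of flip positions. Every row of the table satisfies Jaccard = co-flips / (flips_a + flips_b − co-flips); for h17 and h109, 100 / (148 + 148 − 100) ≈ 0.510. A Jaccard above 0.5 is equivalent to shared flips outnumbering unshared ones. The helper names below are hypothetical.

```python
from itertools import combinations

def flip_positions(trace):
    """Set of positions where the neuron's sign changes (0 counts as negative)."""
    return {t for t in range(1, len(trace)) if (trace[t] > 0) != (trace[t - 1] > 0)}

def jaccard(a, b):
    """|A & B| / |A | B| for two sets of flip positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_from_counts(co, flips_a, flips_b):
    """Same quantity from counts: the union is flips_a + flips_b - co."""
    return co / (flips_a + flips_b - co)

def co_flip_pairs(traces, top=8):
    """(co-flips, Jaccard, i, j) for every neuron pair, most co-flips first."""
    flips = [flip_positions(tr) for tr in traces]
    pairs = [(len(flips[i] & flips[j]), jaccard(flips[i], flips[j]), i, j)
             for i, j in combinations(range(len(traces)), 2)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top]
```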

Neuron clusters encode shared features:

h30, h31, h34 — a triple with pairwise Jaccard ~0.5
h40, h46, h116 — a triple with pairwise co-flips ~98
h17, h109 — the tightest pair (Jaccard 0.51)
h37, h54, h47, h110, h52 — a loose cluster around h54 (most volatile)

These co-flip groups likely correspond to feature detectors: a context change (e.g., entering/leaving an XML tag) causes a coordinated sign flip across a group of neurons that encode the same feature.
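The tighter groups listed above can be recovered from the table's pairs as connected components of the co-flip graph after thresholding Jaccard. A sketch using union-find; the 0.45 threshold is an illustrative choice, not one stated in the text. At that threshold the eight listed pairs yield {h1, h58}, {h17, h109}, {h30, h31, h34} and {h40, h46, h116}.

```python
# (neuron a, neuron b, Jaccard) taken from the co-flip table above
PAIRS = [(17, 109, 0.510), (37, 54, 0.294), (30, 31, 0.508), (40, 46, 0.476),
         (46, 116, 0.490), (1, 58, 0.508), (30, 34, 0.487), (36, 86, 0.449)]

def cluster_pairs(pairs, threshold):
    """Connected components of the graph keeping edges with Jaccard >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, jac in pairs:
        if jac >= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```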

Synthesis

"The sat-rnn is a 128-bit Boolean automaton where: every neuron participates in every prediction (through W_h), only ~15 neurons matter for readout (through W_y), all neurons flip frequently (dwell time ~3 steps), the dynamics propagates information over 20–30 steps, and co-flip groups encode shared contextual features."

— q234-results.tex
