Q1: How Sparse Is the Explanation?

Experiment: q1_sparsity — 2026-02-11
"The median position at τ = 0.01 uses only 15 patterns out of 44,794."

What This Experiment Shows

For each of the 1024 predictions the sat-rnn makes, how many of its 44,794 patterns actually participate in the backward attribution chain? This experiment computes the full backward trace at every position, sweeping the attribution threshold τ across five orders of magnitude.

The answer: typical predictions are extremely sparse, but the distribution has a heavy tail. Most positions need a handful of patterns; a few surprising positions activate thousands.

"The explanation is very sparse for typical predictions but heavy-tailed. The median position at τ = 0.01 uses only 15 patterns out of 44,794. But the mean is 1,166, pulled up by a minority of positions with large attribution counts."
— q1-sparsity.tex

The Numbers at a Glance

44,794  patterns with |w| > 0.01
15      median active patterns at τ = 0.01
87%     of active patterns are Wh (recurrent)
95.5%   of Wh patterns active somewhere

Sparsity Distribution

As the threshold τ increases, fewer patterns survive. The distribution is highly skewed: at every threshold of 10⁻³ and above the mean far exceeds the median, revealing a heavy-tailed distribution where most predictions are cheap but some are expensive.

Threshold τ    Mean n    Median n    Min    Max     n / 44,794
10⁻⁴            9,807      10,283      0    19,850       0.219
10⁻³            4,357       1,664      0    19,352       0.097
10⁻²            1,166          15      0    17,710       0.026
10⁻¹              157           0      0    11,012       0.004
1.0                 8           0      0     2,127       0.000

Mean vs Median Active Patterns

The gap between mean and median shows the heavy tail. At τ = 0.01, the mean is 78× the median.

Key finding: At τ = 0.01, the median prediction uses just 15 patterns (0.03% of the total), but the mean is 1,166 — a 78× gap driven by a minority of hard-to-predict positions.

Breakdown by Pattern Class

The model has three classes of patterns from its three weight matrices. Wh (recurrent) patterns dominate at every threshold — the backward attribution chain flows primarily through recurrent connections.

Threshold τ    Mean nx (input)    Mean nh (recurrent)    Mean ny (output)    Wh share
10⁻³                       481                  3,834                  42         88%
10⁻²                       136                  1,018                  12         87%
10⁻¹                        22                    134                   2         85%

Pattern Class Composition at τ = 0.01

Wh dominates: Recurrent patterns account for ~87% of all active patterns at every threshold. The backward attribution chain is primarily a story of neuron-to-neuron connections. Output patterns (Wy) are the sparsest — only 12 on average at τ = 0.01, meaning just a handful of neurons contribute meaningfully to each prediction.

Never-Active Patterns

At τ = 0.01, how many of the model's patterns are never active at any of the 1024 positions?

87%     of Wx patterns never active (28,635 / 32,768)
4.5%    of Wh patterns never active (735 / 16,384)
85%     of Wy patterns never active (27,965 / 32,768)

Pattern Utilization by Class

Most Wx patterns are irrelevant because most input byte values never occur in the 1024-byte dataset (only 52 of 256 byte values appear). Similarly, most Wy patterns are irrelevant because most output byte values are never the prediction target.

Wh is almost fully utilized: Nearly all (95.5%) recurrent connections matter at some position. The 128-neuron recurrent core is not over-provisioned: nearly every connection plays a role.

Depth Profile: The Gradient Does Not Vanish

A common belief about RNNs is that gradients vanish with depth, making long-range dependencies unlearnable. This experiment measures the actual attribution mass at each backward offset d.

[Table: mean attribution mass per backward offset d and its fraction of the d=0 value; rows not preserved in this extract]

Attribution Mass vs Backward Offset

Mass does not decay monotonically. It grows from d=1 to a peak at d≈21, exceeding the d=0 value, then oscillates around 0.5–0.7× d=0 out to d=50.

Peak at d ≈ 20–21: Attribution mass at d=21 (0.827) exceeds d=0 (0.757). The RNN mixes information into a carrier arising from its recurrent dynamics. This is consistent with the greedy MI ordering [1, 8, 20, ...] where offset 20 was selected third.
"The gradient does not vanish. Mass grows from d=1 to a peak at d ≈ 20–21 (exceeding d=0), then oscillates around 0.5–0.7× the d=0 value out to d=50. The RNN mixes information into a carrier arising from its recurrent dynamics."
— q1-sparsity.tex

How It Works

For each position t = 0, ..., 1023, predicting y = x_{t+1}:

  1. Compute the output gradient g_t ∈ R^128
  2. For each offset d = 1, ..., D_max, compute the backward gradient g_{t,d} via the Jacobian chain
  3. For each Wy pattern (j, y): attribution = |[g_t]_j · 1[h_j has correct sign]|
  4. For each Wx pattern (x_{t-d}, j) at offset d: attribution = |Wx[j, x_{t-d}] · [g_{t,d}]_j|
  5. For each Wh pattern (j, k): attribution at offset d = |(1 - h_j(t-d)²) · Wh[k,j] · [g_{t,d}]_j|; take the max over offsets

A pattern is active for position t if its attribution exceeds threshold τ.

Tool: q1_sparsity.c · Model: sat_model.bin from archive/20260209 · Data: first 1024 bytes of enwik9