Q1: How Sparse Is the Explanation?

Experiment: q1_sparsity — 2026-02-11
"The median position at τ = 0.01 uses only 15 patterns out of 44,794."

What This Experiment Shows

For each of the 1024 predictions the sat-rnn makes, how many of its 44,794 patterns actually participate in the backward attribution chain? This experiment computes the full backward trace at every position, sweeping the attribution threshold τ across five orders of magnitude.

The answer: typical predictions are extremely sparse, but the distribution has a heavy tail. Most positions need a handful of patterns; a few surprising positions activate thousands.

"The explanation is very sparse for typical predictions but heavy-tailed. The median position at τ = 0.01 uses only 15 patterns out of 44,794. But the mean is 1,166, pulled up by a minority of positions with large attribution counts."
— q1-sparsity.tex

The Numbers at a Glance

44,794  patterns with |w| > 0.01
15      median active patterns at τ = 0.01
87%     of active patterns are Wh (recurrent)
95.5%   of Wh patterns active somewhere

Sparsity Distribution

As the threshold τ increases, fewer patterns survive. The distribution is highly skewed: at every threshold of 10⁻³ and above the mean far exceeds the median, revealing a heavy-tailed distribution where most predictions are cheap but some are expensive.

Threshold τ    Mean n    Median n    Min    Max     n / 44,794
10⁻⁴            9,807      10,283      0    19,850       0.219
10⁻³            4,357       1,664      0    19,352       0.097
10⁻²            1,166          15      0    17,710       0.026
10⁻¹              157           0      0    11,012       0.004
1.0                 8           0      0     2,127       0.000

Mean vs Median Active Patterns

The gap between mean and median shows the heavy tail. At τ = 0.01, the mean is 78× the median.

Key finding: At τ = 0.01, the median prediction uses just 15 patterns (0.03% of the total), but the mean is 1,166 — a 78× gap driven by a minority of hard-to-predict positions.

Breakdown by Pattern Class

The model has three classes of patterns from its three weight matrices. Wh (recurrent) patterns dominate at every threshold — the backward attribution chain flows primarily through recurrent connections.

Threshold τ    Mean nx (input)    Mean nh (recurrent)    Mean ny (output)    Wh share
10⁻³                       481                  3,834                  42         88%
10⁻²                       136                  1,018                  12         87%
10⁻¹                        22                    134                   2         85%

Pattern Class Composition at τ = 0.01

Wh dominates: Recurrent patterns account for ~87% of all active patterns at every threshold. The backward attribution chain is primarily a story of neuron-to-neuron connections. Output patterns (Wy) are the sparsest — only 12 on average at τ = 0.01, meaning just a handful of neurons contribute meaningfully to each prediction.

Never-Active Patterns

At τ = 0.01, how many of the model's patterns are never active at any of the 1024 positions?

87%     of Wx patterns never active (28,635 / 32,768)
4.5%    of Wh patterns never active (735 / 16,384)
85%     of Wy patterns never active (27,965 / 32,768)

Pattern Utilization by Class

Most Wx patterns are irrelevant because most input byte values never occur in the 1024-byte dataset (only 52 of 256 byte values appear). Similarly, most Wy patterns are irrelevant because most output byte values are never the prediction target.

Wh is almost fully utilized: Nearly all (95.5%) recurrent connections matter at some position. The 128-neuron recurrent core is not over-provisioned: nearly every connection plays a role.

Depth Profile: The Gradient Does Not Vanish

A common belief about RNNs is that gradients vanish with depth, making long-range dependencies unlearnable. This experiment measures the actual attribution mass at each backward offset d.

[Table: mean attribution mass per backward offset d and its fraction of the d=0 value; rows not preserved in this extract]

Attribution Mass vs Backward Offset

Mass does not decay monotonically. It grows from d=1 to a peak at d≈21, exceeding the d=0 value, then oscillates around 0.5–0.7× d=0 out to d=50.

Peak at d ≈ 20–21: Attribution mass at d=21 (0.827) exceeds d=0 (0.757). The RNN mixes information into a carrier arising from its recurrent dynamics. This is consistent with the greedy MI ordering [1, 8, 20, ...] where offset 20 was selected third.
"The gradient does not vanish. Mass grows from d=1 to a peak at d ≈ 20–21 (exceeding d=0), then oscillates around 0.5–0.7× the d=0 value out to d=50. The RNN mixes information into a carrier arising from its recurrent dynamics."
— q1-sparsity.tex

How It Works

For each position t = 0, ..., 1023, predicting y = x_{t+1}:

  1. Compute the output gradient g_t ∈ R^128
  2. For each offset d = 1, ..., D_max, compute the backward gradient g_{t,d} via the Jacobian chain
  3. For each Wy pattern (j, y): attribution = |[g_t]_j · 1[h_j has correct sign]|
  4. For each Wx pattern (x_{t-d}, j) at offset d: attribution = |Wx[j, x_{t-d}] · [g_{t,d}]_j|
  5. For each Wh pattern (j, k): attribution at offset d = |(1 - h_j(t-d)²) · Wh[k,j] · [g_{t,d}]_j|; take the max over offsets

A pattern is active for position t if its attribution exceeds threshold τ.

Tool: q1_sparsity.c · Model: sat_model.bin from archive/20260209 · Data: first 1024 bytes of enwik9