Archive 2026-02-11

Total Interpretation & Weight Construction for a Small RNN

Goal: Complete interpretability of the sat-rnn (128 hidden units, 0.079 bpc), then write the weights back from the interpretation. Close the loop: data → UM → RNN → interpretation → data.

Key Results: Writing the Weights In

Hebbian covariance predicts W_h at r = 0.56
cov(h_j(t), h_i(t+1)) correlates with trained W_h at r = 0.40 (all), r = 0.56 (important entries |w| ≥ 3.0, R² = 31%). Sign prediction accuracy: 72.7% for important weights. Optimal scale factor: 3.94.
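A minimal sketch of the lagged-covariance estimator (the function name, trajectory shape, and the synthetic check are illustrative; only the 3.94 scale factor comes from the result above):

```python
import numpy as np

def hebbian_wh(h, scale=3.94):
    """Estimate W_h from the lagged covariance cov(h_j(t), h_i(t+1)).

    h: (T, n) hidden-state trajectory. Entry (i, j) of the result is
    scale * cov(h_i(t+1), h_j(t)); 3.94 is the optimal scale reported above.
    """
    pre = h[:-1] - h[:-1].mean(axis=0)    # centered h(t)
    post = h[1:] - h[1:].mean(axis=0)     # centered h(t+1)
    return scale * (post.T @ pre) / (len(h) - 1)

# Synthetic sanity check: a small near-linear RNN with a known W_true;
# the Hebbian estimate should correlate with the generating weights.
rng = np.random.default_rng(0)
n = 8
W_true = 0.3 * rng.normal(size=(n, n))
h = np.zeros((5000, n))
for t in range(len(h) - 1):
    h[t + 1] = np.tanh(h[t] @ W_true.T) + 0.1 * rng.normal(size=n)
W_est = hebbian_wh(h)
r = np.corrcoef(W_est.ravel(), W_true.ravel())[0, 1]
```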
Replacing b_h costs nothing (it actually improves compression by 0.011 bpc)
The log-odds of each unit's positive-sign fraction reconstructs the bias perfectly. Trained weights with Hebbian b_h swapped in: 4.954 bpc vs fully trained: 4.965 bpc.
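One reading of the log-odds reconstruction, as a sketch: if the non-bias part of the pre-activation is roughly symmetric (modeled here as logistic noise, an assumption), then P(h_i > 0) = sigmoid(b_i) and the log-odds inverts it.

```python
import numpy as np

def bias_from_sign_logodds(h):
    """Reconstruct b_h from the fraction of time each unit is positive.

    Assumes the non-bias pre-activation input is roughly logistic, so
    P(h_i > 0) = sigmoid(b_i) and the log-odds recovers b_i.
    """
    p = np.clip((h > 0).mean(axis=0), 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

# Synthetic check: with logistic pre-activation noise the inversion is exact
# up to sampling error.
rng = np.random.default_rng(1)
b_true = rng.normal(size=16)
z = b_true + rng.logistic(size=(50000, 16))
b_est = bias_from_sign_logodds(np.tanh(z))   # tanh preserves the sign of z
err = np.abs(b_est - b_true).max()
```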
50% Hebbian W_y blend IMPROVES the model by 0.66 bpc
Mixing Hebbian W_y with trained W_y: 4.307 bpc (vs trained 4.965). The trained readout is over-optimized for mantissa dynamics; the Hebbian correction pushes toward the true conditional distribution.
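The blend itself is one line; a sketch with a bpc helper (the 0.5 mixing weight is the 50% from the result above, everything else is illustrative):

```python
import numpy as np

def bpc(logits, targets):
    """Bits per character of a softmax readout on integer byte targets."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean() / np.log(2)

def blend_readout(W_trained, W_hebbian, alpha=0.5):
    """Convex blend of the trained and Hebbian readout matrices."""
    return (1 - alpha) * W_trained + alpha * W_hebbian

# Sanity check: a uniform readout over 256 byte values scores exactly 8 bpc.
uniform = bpc(np.zeros((10, 256)), np.arange(10))
```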
Constructed dynamics + optimized readout: 2.80–3.96 bpc
All Wx,Wh,bh from data covariance, only W_y optimized: 3.96 bpc. Sign-conditioned dynamics + optimized W_y: 2.80 bpc. (Both overfit to 520 bytes.)
FULLY ANALYTIC: 1.89 bpc with ZERO optimization (beats trained 4.97 by 3.08)
Shift-register dynamics + analytic W_y from skip-bigram log-ratios. ALL 82k parameters from data statistics. Zero gradient descent, zero BPTT. Generalizes comparably: test 4.88 vs trained 5.08 (within 0.2 bpc). The loop is closed.
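The exact feature layout of the analytic W_y isn't spelled out here; this sketches the underlying skip-bigram log-ratio statistic (offset set, smoothing, and function name are assumptions):

```python
import numpy as np

def skip_bigram_logratios(data, offsets=(1, 2), alpha=1.0, n_sym=256):
    """Analytic readout statistics from skip-bigram counts.

    For each offset d and context byte b, the weight toward next byte c is
    log P(c | byte at t-d == b) - log P(c), with add-alpha smoothing.
    Returns {d: (n_sym, n_sym) array} where entry [b, c] is the log-ratio.
    """
    data = np.asarray(data)
    base = np.bincount(data, minlength=n_sym) + alpha
    log_base = np.log(base / base.sum())
    out = {}
    for d in offsets:
        counts = np.full((n_sym, n_sym), alpha)
        np.add.at(counts, (data[:-d], data[d:]), 1.0)   # (context, next) pairs
        cond = counts / counts.sum(axis=1, keepdims=True)
        out[d] = np.log(cond) - log_base
    return out
```

On a strictly alternating stream 1,2,1,2,… the d=1 log-ratio for context 1 should favor 2 over 1, and vice versa.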
Optimized readout: 0.59 bpc all data, 0.40 bpc test
Shift-register (16 groups of 8, hash encoding) + gradient-optimized W_y. No trained model is needed for the dynamics. Perfect 16-step memory vs the trained model's chaotic destruction of information.
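A sketch of the shift-register dynamics (the ±1 hash encoding is an assumption; the 16×8 layout is from the result above):

```python
import numpy as np

def byte_code(b, width=8, seed=0):
    """Deterministic ±1 hash code for a byte (assumed encoding)."""
    rng = np.random.default_rng(seed * 65536 + b)
    return rng.choice([-1.0, 1.0], size=width)

def shift_register_step(h, x, groups=16, width=8):
    """One step of shift-register dynamics: rotate the 16 groups of 8
    neurons by one group and write the hash code of the new byte x into
    group 0. Memory of the last 16 bytes is exact, not chaotic."""
    h = h.reshape(groups, width).copy()
    h = np.roll(h, 1, axis=0)
    h[0] = byte_code(x, width)
    return h.ravel()

# Feed 20 bytes; group k then holds the code of the byte seen k steps ago.
h = np.zeros(128)
for x in range(20):
    h = shift_register_step(h, x)
```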
Sparse 26k-param architecture cannot train from scratch
Redux architecture (20 neurons, sparse W_h) from random init: 7.74 bpc after 50 epochs. Full 82k architecture: 5.16 bpc. The dense W_h is scaffolding for gradient flow.

Key Results: Boolean Automaton (Q1–Q7)

The mantissa is noise, not memory
Sign-only dynamics: 5.690 bpc, BETTER than full f32 (5.721). Zero-mantissa: 5.582 bpc. The sign carries 99.7% of the compression. 31.6 sign changes/step. The mantissa degrades compression by 0.095 bpc.
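The two ablations can be sketched as state quantizers inside a single step (a minimal illustration; the step signature is an assumption):

```python
import numpy as np

def rnn_step(h, x_onehot, Wx, Wh, b, mode="f32"):
    """One recurrent step with optional state quantization.

    mode="sign" keeps only ±1 per unit; mode="zero_mantissa" keeps the
    sign and exponent bits of each float32 value and clears the 23
    mantissa bits.
    """
    z = Wx @ x_onehot + Wh @ h + b
    h_new = np.tanh(z)
    if mode == "sign":
        return np.sign(h_new)
    if mode == "zero_mantissa":
        bits = h_new.astype(np.float32).view(np.uint32)
        masked = bits & np.uint32(0xFF800000)   # sign bit + 8 exponent bits
        return masked.view(np.float32).astype(h_new.dtype)
    return h_new

# Tiny check: two units driven hard positive/negative by the bias alone.
Wx = np.zeros((2, 3)); Wh = np.zeros((2, 2)); bvec = np.array([2.0, -2.0])
x = np.zeros(3); h0 = np.zeros(2)
s = rnn_step(h0, x, Wx, Wh, bvec, mode="sign")           # ±1 states
m = rnn_step(h0, x, Wx, Wh, bvec, mode="zero_mantissa")  # tanh(±2) -> ±0.5
```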
Margins prove the Boolean function IS the computation
Mean margin: 60.5. Max mantissa perturbation: 4.7×10⁻⁵. 98.9% of neuron-steps have |z| > 1.0. Safety factor 10⁶×.
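A sketch of the margin computation (statistic names are illustrative; the safety factor is mean margin over max perturbation, consistent with 60.5 / 4.7×10⁻⁵ ≈ 10⁶):

```python
import numpy as np

def margin_stats(Z, eps):
    """Margin statistics for the Boolean-automaton view.

    Z: (T, n) pre-activations; eps: largest perturbation the mantissa
    noise can inject. A sign flip requires |z| < eps, so mean|z| / eps
    measures how far the computation is from depending on the mantissa.
    """
    m = np.abs(np.asarray(Z, dtype=float))
    return {
        "mean_margin": float(m.mean()),
        "frac_above_1": float((m > 1.0).mean()),
        "safety_factor": float(m.mean() / eps),
    }

stats = margin_stats([[2.0, -3.0], [0.5, 4.0]], eps=0.1)
```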
1 neuron = 99.7%, top 15 = 102% (Q3)
h28 alone captures 99.7% of compression gap. Top 15 beat full model. 113 neurons are noise for readout. 20 neurons + 36% W_h = 4.81 bpc (0.15 better than full).
All 128 volatile, deep offsets d=18-25 (Q2, Q4)
Zero frozen neurons. Mean dwell: 3.3 steps. Co-flip pairs (Jaccard > 0.5). MI-greedy captures 9.4%. 23% of neurons dominated by d=25.
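The dwell-time and co-flip statistics can be sketched directly on ±1 sign trajectories (function names are illustrative):

```python
import numpy as np

def mean_dwell(traj):
    """Mean run length (dwell time, in steps) of a ±1 sign trajectory."""
    traj = np.asarray(traj)
    flips = np.flatnonzero(np.diff(traj) != 0)
    runs = np.diff(np.concatenate(([0], flips + 1, [len(traj)])))
    return float(runs.mean())

def coflip_jaccard(a, b):
    """Jaccard overlap of the flip-step sets of two sign trajectories."""
    fa = set(np.flatnonzero(np.diff(np.asarray(a)) != 0).tolist())
    fb = set(np.flatnonzero(np.diff(np.asarray(b)) != 0).tolist())
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0
```

Two neurons count as a co-flip pair when this Jaccard score exceeds 0.5.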
Routing backbone: h54 ← h121 ← h78 (Q6)
Each prediction: ~15 weights (0.1% of W_h). h54 dominates 7/12 predictions (smallest margin 26.7, most volatile 234 flips).
74% RNN-PMI alignment (Q7)
Shallow offsets (d=1-4): 88%. Deep offsets (d>15): 24-37%. The RNN develops higher-order patterns at depth.

Key Results: f32/GMP Experiments (Q1)

Bit leverage 300:52:1, gradient decorrelates at d=1
Per-bit KL: sign 0.046, exponent 0.008/bit, mantissa 0.00015/bit. Mantissa bits 0-14 have zero effect. Pattern ranking ρ = 1.000 at d ≥ 11. Error ratio 3.44 (Lyapunov).
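A sketch of the per-bit leverage measurement: flip one bit of one hidden unit's float32 value and measure the KL shift of the readout distribution (the readout shape and helper names are assumptions):

```python
import numpy as np

def flip_bit(x, k):
    """Flip bit k of a float32 value (0 = lowest mantissa bit,
    23-30 = exponent bits, 31 = sign bit)."""
    u = np.float32(x).view(np.uint32)
    return (u ^ np.uint32(1 << k)).view(np.float32)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def bit_kl(h, i, k, W_y):
    """KL divergence in bits between the readout distribution before and
    after flipping bit k of hidden unit i."""
    p = softmax(W_y @ h)
    h2 = h.copy()
    h2[i] = flip_bit(h[i], k)
    q = softmax(W_y @ h2)
    return float((p * np.log2(p / q)).sum())

# A sign flip moves the distribution far more than the lowest mantissa bit.
W = np.eye(2)
h = np.array([1.0, 0.0])
kl_sign = bit_kl(h, 0, 31, W)
kl_mant = bit_kl(h, 0, 0, W)
```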


Navigation

← Previous: 20260210
Narrative paper, event arithmetic, prime encoding, viewer v8.
Next: 20260211_2 →
Scaling to full enwik9: experimental design, seven predictions.