Archive 2026-02-09
The real factor map: how interpretable patterns map onto H dynamics.
Key result: the factor map is readable but not writable —
features are entangled in the dynamics.
Papers
- factor-map.pdf - The Factor Map: entangled dynamics, one-way interpretability (7 pages)
- sparse-diff.pdf - Sparse Differentiation: per-prediction gradient attribution, backward gradient grows 2.4x (5 pages)
Analysis Tools
- factor_map2.c - Per-neuron correlation (max 0.014), space reset (||diff||=5.31), SVD of W_h (top 10), post-hoc subtraction sweep (+0.10 bpc at 100%)
- factor_map3.c - Conditional means E[h|wl=k], PCA of wl subspace (3 PCs, 38/33/29%), W_h propagation (2.5x amplification, cos=0.79), weight-level subtraction (+5.9 bpc)
- factor_map4.c - Step-by-step interventions: subtract wl (+7.3 bpc), write-in oracle (+5.6 bpc), random control (+2.3 bpc), residual wl info (r=0.30 regenerates)
- sparse_diff.c - Gradient attribution: sparsity (top-10=68%), backward growth (2.4x at k=8), per-position attribution, offset attribution table
- sparse_diff_export.c - Export JSON data for visualization (hidden states, gradients, backward attribution, active patterns)
Experiment Dashboard
- viz-dashboard.html — Interactive charts: per-neuron R², intervention effects, backward gradient growth, word length encoding, feature entanglement, W_h propagation, gradient sparsity, BPC breakdown.
Interactive Visualization (v1-v5)
- v5 - Electric blue neuron-targeted arcs, diagonal cascade text, deselect button, page-flip scroll
- v4 - Rotated -55° diagonal letters (funny but impractical)
- v3 - Neuron-targeted backward flow, h/l vim keys
- v2 - Letters in oscilloscope, nav buttons, toast
- v1 - Original with sidebar Sankey
v6 (lambda oscilloscope) moved to 20260210 archive.
Central result: The factor map φ: H → features is one-way.
We can read word length (r=0.58), tag state (r=0.57), and 2-offset conjunctions
(R²≥0.80 for 120/128 neurons) from h. But we cannot write them back or
subtract them without catastrophic disruption (+7.3 bpc step-by-step vs +0.15 post-hoc).
The RNN is a single dynamical system, not a set of separable circuits.
Key Findings
Word length is distributed, not per-neuron: Max per-neuron correlation
with word length is 0.014. The covariance direction captures r=0.58, R²=0.34.
Conditional means E[h|wl=k] show distance from wl=0 growing to 7.9 then plateauing.
Space is a massive reset signal: ||E[h|after_space] - E[h|other]|| = 5.31.
W_x space column (norm 3.74) pushes h in the negative word-length direction (cos=-0.15).
Entanglement: features share directions in H.
cos(wl_dir, tag_dir) = 0.50 — the two features overlap by 50%.
Step-by-step removal of wl_dir costs +7.3 bpc (catastrophic); post-hoc costs +0.15.
Even random direction removal costs +2.3 bpc; wl_dir is 3x more important.
The factor map is readable but not writable:
φ: H → features works (r=0.58 wl, r=0.57 tag, 92.5% bpc explained).
φ⁻¹: features → H does not — writing oracle values disrupts dynamics (+5.6 bpc).
Weight-level subtraction (removing PCs from W_h) equally catastrophic (+5.9 bpc).
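The per-step edit behind the subtraction interventions is a one-line projection: remove the component of h along the feature direction before the next update. A sketch (direction v need not be unit length); the +7.3 bpc cost comes from what the dynamics do next, not from the edit itself:

```c
/* Project a feature direction out of the hidden state:
 * h <- h - ((v.h)/(v.v)) v, so the readout along v is exactly zero
 * immediately after the edit. */
#include <math.h>

static void subtract_direction(double *h, const double *v, int n) {
    double dot = 0, nv = 0;
    for (int i = 0; i < n; i++) { dot += h[i] * v[i]; nv += v[i] * v[i]; }
    dot /= nv;  /* normalize so v need not be unit length */
    for (int i = 0; i < n; i++) h[i] -= dot * v[i];
}
```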
W_h propagation of word-length direction:
||W_h @ v_wl|| = 2.48 (amplification), cos(W_h@v_wl, v_wl) = 0.79 (self-alignment).
92% stays in wl subspace. Exponential growth (||W_h^8 @ v_wl|| = 2831).
After subtraction, wl info partially regenerates:
Step-by-step removal of wl_dir leaves residual with r=0.30 for rebuilt direction.
W_x reintroduces word-length info at every step (RNN sees spaces).
Backward gradient GROWS (sparse diff):
||g_{t→t-k}|| reaches 2.4x at k=8. Spectral radius > 1 amplifies past signals.
Explains why offset 8 is chosen before offset 2 by greedy MI.
Mean |attr| grows from 0.041 (k=1) to 0.081 (k=5-6).
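One BPTT step makes the growth mechanism concrete: for h_t = tanh(W_h h_{t-1} + ...), the backward recursion is g_{t-1} = W_h^T (g_t * (1 - h_t^2)), so a Jacobian with spectral radius above 1 inflates the gradient at every step. A toy sketch with 2 hidden units and hypothetical weights:

```c
/* One BPTT step for a tanh RNN: g_{t-1} = W^T (g_t * (1 - h_t^2)),
 * where (1 - h^2) is the elementwise tanh derivative. Iterating k steps
 * gives g_{t->t-k}; spectral radius > 1 makes its norm grow with k. */
#include <math.h>

#define DH 2  /* toy hidden size */

static void backward_step(double W[DH][DH], const double *h_t,
                          const double *g_t, double *g_prev) {
    double local[DH];
    for (int i = 0; i < DH; i++)
        local[i] = g_t[i] * (1.0 - h_t[i] * h_t[i]);  /* tanh' = 1 - h^2 */
    for (int j = 0; j < DH; j++) {                    /* multiply by W^T */
        g_prev[j] = 0;
        for (int i = 0; i < DH; i++) g_prev[j] += W[i][j] * local[i];
    }
}
```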
Gradient is moderately sparse:
Top-10 neurons capture 68% of gradient energy. Same neurons (h68, h76, h99, h8)
dominate across all positions — matching factor map importance ranking.
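The sparsity measure is the top-k energy fraction: sum the k largest squared gradient entries and divide by the total. A sketch assuming n ≤ 256 (the real model has 128 neurons):

```c
/* Fraction of total squared gradient energy carried by the top-k neurons.
 * A simple repeated-max extraction is fine at these sizes. */
#include <math.h>

static double topk_energy_frac(const double *g, int n, int k) {
    double e[256];  /* squared magnitudes; -1 marks "already taken" */
    double total = 0, top = 0;
    for (int i = 0; i < n; i++) { e[i] = g[i] * g[i]; total += e[i]; }
    for (int s = 0; s < k; s++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (e[i] >= 0 && (best < 0 || e[i] > e[best])) best = i;
        top += e[best];
        e[best] = -1.0;
    }
    return top / total;
}
```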
Navigation
← Previous: 20260208
Factor map from RNN neurons to UM skip-patterns. Pattern viewer. Reverse isomorphism.
Next: 20260210 →
Comprehensive narrative: the research journey from RNN to Universal Model.