Archive 2026-02-09
The real factor map: how interpretable patterns map onto H dynamics.
Key result: the factor map is readable but not writable —
features are entangled in the dynamics.
Papers
- factor-map.pdf - The Factor Map: entangled dynamics, one-way interpretability (7 pages)
- sparse-diff.pdf - Sparse Differentiation: per-prediction gradient attribution, backward gradient grows 2.4x (5 pages)
Analysis Tools
- factor_map2.c - Per-neuron correlation (max 0.014), space reset (||diff||=5.31), SVD of W_h (top 10), post-hoc subtraction sweep (+0.10 bpc at 100%)
- factor_map3.c - Conditional means E[h|wl=k], PCA of wl subspace (3 PCs, 38/33/29%), W_h propagation (2.5x amplification, cos=0.79), weight-level subtraction (+5.9 bpc)
- factor_map4.c - Step-by-step interventions: subtract wl (+7.3 bpc), write-in oracle (+5.6 bpc), random control (+2.3 bpc), residual wl info (r=0.30 regenerates)
- sparse_diff.c - Gradient attribution: sparsity (top-10=68%), backward growth (2.4x at k=8), per-position attribution, offset attribution table
- sparse_diff_export.c - Export JSON data for visualization (hidden states, gradients, backward attribution, active patterns)
Experiment Dashboard
- viz-dashboard.html — Interactive charts: per-neuron R², intervention effects, backward gradient growth, word length encoding, feature entanglement, W_h propagation, gradient sparsity, BPC breakdown.
Interactive Visualization (v1-v5)
- v5 - Electric blue neuron-targeted arcs, diagonal cascade text, deselect button, page-flip scroll
- v4 - Rotated -55° diagonal letters (funny but impractical)
- v3 - Neuron-targeted backward flow, h/l vim keys
- v2 - Letters in oscilloscope, nav buttons, toast
- v1 - Original with sidebar Sankey
v6 (lambda oscilloscope) moved to 20260210 archive.
Central result: The factor map φ: H → features is one-way.
We can read word length (r=0.58), tag state (r=0.57), and 2-offset conjunctions
(R²≥0.80 for 120/128 neurons) from h. But we cannot write them back or
subtract them without catastrophic disruption (+7.3 bpc step-by-step vs +0.15 post-hoc).
The RNN is a single dynamical system, not a set of separable circuits.
Key Findings
Word length is distributed, not per-neuron: Max per-neuron correlation
with word length is 0.014. The covariance direction captures r=0.58, R²=0.34.
Conditional means E[h|wl=k] show distance from wl=0 growing to 7.9 then plateauing.
Space is a massive reset signal: ||E[h|after_space] - E[h|other]|| = 5.31.
W_x space column (norm 3.74) pushes h in the negative word-length direction (cos=-0.15).
Entanglement: features share directions in H.
cos(wl_dir, tag_dir) = 0.50 — the two features overlap by 50%.
Step-by-step removal of wl_dir costs +7.3 bpc (catastrophic); post-hoc costs +0.15.
Even random direction removal costs +2.3 bpc; wl_dir is 3x more important.
The factor map is readable but not writable:
φ: H → features works (r=0.58 wl, r=0.57 tag, 92.5% bpc explained).
φ⁻¹: features → H does not — writing oracle values disrupts dynamics (+5.6 bpc).
Weight-level subtraction (removing PCs from W_h) equally catastrophic (+5.9 bpc).
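The per-step edit behind the subtraction interventions is a one-line projection: remove the component of h along the feature direction before the next update. A sketch (direction v need not be unit length); the +7.3 bpc cost comes from what the dynamics do next, not from the edit itself:

```c
/* Project a feature direction out of the hidden state:
 * h <- h - ((v.h)/(v.v)) v, so the readout along v is exactly zero
 * immediately after the edit. */
#include <math.h>

static void subtract_direction(double *h, const double *v, int n) {
    double dot = 0, nv = 0;
    for (int i = 0; i < n; i++) { dot += h[i] * v[i]; nv += v[i] * v[i]; }
    dot /= nv;  /* normalize so v need not be unit length */
    for (int i = 0; i < n; i++) h[i] -= dot * v[i];
}
```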
W_h propagation of word-length direction:
||W_h @ v_wl|| = 2.48 (amplification), cos(W_h@v_wl, v_wl) = 0.79 (self-alignment).
92% stays in wl subspace. Exponential growth (||W_h^8 @ v_wl|| = 2831).
After subtraction, wl info partially regenerates:
Step-by-step removal of wl_dir leaves residual with r=0.30 for rebuilt direction.
W_x reintroduces word-length info at every step (RNN sees spaces).
Backward gradient GROWS (sparse diff):
||g_{t→t-k}|| reaches 2.4x at k=8. Spectral radius > 1 amplifies past signals.
Explains why offset 8 is chosen before offset 2 by greedy MI.
Mean |attr| grows from 0.041 (k=1) to 0.081 (k=5-6).
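One BPTT step makes the growth mechanism concrete: for h_t = tanh(W_h h_{t-1} + ...), the backward recursion is g_{t-1} = W_h^T (g_t * (1 - h_t^2)), so a Jacobian with spectral radius above 1 inflates the gradient at every step. A toy sketch with 2 hidden units and hypothetical weights:

```c
/* One BPTT step for a tanh RNN: g_{t-1} = W^T (g_t * (1 - h_t^2)),
 * where (1 - h^2) is the elementwise tanh derivative. Iterating k steps
 * gives g_{t->t-k}; spectral radius > 1 makes its norm grow with k. */
#include <math.h>

#define DH 2  /* toy hidden size */

static void backward_step(double W[DH][DH], const double *h_t,
                          const double *g_t, double *g_prev) {
    double local[DH];
    for (int i = 0; i < DH; i++)
        local[i] = g_t[i] * (1.0 - h_t[i] * h_t[i]);  /* tanh' = 1 - h^2 */
    for (int j = 0; j < DH; j++) {                    /* multiply by W^T */
        g_prev[j] = 0;
        for (int i = 0; i < DH; i++) g_prev[j] += W[i][j] * local[i];
    }
}
```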
Gradient is moderately sparse:
Top-10 neurons capture 68% of gradient energy. Same neurons (h68, h76, h99, h8)
dominate across all positions — matching factor map importance ranking.
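The sparsity measure is the top-k energy fraction: sum the k largest squared gradient entries and divide by the total. A sketch assuming n ≤ 256 (the real model has 128 neurons):

```c
/* Fraction of total squared gradient energy carried by the top-k neurons.
 * A simple repeated-max extraction is fine at these sizes. */
#include <math.h>

static double topk_energy_frac(const double *g, int n, int k) {
    double e[256];  /* squared magnitudes; -1 marks "already taken" */
    double total = 0, top = 0;
    for (int i = 0; i < n; i++) { e[i] = g[i] * g[i]; total += e[i]; }
    for (int s = 0; s < k; s++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (e[i] >= 0 && (best < 0 || e[i] > e[best])) best = i;
        top += e[best];
        e[best] = -1.0;
    }
    return top / total;
}
```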
Navigation
← Previous: 20260208
Factor map from RNN neurons to UM skip-patterns. Pattern viewer. Reverse isomorphism.
Next: 20260210 →
Comprehensive narrative: the research journey from RNN to Universal Model.