← Back to Hutter

Archive 2026-02-09

The real factor map: how interpretable patterns map onto H dynamics. Key result: the factor map is readable but not writable — features are entangled in the dynamics.

Papers

Analysis Tools

Experiment Dashboard

Interactive Visualization (v1-v5)

v6 (lambda oscilloscope) moved to 20260210 archive.

Central result: The factor map φ: H → features is one-way. We can read word length (r=0.58), tag state (r=0.57), and 2-offset conjunctions (R²≥0.80 for 120/128 neurons) from h. But we cannot write them back or subtract them without catastrophic disruption (+7.3 bpc step-by-step vs +0.15 post-hoc). The RNN has one dynamical system, not separable circuits.
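The "readable" half of the claim is a linear readout of a feature from h. A minimal sketch, with synthetic stand-ins for the hidden states and word-length labels (shapes, data, and variable names here are assumptions, not the actual experiment):

```python
import numpy as np

# Synthetic stand-ins: T=1000 steps of a 128-unit hidden state, plus a
# word-length label per step. Real runs would use the trained RNN's h_t.
rng = np.random.default_rng(0)
H = rng.normal(size=(1000, 128))      # hidden states h_t (mock)
wl = rng.integers(0, 12, size=1000)   # word-length label at each step (mock)

# "Read": fit a single direction v so that (h - mean) @ v predicts wl.
Hc = H - H.mean(axis=0)
wlc = wl - wl.mean()
v, *_ = np.linalg.lstsq(Hc, wlc, rcond=None)

# Correlation of the linear readout with the feature (r=0.58 in the notes;
# on this random mock it will just be whatever overfitting gives).
pred = Hc @ v
r = np.corrcoef(pred, wl)[0, 1]
```

The "not writable" half is the converse: setting h along v to an oracle value and continuing the rollout, which is what produces the +5.6 bpc disruption.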

Key Findings

Word length is distributed, not per-neuron: Max per-neuron correlation with word length is 0.014. The covariance direction captures r=0.58, R²=0.34. Conditional means E[h|wl=k] show distance from wl=0 growing to 7.9 then plateauing.
Space is a massive reset signal: ||E[h|after_space] - E[h|other]|| = 5.31. The W_x column for the space character (norm 3.74) pushes h along the negative word-length direction (cos = -0.15).
Entanglement: features share directions in H. cos(wl_dir, tag_dir) = 0.50, so the two feature directions are far from orthogonal (roughly 25% shared variance). Step-by-step removal of wl_dir costs +7.3 bpc (catastrophic); post-hoc removal costs only +0.15. Even removing a random direction costs +2.3 bpc; wl_dir is about 3x more costly than a random one.
The factor map is readable but not writable: φ: H → features works (r=0.58 wl, r=0.57 tag, 92.5% of bpc explained). φ⁻¹: features → H does not work; writing oracle feature values into h disrupts the dynamics (+5.6 bpc). Weight-level subtraction (removing PCs from W_h) is equally catastrophic (+5.9 bpc).
W_h propagation of word-length direction: ||W_h @ v_wl|| = 2.48 (amplification), cos(W_h@v_wl, v_wl) = 0.79 (self-alignment). 92% stays in wl subspace. Exponential growth (||W_h^8 @ v_wl|| = 2831).
After subtraction, wl info partially regenerates: step-by-step removal of wl_dir leaves a residual from which a rebuilt direction still reads word length at r=0.30, because W_x reintroduces word-length information at every step (the RNN keeps seeing spaces).
Backward gradient GROWS (sparse diff): ||g_{t→t-k}|| reaches 2.4x at k=8. A spectral radius > 1 amplifies past signals, which explains why greedy MI selects offset 8 before offset 2. Mean |attr| grows from 0.041 (k=1) to 0.081 (k=5-6).
Gradient is moderately sparse: the top-10 neurons capture 68% of gradient energy. The same neurons (h68, h76, h99, h8) dominate across all positions, matching the factor-map importance ranking.
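The W_h propagation metrics above (one-step amplification, self-alignment, k-step growth) are straightforward to compute. A sketch with a random stand-in for W_h, so the printed values will not match the reported 2.48 / 0.79 / 2831:

```python
import numpy as np

# Random stand-ins for the trained recurrent matrix and the learned
# word-length direction (names v_wl, W_h follow the notes; values are mock).
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.12, size=(128, 128))
v_wl = rng.normal(size=128)
v_wl /= np.linalg.norm(v_wl)   # unit direction

Wv = W_h @ v_wl
amplification = np.linalg.norm(Wv)            # ||W_h @ v_wl||; >1 means amplified
self_align = (Wv @ v_wl) / np.linalg.norm(Wv) # cos(W_h @ v_wl, v_wl)

# k-step propagation: repeated application of W_h; exponential growth
# appears when the effective gain along this direction exceeds 1.
x = v_wl.copy()
for _ in range(8):
    x = W_h @ x
growth_8 = np.linalg.norm(x)                  # ||W_h^8 @ v_wl||
```

The "92% stays in wl subspace" figure would additionally require the basis of that subspace, which these notes don't spell out, so it is omitted here.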
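The top-k energy fraction behind the sparsity claim is a simple statistic; a sketch on a mock gradient vector (the 68% figure comes from real backward gradients, not this toy):

```python
import numpy as np

# Mock gradient over 128 neurons with a decaying envelope, so the energy
# is concentrated in a few coordinates (qualitatively like the real case).
rng = np.random.default_rng(1)
g = rng.normal(size=128) * np.exp(-np.arange(128) / 20.0)

energy = g ** 2
top10_frac = np.sort(energy)[::-1][:10].sum() / energy.sum()
top10_idx = np.argsort(energy)[::-1][:10]   # which neurons dominate
```

Checking whether `top10_idx` is stable across positions (h68, h76, h99, h8 in the notes) is what links the gradient picture back to the factor-map ranking.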

Navigation

← Previous: 20260208
Factor map from RNN neurons to UM skip-patterns. Pattern viewer. Reverse isomorphism.
Next: 20260210 →
Comprehensive narrative: the research journey from RNN to Universal Model.