
Archive 2026-02-08

The factor map: connecting RNN hidden states to UM pattern inventories.

Goal: Take the sat-rnn (0.079 bpc on 1024 bytes), find which UM patterns (n-grams, skip-k-grams) it has learned and how they map onto hidden neurons, and show the explicit factor map from the architecture-natural factorization (128 neurons) to the domain-natural one (I→O skip-patterns).

Key Findings

All 128 neurons are explained as 2-offset conjunctions (accuracy 90–97%). The most common offset pairs are (1,7), (1,8), (8,2), (2,12), and (1,20). The RNN is fundamentally a skip-2-gram machine over specific offset pairs.
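The 2-offset conjunction fit reduces to a lookup-table search over offset pairs. A minimal sketch, assuming a synthetic 4-symbol stream and a hypothetical neuron hard-wired as a (1,7) conjunction; the alphabet, stream length, and neuron are illustrative stand-ins, not the actual sat-rnn:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 4, size=2000)   # toy byte stream over a 4-symbol alphabet
T = len(data)
# hypothetical neuron: fires iff data[t-1]==0 AND data[t-7]==2 (a 2-offset conjunction)
h = np.array([1.0 if (data[t-1] == 0 and data[t-7] == 2) else -1.0
              for t in range(7, T)])

def pair_accuracy(d1, d2):
    """Fit a majority-sign lookup table on (data[t-d1], data[t-d2]) and score it."""
    keys = [(data[t-d1], data[t-d2]) for t in range(7, T)]
    table = {}
    for k, s in zip(keys, h):
        table.setdefault(k, []).append(s)
    maj = {k: (np.sign(np.mean(v)) or 1.0) for k, v in table.items()}
    pred = np.array([maj[k] for k in keys])
    return float(np.mean(pred == h))

pairs = [(d1, d2) for d1 in range(1, 9) for d2 in range(d1 + 1, 9)]
best = max(pairs, key=lambda p: pair_accuracy(*p))
print(best, pair_accuracy(*best))
```

The search recovers the planted pair because only its lookup table is pure-sign; every other pair leaves mixed classes.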
Top neurons by ablation importance: h8 (+0.036 bpc), h56 (+0.026), h68 (+0.025), confirming the earlier skip2_rnn analysis. The top 20 neurons account for +0.29 bpc of total ablation delta.
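The ablation measurement can be sketched with a toy, untrained tanh RNN: zero one hidden unit at every step, recompute bpc, and rank units by the delta. All weights and sizes below are random stand-ins; only the mechanism follows the text:

```python
import numpy as np

rng = np.random.default_rng(1)
H, V, T = 16, 8, 500                         # toy sizes, not the real 128-unit RNN
W_x = rng.normal(0, 0.3, (H, V))
W_h = rng.normal(0, 0.3, (H, H))
W_y = rng.normal(0, 0.3, (V, H))
data = rng.integers(0, V, size=T)

def bpc(ablate=None):
    """Mean bits-per-character of next-byte prediction, optionally with one unit zeroed."""
    h = np.zeros(H)
    total = 0.0
    for t in range(T - 1):
        x = np.zeros(V); x[data[t]] = 1.0
        h = np.tanh(W_x @ x + W_h @ h)
        if ablate is not None:
            h[ablate] = 0.0                  # zero the neuron at every step
        logits = W_y @ h
        p = np.exp(logits - logits.max()); p /= p.sum()
        total -= np.log2(p[data[t + 1]])
    return total / (T - 1)

base = bpc()
deltas = [bpc(ablate=j) - base for j in range(H)]
print(sorted(range(H), key=lambda j: -deltas[j])[:3])  # top-3 units by ablation delta
```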
Continuous factor map captures 87% of the bpc gain: predicting each neuron by its conditional mean E[h_j | data[t-d1], data[t-d2]] over its best 2-offset pair gives 0.677 bpc (vs 0.081 actual, 4.74 marginal). Mean R² = 0.837; all 128 neurons have R² ≥ 0.70, and 120/128 have R² ≥ 0.80. A binary sign approximation fails (5.0 bpc) because the continuous magnitudes carry most of the information.
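The conditional-mean predictor and its R² can be sketched as follows, again on synthetic data with a hypothetical continuous neuron driven by offsets (1,7); coefficients and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.integers(0, 4, size=3000)
T = len(data)
# hypothetical continuous neuron: a smooth function of bytes at offsets 1 and 7, plus noise
h = np.array([np.tanh(0.8 * data[t-1] - 0.5 * data[t-7]) for t in range(7, T)])
h = h + rng.normal(0, 0.05, size=h.shape)

def r2_for_pair(d1, d2):
    """R^2 of the conditional-mean predictor E[h | data[t-d1], data[t-d2]]."""
    keys = [(data[t-d1], data[t-d2]) for t in range(7, T)]
    sums, cnts = {}, {}
    for k, v in zip(keys, h):
        sums[k] = sums.get(k, 0.0) + v
        cnts[k] = cnts.get(k, 0) + 1
    pred = np.array([sums[k] / cnts[k] for k in keys])   # class mean per byte pair
    ss_res = np.sum((h - pred) ** 2)
    ss_tot = np.sum((h - h.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r2_for_pair(1, 7), r2_for_pair(2, 3))
```

The correct pair explains nearly all the variance (residual is just the injected noise); an unrelated pair explains essentially none.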
Dominant offset pair (1,7): 52/128 neurons best explained by data[t-1] and data[t-7]. Other common pairs: (1,8): 20, (8,2): 18, (1,12): 9, (2,7): 8.
State features close the gap: the 2-offset map alone gives 0.85 bpc (83%); adding word_len, 0.50 (91%); adding in_tag, 0.43 (92.5%), matching the UM skip-8 floor (0.043). word_len is the dominant state feature for all 128 neurons. The remaining 0.35 bpc gap (0.43→0.08) is higher-order patterns plus finer state.
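The two state features can be sketched as simple scans. Their exact definitions are not given above; "characters since the last space" for word_len and "inside a '<...>' span" for in_tag are assumptions:

```python
def word_len_feature(text: bytes):
    """Characters since the last space (assumed definition of word_len)."""
    out, n = [], 0
    for b in text:
        out.append(n)
        n = 0 if b == ord(' ') else n + 1
    return out

def in_tag_feature(text: bytes):
    """Whether position t falls inside a '<...>' span (assumed definition of in_tag)."""
    out, inside = [], False
    for b in text:
        out.append(inside)
        if b == ord('<'):
            inside = True
        elif b == ord('>'):
            inside = False
    return out

print(word_len_feature(b"ab cd"))   # [0, 1, 2, 0, 1]
print(in_tag_feature(b"a<b>c"))     # [False, False, True, True, False]
```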
UM superset verified: Greedy offset selection [1,8,45,36,48,10] reaches 0.000 bpc at 6 offsets (909 patterns = perfect prediction on 1024 bytes). Every RNN offset pair has 400–600 UM patterns. The RNN encodes a fraction of the UM’s inventory.
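Greedy offset selection can be sketched as repeatedly adding the offset whose empirical pattern table most reduces bpc. The stream below is a random toy (the real run, per the text, selects [1,8,45,36,48,10] and reaches 0.000 bpc); only the selection loop is illustrated:

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(3)
data = list(rng.integers(0, 4, size=400))   # toy stream; candidate offsets 1..9
M = 9                                       # fixed start so all offset sets share positions

def bpc_with_offsets(offsets):
    """Empirical bpc of predicting data[t] from the bytes at the given back-offsets."""
    ctx_counts = defaultdict(Counter)
    for t in range(M, len(data)):
        ctx_counts[tuple(data[t-d] for d in offsets)][data[t]] += 1
    total = 0.0
    for t in range(M, len(data)):
        c = ctx_counts[tuple(data[t-d] for d in offsets)]
        total -= np.log2(c[data[t]] / sum(c.values()))
    return total / (len(data) - M)

chosen = []
for _ in range(3):                          # greedy selection, 3 rounds
    cands = [d for d in range(1, 10) if d not in chosen]
    best = min(cands, key=lambda d: bpc_with_offsets(chosen + [d]))
    chosen.append(best)
    print(chosen, round(bpc_with_offsets(chosen), 3))
```

Each added offset refines the context partition, so the empirical plug-in bpc never increases; with enough offsets the contexts become unique and bpc hits zero on the training bytes, which is why 6 offsets suffice for the 1024-byte stream.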
Gradient-based interpretability fails: the Jacobian chain norm ||dh[t]/dx[t-d]|| grows with distance (0.52→1.37 over 30 steps; W_h Frobenius norm = 16.98). 0/128 neurons show offset 1 in their gradient top-2, despite 52 being (1,7) detectors: chaotic amplification swamps the signal. The statistical factor map is the correct approach.
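The Jacobian chain can be computed by caching pre-activations during the forward pass and multiplying per-step Jacobians. A sketch with a random toy tanh RNN (all sizes and weights invented):

```python
import numpy as np

rng = np.random.default_rng(4)
H, V = 32, 8
W_x = rng.normal(0, 0.4, (H, V))
W_h = rng.normal(0, 0.4, (H, H))
data = rng.integers(0, V, size=60)

# forward pass, caching pre-tanh activations for the Jacobian chain
h = np.zeros(H)
pre = []
for b in data:
    x = np.zeros(V); x[b] = 1.0
    z = W_x @ x + W_h @ h
    pre.append(z)
    h = np.tanh(z)

T = len(data)

def jac_norm(d):
    """Frobenius norm of dh[T-1]/dx[T-1-d] via the chain rule."""
    J = np.diag(1 - np.tanh(pre[T-1-d]) ** 2) @ W_x       # dh/dx at the source step
    for t in range(T - d, T):
        J = np.diag(1 - np.tanh(pre[t]) ** 2) @ W_h @ J   # propagate through W_h
    return np.linalg.norm(J)

print([round(jac_norm(d), 3) for d in (1, 5, 10, 20)])
```

When the product of per-step Jacobians has spectral radius above 1, these norms grow with d regardless of which offsets the neuron actually detects, which is the amplification problem the text describes.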
W_x input representation: ‘<’ (3.75) and space (3.74) have largest column norms. Rare chars (u, j, q, z) have near-zero norms (~0.03). The model allocates capacity proportional to character importance.
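The per-character capacity measure is just the column norm of W_x. Sketch with a random stand-in matrix of the stated shape (128 hidden units, 256 byte values):

```python
import numpy as np

rng = np.random.default_rng(5)
H, V = 128, 256
W_x = rng.normal(0, 0.1, (H, V))            # stand-in for the trained input matrix
norms = np.linalg.norm(W_x, axis=0)         # one column norm per input byte value
top = np.argsort(-norms)[:5]
print([(int(c), round(float(norms[c]), 3)) for c in top])
```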
Reverse isomorphism achieves 0.107 bpc: Data → UM counting → log-prob features → optimize W_y only → 0.107 bpc. Within 0.03 of trained RNN (0.079). Captures 99.4% of total prediction quality. No W_x/W_h training needed. Random hash encoding gives only 0.91 bpc — the log-likelihood ratio encoding preserves nearly all of the UM’s learned information.
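The reverse-isomorphism pipeline (counting → log-prob features → optimize only the output map) can be sketched on a toy first-order Markov source. The source, add-1 smoothing, and learning rate are invented; only the pipeline shape follows the text:

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(6)
V = 4
trans = rng.dirichlet(np.ones(V), size=V)   # toy Markov source (stand-in for the data)
data = [0]
for _ in range(1500):
    data.append(int(rng.choice(V, p=trans[data[-1]])))

# step 1: counting -> add-1 smoothed log-prob features per context
counts = defaultdict(Counter)
for a, b in zip(data, data[1:]):
    counts[a][b] += 1

def feats(prev):
    c = counts[prev]
    tot = sum(c.values()) + V
    f = np.log([(c[v] + 1) / tot for v in range(V)])
    return f - f.mean()                     # centering leaves the softmax optimum unchanged

X = np.array([feats(a) for a in data[:-1]])
y = np.array(data[1:])

# step 2: optimize only the output map W (the W_y analogue) by gradient descent
W = np.zeros((V, V))
for _ in range(800):
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0
    W -= 0.1 * (g.T @ X) / len(y)

logits = X @ W.T
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
bpc = -np.mean(np.log2(p[np.arange(len(y)), y]))
print(round(float(bpc), 3))
```

With log-likelihood-ratio features the output layer only needs to learn a near-identity map, which is why the encoding preserves almost all of the counting model's information while a random hash encoding does not.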

Navigation

Next: 20260209 →
The real factor map: interpretable patterns onto H dynamics.
← Previous: 20260207
SN visibility, export gap, pattern chains, skip-patterns, weight construction.