Archive 2026-02-08
The factor map: connecting RNN hidden states to UM pattern inventories.
Papers
- pattern-chains.pdf - Pattern Chains: factor map from RNN neurons to UM skip-patterns (in progress) (source)
- summary.pdf - Summary: definitions, model taxonomy, results, skeletal argument, open questions (source)
Interactive Visualizations
- viz-dashboard.html - Experiment dashboard: offset pair distribution, neuron ablation importance, BPC cascade, binary states, reverse isomorphism, greedy offsets, W_x norms, Jacobian growth.
- pattern-viewer.html - All 128 neurons in SN form. Sortable by importance/R²/pair. Activation heatmap over the 1024-byte data. Generalization verdicts.
Analysis Tools
- binary_states.c - Binary hidden state analysis: 984/1024 distinct states, mean |h|=0.655 (not deeply saturated), 31% of activations below 0.5, mean 52 neuron flips/step
- factor_map.c - Factor map v1: maps UM skip-patterns onto RNN neurons via MI analysis, ablation importance, sign-based verification
- factor_map2.c - Factor map v2: continuous-value conditional means, R² variance explained, bpc verification
- factor_map3.c - Factor map v3: state-based features (word_len, in_tag, char_class). word_len is best state feature for ALL 128 neurons
- factor_map4.c - Factor map v4: combined 2-offset + state. 2-off=0.85, +word_len=0.50, +in_tag=0.43 bpc (92.5% of gain)
- um_learn.c - UM learning: log-stochastic counting at all offsets 1..50. Greedy-6 = 0.0000 bpc (perfect). Superset of RNN patterns verified.
- rnn_trace.c - Jacobian chain trace: gradient-based influence FAILS (W_h spectral radius >1, chaotic amplification). Statistical factor map is the correct approach.
- trace_example.c - Numerical trace: follows UM patterns through W_x → h → W_h → W_y → output. High-school arithmetic verification.
- reverse_iso.c - Reverse isomorphism v1: hash-based construction. Bigram=2.43, skip-8=0.91, factor-map=1.26 bpc.
- reverse_iso2.c - Reverse isomorphism v2: UM conditional probs. Log-prob ratio = 0.107 bpc (vs trained 0.079). 99.4% of prediction quality without W_x/W_h training.
Goal: Take the sat-rnn (0.079 bpc on 1024 bytes), find which
UM patterns (n-grams, skip-k-grams) it has learned and how these map onto hidden
neurons, and exhibit the explicit factor map from the architecture-natural
factorization (128 neurons) to the domain-natural one (I→O skip-patterns).
Key Findings
All 128 neurons explained as 2-offset conjunctions (accuracy 90–97%).
Most common offset pairs: (1,7), (1,8), (8,2), (2,12), (1,20).
The RNN is fundamentally a skip-2-gram machine over specific offset pairs.
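The 2-offset conjunction reading of a neuron can be sketched as a tiny predicate. The struct fields and byte values below are hypothetical illustrations, not the fitted map from factor_map.c:

```c
#include <stddef.h>

/* Illustrative sketch (not the repo's fitted map): one neuron modeled
 * as a 2-offset conjunction, firing when the bytes at lags d1 and d2
 * take a specific learned pair of values. */
typedef struct {
    int d1, d2;              /* offsets into the past */
    unsigned char b1, b2;    /* byte values the neuron responds to */
} conj2_t;

/* 1 iff data[t-d1] == b1 and data[t-d2] == b2 (0 near the start). */
static int conj2_fires(const unsigned char *data, size_t t, conj2_t c)
{
    if (t < (size_t)c.d1 || t < (size_t)c.d2)
        return 0;
    return data[t - c.d1] == c.b1 && data[t - c.d2] == c.b2;
}
```

A (1,7) detector in this view is just `conj2_t` with `d1=1, d2=7`; the continuous-value refinement (v2 below) replaces the hard conjunction with a conditional mean.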
Top neurons by ablation importance: h8 (+0.036 bpc), h56 (+0.026), h68 (+0.025) —
confirms previous skip2_rnn analysis. Top 20 neurons account for +0.29 bpc of ablation delta.
Continuous factor map captures 87% of bpc gain: E[h_j | data[t-d1], data[t-d2]] with
best 2-offset pair gives 0.677 bpc (vs 0.081 actual, 4.74 marginal).
Mean R² = 0.837. All 128 neurons have R² ≥ 0.70, 120/128 ≥ 0.80.
Binary sign approximation fails (5.0 bpc) because continuous magnitudes carry most information.
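A minimal re-implementation of the conditional-mean idea, with a toy 4-symbol alphabet standing in for bytes (factor_map2.c's actual tabulation may differ):

```c
#include <stddef.h>

/* Predict a neuron's activation h[t] by the conditional mean
 * E[h | data[t-d1], data[t-d2]] tabulated from the data itself, and
 * score the fit with R^2.  A is a toy alphabet size (real data: 256). */
#define A 4

static double r2_cond_mean(const unsigned char *x, const double *h,
                           int n, int d1, int d2)
{
    double sum[A][A] = {{0}};
    int cnt[A][A] = {{0}};
    int t0 = d1 > d2 ? d1 : d2, m = 0;
    double mean_all = 0.0, ss_res = 0.0, ss_tot = 0.0;

    for (int t = t0; t < n; t++) {          /* pass 1: tabulate */
        sum[x[t - d1]][x[t - d2]] += h[t];
        cnt[x[t - d1]][x[t - d2]] += 1;
        mean_all += h[t];
        m++;
    }
    mean_all /= m;
    for (int t = t0; t < n; t++) {          /* pass 2: residuals */
        double pred = sum[x[t - d1]][x[t - d2]]
                    / cnt[x[t - d1]][x[t - d2]];
        ss_res += (h[t] - pred) * (h[t] - pred);
        ss_tot += (h[t] - mean_all) * (h[t] - mean_all);
    }
    return 1.0 - ss_res / ss_tot;           /* variance explained */
}
```

Note this evaluates on the same data it tabulates from, so pairs seen once fit perfectly; that is fine for a sketch but inflates R² on held-out data.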
Dominant offset pair (1,7): 52/128 neurons best explained by data[t-1] and data[t-7].
Other common pairs: (1,8): 20, (8,2): 18, (1,12): 9, (2,7): 8.
State features close the gap: 2-offset alone: 0.85 bpc (83%). Adding word_len: 0.50 (91%).
Adding in_tag: 0.43 (92.5%). Matches the UM skip-8 floor (0.043).
word_len is the dominant state feature for ALL 128 neurons.
Remaining 0.35 bpc gap (0.43→0.08) = higher-order patterns + finer state.
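The two state features can be sketched as follows; the exact whitespace and tag conventions are my assumptions, not necessarily factor_map3.c's definitions:

```c
/* word_len counts bytes since the last whitespace; in_tag flags
 * positions inside a '<'...'>' markup span. */
typedef struct { int word_len; int in_tag; } state_t;

static state_t scan_state(const unsigned char *x, int t)
{
    state_t s = {0, 0};
    for (int i = 0; i <= t; i++) {
        if (x[i] == ' ' || x[i] == '\n' || x[i] == '\t')
            s.word_len = 0;          /* whitespace resets the word */
        else
            s.word_len++;
        if (x[i] == '<')      s.in_tag = 1;
        else if (x[i] == '>') s.in_tag = 0;
    }
    return s;
}
```

Both are functions of the whole prefix rather than a fixed offset window, which is what lets them close gap the 2-offset conjunctions cannot.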
UM superset verified: Greedy offset selection [1,8,45,36,48,10] reaches 0.000 bpc
at 6 offsets (909 patterns = perfect prediction on 1024 bytes). Every RNN offset pair
has 400–600 UM patterns. The RNN encodes a fraction of the UM’s inventory.
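The greedy selection loop can be sketched generically; `toy_score` is a stand-in for the UM counting model's bpc (only offsets 1 and 8 help in the toy), not um_learn.c's log-stochastic scorer:

```c
/* Greedy offset selection: at each step add the offset in 1..maxoff
 * that lowers a caller-supplied score the most; stop when nothing
 * improves.  Returns the number of offsets chosen into offs[]. */
typedef double (*score_fn)(const int *offs, int k);

static int greedy_offsets(score_fn score, int maxoff, int *offs, int kmax)
{
    int k = 0;
    double cur = score(offs, 0);
    while (k < kmax) {
        int best_o = -1;
        double best = cur;
        for (int o = 1; o <= maxoff; o++) {
            int used = 0;
            for (int i = 0; i < k; i++)
                if (offs[i] == o) used = 1;
            if (used) continue;
            offs[k] = o;                  /* try candidate offset */
            double s = score(offs, k + 1);
            if (s < best) { best = s; best_o = o; }
        }
        if (best_o < 0) break;            /* no offset improves */
        offs[k++] = best_o;
        cur = best;
    }
    return k;
}

/* Toy score for demonstration: only offsets 1 and 8 reduce it. */
static double toy_score(const int *offs, int k)
{
    double s = 4.0;
    for (int i = 0; i < k; i++) {
        if (offs[i] == 1) s -= 2.0;
        else if (offs[i] == 8) s -= 1.5;
    }
    return s;
}
```

With the real counting model as `score`, this is the loop that produced [1,8,45,36,48,10].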
Gradient-based interpretability fails: Jacobian chain (dh[t]/dx[t-d]) grows with
distance (0.52→1.37 over 30 steps). W_h Frobenius norm = 16.98. 0/128 neurons show
offset 1 in gradient top-2, despite 52 being (1,7) detectors. Chaotic amplification
swamps the signal. Statistical factor map is the correct approach.
W_x input representation: ‘<’ (3.75) and space (3.74) have largest column norms.
Rare chars (u, j, q, z) have near-zero norms (~0.03). The model allocates capacity
proportional to character importance.
Reverse isomorphism achieves 0.107 bpc: data → UM counting →
log-prob features → optimize W_y only. Within 0.03 of the trained
RNN (0.079), capturing 99.4% of total prediction quality with no W_x/W_h training.
Random hash encoding reaches only 0.91 bpc; the log-likelihood-ratio encoding
preserves nearly all of the UM’s learned information.
Navigation
Next: 20260209 →
The real factor map: interpretable patterns onto H dynamics.
← Previous: 20260207
SN visibility, export gap, pattern chains, skip-patterns, weight construction.