
Archive 2026-02-08

The factor map: connecting RNN hidden states to UM pattern inventories.

Goal: Take the sat-rnn (0.079 bpc on 1024 bytes), find which UM patterns (n-grams, skip-k-grams) it has learned and how they map onto hidden neurons, and show the explicit factor map from the architecture-natural factorization (128 neurons) to the domain-natural one (I→O skip-patterns).

Key Findings

All 128 neurons are explained as 2-offset conjunctions (accuracy 90–97%). The most common offset pairs are (1,7), (1,8), (8,2), (2,12), and (1,20). The RNN is fundamentally a skip-2-gram machine over specific offset pairs.
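The 2-offset conjunction fit reduces to a lookup-table search over offset pairs. A minimal sketch, assuming a synthetic 4-symbol stream and a hypothetical neuron hard-wired as a (1,7) conjunction; the alphabet, stream length, and neuron are illustrative stand-ins, not the actual sat-rnn:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 4, size=2000)   # toy byte stream over a 4-symbol alphabet
T = len(data)
# hypothetical neuron: fires iff data[t-1]==0 AND data[t-7]==2 (a 2-offset conjunction)
h = np.array([1.0 if (data[t-1] == 0 and data[t-7] == 2) else -1.0
              for t in range(7, T)])

def pair_accuracy(d1, d2):
    """Fit a majority-sign lookup table on (data[t-d1], data[t-d2]) and score it."""
    keys = [(data[t-d1], data[t-d2]) for t in range(7, T)]
    table = {}
    for k, s in zip(keys, h):
        table.setdefault(k, []).append(s)
    maj = {k: (np.sign(np.mean(v)) or 1.0) for k, v in table.items()}
    pred = np.array([maj[k] for k in keys])
    return float(np.mean(pred == h))

pairs = [(d1, d2) for d1 in range(1, 9) for d2 in range(d1 + 1, 9)]
best = max(pairs, key=lambda p: pair_accuracy(*p))
print(best, pair_accuracy(*best))
```

The search recovers the planted pair because only its lookup table is pure-sign; every other pair leaves mixed classes.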
Top neurons by ablation importance: h8 (+0.036 bpc), h56 (+0.026), h68 (+0.025), confirming the earlier skip2_rnn analysis. The top 20 neurons account for +0.29 bpc of total ablation delta.
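The ablation measurement can be sketched with a toy, untrained tanh RNN: zero one hidden unit at every step, recompute bpc, and rank units by the delta. All weights and sizes below are random stand-ins; only the mechanism follows the text:

```python
import numpy as np

rng = np.random.default_rng(1)
H, V, T = 16, 8, 500                         # toy sizes, not the real 128-unit RNN
W_x = rng.normal(0, 0.3, (H, V))
W_h = rng.normal(0, 0.3, (H, H))
W_y = rng.normal(0, 0.3, (V, H))
data = rng.integers(0, V, size=T)

def bpc(ablate=None):
    """Mean bits-per-character of next-byte prediction, optionally with one unit zeroed."""
    h = np.zeros(H)
    total = 0.0
    for t in range(T - 1):
        x = np.zeros(V); x[data[t]] = 1.0
        h = np.tanh(W_x @ x + W_h @ h)
        if ablate is not None:
            h[ablate] = 0.0                  # zero the neuron at every step
        logits = W_y @ h
        p = np.exp(logits - logits.max()); p /= p.sum()
        total -= np.log2(p[data[t + 1]])
    return total / (T - 1)

base = bpc()
deltas = [bpc(ablate=j) - base for j in range(H)]
print(sorted(range(H), key=lambda j: -deltas[j])[:3])  # top-3 units by ablation delta
```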
Continuous factor map captures 87% of the bpc gain: predicting each neuron by its conditional mean E[h_j | data[t-d1], data[t-d2]] over its best 2-offset pair gives 0.677 bpc (vs 0.081 actual, 4.74 marginal). Mean R² = 0.837; all 128 neurons have R² ≥ 0.70, and 120/128 have R² ≥ 0.80. A binary sign approximation fails (5.0 bpc) because the continuous magnitudes carry most of the information.
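The conditional-mean predictor and its R² can be sketched as follows, again on synthetic data with a hypothetical continuous neuron driven by offsets (1,7); coefficients and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.integers(0, 4, size=3000)
T = len(data)
# hypothetical continuous neuron: a smooth function of bytes at offsets 1 and 7, plus noise
h = np.array([np.tanh(0.8 * data[t-1] - 0.5 * data[t-7]) for t in range(7, T)])
h = h + rng.normal(0, 0.05, size=h.shape)

def r2_for_pair(d1, d2):
    """R^2 of the conditional-mean predictor E[h | data[t-d1], data[t-d2]]."""
    keys = [(data[t-d1], data[t-d2]) for t in range(7, T)]
    sums, cnts = {}, {}
    for k, v in zip(keys, h):
        sums[k] = sums.get(k, 0.0) + v
        cnts[k] = cnts.get(k, 0) + 1
    pred = np.array([sums[k] / cnts[k] for k in keys])   # class mean per byte pair
    ss_res = np.sum((h - pred) ** 2)
    ss_tot = np.sum((h - h.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r2_for_pair(1, 7), r2_for_pair(2, 3))
```

The correct pair explains nearly all the variance (residual is just the injected noise); an unrelated pair explains essentially none.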
Dominant offset pair (1,7): 52/128 neurons best explained by data[t-1] and data[t-7]. Other common pairs: (1,8): 20, (8,2): 18, (1,12): 9, (2,7): 8.
State features close the gap: the 2-offset map alone gives 0.85 bpc (83%); adding word_len, 0.50 (91%); adding in_tag, 0.43 (92.5%), matching the UM skip-8 floor (0.043). word_len is the dominant state feature for all 128 neurons. The remaining 0.35 bpc gap (0.43→0.08) is higher-order patterns plus finer state.
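The two state features can be sketched as simple scans. Their exact definitions are not given above; "characters since the last space" for word_len and "inside a '<...>' span" for in_tag are assumptions:

```python
def word_len_feature(text: bytes):
    """Characters since the last space (assumed definition of word_len)."""
    out, n = [], 0
    for b in text:
        out.append(n)
        n = 0 if b == ord(' ') else n + 1
    return out

def in_tag_feature(text: bytes):
    """Whether position t falls inside a '<...>' span (assumed definition of in_tag)."""
    out, inside = [], False
    for b in text:
        out.append(inside)
        if b == ord('<'):
            inside = True
        elif b == ord('>'):
            inside = False
    return out

print(word_len_feature(b"ab cd"))   # [0, 1, 2, 0, 1]
print(in_tag_feature(b"a<b>c"))     # [False, False, True, True, False]
```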
UM superset verified: Greedy offset selection [1,8,45,36,48,10] reaches 0.000 bpc at 6 offsets (909 patterns = perfect prediction on 1024 bytes). Every RNN offset pair has 400–600 UM patterns. The RNN encodes a fraction of the UM’s inventory.
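Greedy offset selection can be sketched as repeatedly adding the offset whose empirical pattern table most reduces bpc. The stream below is a random toy (the real run, per the text, selects [1,8,45,36,48,10] and reaches 0.000 bpc); only the selection loop is illustrated:

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(3)
data = list(rng.integers(0, 4, size=400))   # toy stream; candidate offsets 1..9
M = 9                                       # fixed start so all offset sets share positions

def bpc_with_offsets(offsets):
    """Empirical bpc of predicting data[t] from the bytes at the given back-offsets."""
    ctx_counts = defaultdict(Counter)
    for t in range(M, len(data)):
        ctx_counts[tuple(data[t-d] for d in offsets)][data[t]] += 1
    total = 0.0
    for t in range(M, len(data)):
        c = ctx_counts[tuple(data[t-d] for d in offsets)]
        total -= np.log2(c[data[t]] / sum(c.values()))
    return total / (len(data) - M)

chosen = []
for _ in range(3):                          # greedy selection, 3 rounds
    cands = [d for d in range(1, 10) if d not in chosen]
    best = min(cands, key=lambda d: bpc_with_offsets(chosen + [d]))
    chosen.append(best)
    print(chosen, round(bpc_with_offsets(chosen), 3))
```

Each added offset refines the context partition, so the empirical plug-in bpc never increases; with enough offsets the contexts become unique and bpc hits zero on the training bytes, which is why 6 offsets suffice for the 1024-byte stream.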
Gradient-based interpretability fails: the Jacobian chain norm ||dh[t]/dx[t-d]|| grows with distance (0.52→1.37 over 30 steps; W_h Frobenius norm = 16.98). 0/128 neurons show offset 1 in their gradient top-2, despite 52 being (1,7) detectors: chaotic amplification swamps the signal. The statistical factor map is the correct approach.
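The Jacobian chain can be computed by caching pre-activations during the forward pass and multiplying per-step Jacobians. A sketch with a random toy tanh RNN (all sizes and weights invented):

```python
import numpy as np

rng = np.random.default_rng(4)
H, V = 32, 8
W_x = rng.normal(0, 0.4, (H, V))
W_h = rng.normal(0, 0.4, (H, H))
data = rng.integers(0, V, size=60)

# forward pass, caching pre-tanh activations for the Jacobian chain
h = np.zeros(H)
pre = []
for b in data:
    x = np.zeros(V); x[b] = 1.0
    z = W_x @ x + W_h @ h
    pre.append(z)
    h = np.tanh(z)

T = len(data)

def jac_norm(d):
    """Frobenius norm of dh[T-1]/dx[T-1-d] via the chain rule."""
    J = np.diag(1 - np.tanh(pre[T-1-d]) ** 2) @ W_x       # dh/dx at the source step
    for t in range(T - d, T):
        J = np.diag(1 - np.tanh(pre[t]) ** 2) @ W_h @ J   # propagate through W_h
    return np.linalg.norm(J)

print([round(jac_norm(d), 3) for d in (1, 5, 10, 20)])
```

When the product of per-step Jacobians has spectral radius above 1, these norms grow with d regardless of which offsets the neuron actually detects, which is the amplification problem the text describes.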
W_x input representation: ‘<’ (3.75) and space (3.74) have largest column norms. Rare chars (u, j, q, z) have near-zero norms (~0.03). The model allocates capacity proportional to character importance.
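The per-character capacity measure is just the column norm of W_x. Sketch with a random stand-in matrix of the stated shape (128 hidden units, 256 byte values):

```python
import numpy as np

rng = np.random.default_rng(5)
H, V = 128, 256
W_x = rng.normal(0, 0.1, (H, V))            # stand-in for the trained input matrix
norms = np.linalg.norm(W_x, axis=0)         # one column norm per input byte value
top = np.argsort(-norms)[:5]
print([(int(c), round(float(norms[c]), 3)) for c in top])
```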
Reverse isomorphism achieves 0.107 bpc: Data → UM counting → log-prob features → optimize W_y only → 0.107 bpc. Within 0.03 of trained RNN (0.079). Captures 99.4% of total prediction quality. No W_x/W_h training needed. Random hash encoding gives only 0.91 bpc — the log-likelihood ratio encoding preserves nearly all of the UM’s learned information.
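The reverse-isomorphism pipeline (counting → log-prob features → optimize only the output map) can be sketched on a toy first-order Markov source. The source, add-1 smoothing, and learning rate are invented; only the pipeline shape follows the text:

```python
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(6)
V = 4
trans = rng.dirichlet(np.ones(V), size=V)   # toy Markov source (stand-in for the data)
data = [0]
for _ in range(1500):
    data.append(int(rng.choice(V, p=trans[data[-1]])))

# step 1: counting -> add-1 smoothed log-prob features per context
counts = defaultdict(Counter)
for a, b in zip(data, data[1:]):
    counts[a][b] += 1

def feats(prev):
    c = counts[prev]
    tot = sum(c.values()) + V
    f = np.log([(c[v] + 1) / tot for v in range(V)])
    return f - f.mean()                     # centering leaves the softmax optimum unchanged

X = np.array([feats(a) for a in data[:-1]])
y = np.array(data[1:])

# step 2: optimize only the output map W (the W_y analogue) by gradient descent
W = np.zeros((V, V))
for _ in range(800):
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0
    W -= 0.1 * (g.T @ X) / len(y)

logits = X @ W.T
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
bpc = -np.mean(np.log2(p[np.arange(len(y)), y]))
print(round(float(bpc), 3))
```

With log-likelihood-ratio features the output layer only needs to learn a near-identity map, which is why the encoding preserves almost all of the counting model's information while a random hash encoding does not.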

Navigation

Next: 20260209 →
The real factor map: interpretable patterns onto H dynamics.
← Previous: 20260207
SN visibility, export gap, pattern chains, skip-patterns, weight construction.