Archive 2026-02-10
The Research Journey: A Comprehensive Narrative
Papers
- narrative.pdf - The Research Journey: From RNN to Universal Model (comprehensive narrative covering all work from 31 Jan through 10 Feb 2026)
- event-arithmetic.pdf - Event Arithmetic: E onto N. Prime power encoding of event spaces, projection into N^k, GMP experiment (5 pages)
- prime-examples.pdf - Exploring the Prime Encoding: 8 experiments on 1024 bytes. GCD, transitions, modular matching, quotient classes, cross-products, information content, residues, alphabet embedding.
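The prime power encoding in event-arithmetic.pdf can be illustrated with a minimal Gödel-style sketch. The names `encode`/`decode` are hypothetical and the paper's actual scheme may differ; the gcd-of-codes property is one of the experiments listed in prime-examples.pdf:

```python
from math import gcd

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]  # one prime per event component

def encode(event):
    """Map a tuple of small non-negative ints into N via prime powers."""
    n = 1
    for p, e in zip(PRIMES, event):
        n *= p ** e
    return n

def decode(n, k):
    """Recover a k-component event by stripping each prime's exponent."""
    out = []
    for p in PRIMES[:k]:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(e)
    return tuple(out)

# The gcd of two codes is the componentwise minimum of the two events:
a, b = encode((3, 1, 4)), encode((2, 5, 0))
assert decode(gcd(a, b), 3) == (2, 1, 0)
```

Because the map is a monoid homomorphism from componentwise addition to multiplication, arithmetic on codes (gcd, division, exact quotients) translates directly into operations on event spaces.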
Interactive Visualizations
- viz-dashboard.html — Experiment dashboard: research phase timeline, prime encoding experiments (context sparsity, transition ratios, quotient classes, information content, residues), hypothesis tracking.
- sparse-diff-view.html (v8) — Sparse Differentiation viewer. 128-neuron oscilloscope with backward/forward attribution arcs, λ=8 phase folding, pattern chain overlays, SN file loading, explainer mode. Loads viz_data.json from 20260209.
Summary: This paper chronicles ten days of intensive research into RNN interpretability
through the Universal Model framework. It explains what happened at each stage, what insights were
picked up, what hypotheses were dropped, and how the understanding evolved from initial confusion
to a coherent framework.
Key Themes
Phase 1: The Doubled-E Isomorphism (31 Jan)
RNNs are already Universal Models. The tanh↔softmax equivalence gives exact translation (0.000% bpc diff).
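The equivalence rests on a standard identity: a two-class softmax over antisymmetric logits [x, -x] reproduces tanh as the difference of its probabilities. A quick numeric check of that identity (illustrative only; the paper's full translation is not reproduced here):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# For antisymmetric two-class logits [x, -x]:
#   p(+) - p(-) = (e^x - e^-x) / (e^x + e^-x) = tanh(x)
for x in [-3.0, -0.5, 0.0, 0.7, 2.0]:
    p_plus, p_minus = softmax([x, -x])
    assert abs(math.tanh(x) - (p_plus - p_minus)) < 1e-12
```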
Phase 2: The Tock Methodology (31 Jan)
Tick-tock cycle: train RNN, extract patterns. Refutation of 53% claim. Coverage ≠ explanatory power.
Phase 3: Pattern Injection and Q=λ (31 Jan)
SVD-based UM→RNN gives 1 bit/char head start. Quotient equals luck: Bayesian interpretation of compression.
Phase 4: Lexicon and Word-Level Structure (1-4 Feb)
Word identity refuted (4.9-6.4% accuracy). RNN encodes character transitions, not words. ES1 (word boundary), ES2 (syllable momentum).
Phase 5: Synthesis and Saturation (6 Feb)
Eight days in review. Saturation experiment: 0.079 bpc on 1024 bytes. Memorization simplifies interpretation.
Phase 6: The Export Gap (7 Feb)
Quantization chaos (0.09-2.1 bpc). W_h bottleneck. BPTT-50 artifact. The gap is a signal about missing skip patterns.
Phase 7: Pattern Chains (7-8 Feb)
Direct UM from data surpasses RNN (0.067 vs 0.079 bpc). Backward trie = ground-truth attention map.
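One way to read "backward trie = ground-truth attention map": index every position under its reversed context, so that the deepest match for the current suffix exposes exactly which preceding bytes determine the next-byte distribution. A minimal count-based sketch (class and method names are hypothetical, not the paper's implementation):

```python
class BackwardTrie:
    """Trie keyed by bytes read backwards in time; each node holds the
    next-byte counts of every position whose context passes through it."""
    def __init__(self):
        self.children = {}
        self.counts = {}

    def add(self, context, nxt):
        node = self
        node.counts[nxt] = node.counts.get(nxt, 0) + 1
        for b in reversed(context):  # walk backwards in time
            node = node.children.setdefault(b, BackwardTrie())
            node.counts[nxt] = node.counts.get(nxt, 0) + 1

    def predict(self, context):
        """Follow the context backwards as deep as it matches; the depth
        reached says exactly which past bytes carried the prediction."""
        node, depth = self, 0
        for b in reversed(context):
            if b not in node.children:
                break
            node = node.children[b]
            depth += 1
        return node.counts, depth

data = b"abababab"
K = 3
trie = BackwardTrie()
for i in range(K, len(data)):
    trie.add(data[i - K:i], data[i])
counts, depth = trie.predict(b"bab")
```

Here the context "bab" matches to full depth 3 and the node's counts are deterministic, which is the "ground-truth" sense: the trie records, not estimates, which contexts predict which continuations.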
Phase 8: Skip-Patterns (8 Feb)
Greedy offset selection. 4 non-contiguous bytes match 12 contiguous with 9× fewer patterns. DSS doubling for artifact detection.
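Greedy offset selection can be sketched as repeatedly adding whichever backward offset most lowers the empirical conditional entropy of the next byte. The entropy objective and the `max_offset` bound are assumptions for illustration, not the paper's exact criterion:

```python
import math
from collections import Counter, defaultdict

def cond_entropy(data, offsets):
    """Empirical H(next byte | bytes at the chosen backward offsets), in bits."""
    ctx_next = defaultdict(Counter)
    start = max(offsets)
    for i in range(start, len(data)):
        ctx_next[tuple(data[i - o] for o in offsets)][data[i]] += 1
    n = len(data) - start
    h = 0.0
    for c in ctx_next.values():
        tot = sum(c.values())
        for v in c.values():
            h -= (v / n) * math.log2(v / tot)
    return h

def greedy_offsets(data, k, max_offset=12):
    """Pick k non-contiguous offsets, one at a time, each the best addition."""
    chosen = []
    for _ in range(k):
        best = min((o for o in range(1, max_offset + 1) if o not in chosen),
                   key=lambda o: cond_entropy(data, chosen + [o]))
        chosen.append(best)
    return chosen

data = b"the quick brown fox jumps over the lazy dog " * 20
offs = greedy_offsets(data, 3)
```

The point of the phase's result is exactly this freedom: a handful of well-chosen non-contiguous offsets can carry as much predictive context as a much longer contiguous window.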
Phase 9: Weight Construction (8 Feb)
RNN weights from data, no BPTT. Better generalization (5.43-5.59 vs 8.22 bpc test). Readout loss: 0.147 bpc.
Phase 10: The Factor Map (9 Feb)
Every neuron is a 2-offset conjunction detector. Mean R²=0.837. 92.5% of RNN's gain captured.
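The 2-offset conjunction claim can be sanity-checked on synthetic data: fit each neuron's activation with a lookup table over the byte pair at candidate offsets and keep the pair with the highest R². A toy sketch in which the "neuron" is synthetic and the search range is an assumption, not the paper's setup:

```python
import itertools
import math
import random

def r2_for_offsets(data, act, o1, o2):
    """R^2 of predicting activations from the pair (data[i-o1], data[i-o2])
    via per-pair means (a lookup-table fit)."""
    start = max(o1, o2)
    groups = {}
    for i in range(start, len(data)):
        groups.setdefault((data[i - o1], data[i - o2]), []).append(act[i])
    vals = [a for g in groups.values() for a in g]
    mean = sum(vals) / len(vals)
    ss_tot = sum((a - mean) ** 2 for a in vals)
    ss_res = sum((a - sum(g) / len(g)) ** 2
                 for g in groups.values() for a in g)
    return 1 - ss_res / ss_tot if ss_tot else 1.0

random.seed(0)
data = bytes(random.randrange(4) for _ in range(2000))  # toy 4-symbol stream

# Synthetic "neuron": a conjunction of the bytes 2 and 5 steps back.
act = [0.0] * len(data)
for i in range(5, len(data)):
    act[i] = math.tanh(0.5 * data[i - 2] - 0.5 * data[i - 5])

# Search all offset pairs; the true pair (2, 5) should win.
best = max(itertools.combinations(range(1, 7), 2),
           key=lambda p: r2_for_offsets(data, act, *p))
```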
Phase 11: Sparse Diff (9 Feb)
Sensitivity profiles for neuron matching. Solves neuron permutation problem.
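A sketch of the sensitivity-profile idea, assuming finite-difference probing and greedy cosine matching (both assumptions; the paper's exact procedure is not given here). Two copies of the same layer with permuted neurons get matched back up by comparing how each neuron responds to input perturbations:

```python
import math

def sensitivity_profile(neuron_fn, base, eps=1e-3):
    """Finite-difference sensitivity of a scalar neuron to each input dim."""
    f0 = neuron_fn(base)
    prof = []
    for j in range(len(base)):
        x = list(base)
        x[j] += eps
        prof.append((neuron_fn(x) - f0) / eps)
    return prof

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def match_neurons(profiles_a, profiles_b):
    """Greedily pair neurons by profile similarity, undoing a permutation."""
    taken, pairing = set(), {}
    for i, pa in enumerate(profiles_a):
        j = max((j for j in range(len(profiles_b)) if j not in taken),
                key=lambda j: cosine(pa, profiles_b[j]))
        taken.add(j)
        pairing[i] = j
    return pairing

# Two copies of the same 3-neuron layer, B's neurons secretly permuted:
W = [[1.0, -2.0, 0.5], [0.3, 0.9, -1.1], [-0.7, 0.2, 2.0]]
perm = [2, 0, 1]
neuron = lambda w: (lambda x: math.tanh(sum(a * b for a, b in zip(w, x))))
base = [0.0, 0.0, 0.0]
prof_a = [sensitivity_profile(neuron(w), base) for w in W]
prof_b = [sensitivity_profile(neuron(W[p]), base) for p in perm]
```

Since profiles are invariant to neuron ordering, matching on them sidesteps the permutation symmetry directly.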
What Was Picked Up
- Doubled-E isomorphism (RNNs are UMs)
- Q = λ unification (quotient equals luck)
- Rigor (coverage ≠ explanatory power)
- Pattern injection (UM→RNN via SVD)
- Word boundary ES (99.6% accuracy)
- Saturation (1024-byte memorization)
- Export gap as signal (missing skip patterns)
- Pattern chains (direct UM surpasses RNN)
- Backward trie (ground-truth attention)
- Skip-k-grams (9× pattern compression)
- DSS doubling (artifact detection)
- Write-back construction (better generalization)
- Factor map (2-offset conjunctions + state)
- Neuron permutation symmetry (patterns are real)
- Sparse diff (sensitivity profiles)
What Was Dropped
- 53% compression claim (conflated coverage with explanatory power)
- Spectral radius ≈ 1 (measured 2.52, stability via tanh)
- Word identity encoding (RNN encodes transitions, not words)
- Consonant ES (too diverse to cluster)
- Hope for simple explanations (302 + 4,183 patterns, no shortcuts)
- Integer SN as primary representation (export gap shows float patterns necessary)
Navigation
Next: 20260211 →
Toward total interpretation: backward attribution chains, sat-rnn-redux, six formal questions.
← Previous: 20260209
Sparse diff, factor map, neuron sensitivity profiles.