Hutter Prize UM Research Timeline

January 31 – February 18, 2026 · 21 archive dates · Claude and MJC

~100 papers · ~150 experiments · ~50 viewers · best score 1.588 bpc
Legend: Main trunk · Breakthrough · Dead end · Dropped thread · Double work
Phase I: The Sat-RNN (Jan 31 – Feb 9)
Jan 31 · Six sessions: discovery, self-correction, unification · 5.69 bpc
paper Doubled-E isomorphism: tanh(x) = 2σ(2x)−1 maps RNN exactly to UM. 0.000% bpc difference.
dead 53%/59% compression claim: false baseline comparison (barely-trained models). Corrected to ≤15.9% same day.
paper Q = λ unification: quotient IS luck. Unifies Bayes, thermo, quotient layers, AC, factor maps.
paper Pattern injection via SVD: ~1 bit/char head start by writing bigram stats into RNN weights. 4.47 bpc.
dead P2 prediction (spectral radius ~1): measured 2.52. tanh provides stability, not eigenvalues.
drop Memory depth prediction (d_max = 24/H ≈ 12): not confirmed (flat to k=30). Never revisited.
paper 6 theory papers: memory traces, embeddings, ω-infinity (sky-hook), perfect hashing, factor maps, pattern injection.
viz 26 HTML pages: AC/RNN trace, spectral analysis, Bayes tables, pattern rings. Interactive tooling established.
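The Doubled-E isomorphism rests on the exact identity tanh(x) = 2σ(2x) − 1; a minimal numeric check (variable names are illustrative, not from the archive):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh(x) = 2*sigmoid(2x) - 1 -- the exact map that carries a tanh RNN
# onto its "doubled" sigmoid counterpart with 0.000% bpc difference.
xs = [-5.0, -1.0, 0.0, 0.5, 3.0]
max_err = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in xs)
```

The identity is algebraic, so the error is at floating-point noise level, which is why the mapped model reproduces the original's bpc exactly.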
Feb 1 · SN Visibility: Tock 1 infrastructure
infra 768 events, 302 significant patterns, SN viewer, dashboard. Infrastructure for all subsequent work.
Feb 4 · Path to Lexicon
result ES1 (h2): word boundary detector, 99.6% accuracy. ES2 (h35): syllable/position momentum.
dead Word identity encoding: 4.9–6.4% accuracy (near random). Words NOT explicitly represented.
result Coverage gap: 31% events touched, <5% bpc explained. ~18 natural ESs from neuron clustering.
Feb 6 · Synthesis + saturation experiment · 0.079 bpc
paper Synthesis paper (11pp): reviews Jan 31–Feb 4. First of three retellings of the same arc.
tool SIMD optimization: 13.3× speedup (99h → 7.4h for forward pass).
result Sat-RNN: 0.079 bpc on 1024 bytes (4000 epochs). N-gram UM matches in one pass (0.081 at order 11).
Feb 7 · Export gap, pattern chains, skip-patterns · 0.043 bpc
dead SN export (8-bit quantization of W_h): chaotic, 0.09–2.1 bpc. Recurrent error amplification.
result Pattern-chain UM surpasses sat-rnn at order 10 (0.076 vs 0.079). Order 12: 0.067 bpc, 6180 patterns.
result Skip-4 [1,8,20,3]: 0.069 bpc with 712 patterns (9× sparser than contiguous). Skip-8: 0.043 bpc.
tool Backward trie: discovers skip-patterns by MI per offset. Offset 8 chosen before 2 (complementary MI).
result Write-back construction: MLP-256 readout 0.137 bpc without BPTT. Generalizes better (5.43 vs 8.22 test).
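A skip-pattern context replaces the contiguous previous k bytes with bytes at a few backward offsets (here the Skip-4 offsets [1, 8, 20, 3] from the entry above); a minimal sketch, with a hypothetical function name:

```python
def skip_context(data: bytes, pos: int, offsets=(1, 8, 20, 3)):
    """Build a sparse context for predicting data[pos]: take the bytes at
    the given backward offsets instead of a contiguous window.
    Returns None near the start, where an offset would fall off the buffer."""
    if pos < max(offsets):
        return None
    return tuple(data[pos - d] for d in offsets)

ctx = skip_context(b"the cat sat on the mat", 21)
# context for the final 't': bytes at offsets 1, 8, 20, 3 back
```

Only 4 bytes are keyed per position, which is why the skip trie stays ~9× sparser than a contiguous order-12 context while matching its compression.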
Feb 8–9 · Factor map + sparse differentiation · 0.107 bpc (reverse iso)
result Factor map: every neuron is a 2-offset conjunction detector. Mean R²=0.837, 120/128 ≥ 0.80.
result 2-offset + word_len + in_tag = 92.5% of RNN's gain (0.43 bpc vs actual 0.08).
result Reverse isomorphism: log-prob features + W_y only = 0.107 bpc (99.4% of trained quality, no W_x/W_h).
dead Jacobian traces: chaotic amplification gives wrong attribution (0/128 neurons show offset 1 in gradient top-2).
dead Factor map write-in/subtract-out: all catastrophic (+7.3 bpc step-by-step, +5.6 oracle, +5.9 weight-level).
paper One-way property: φ: H → features is readable not writable. The RNN is ONE dynamical system.
tool Sparse diff: backward gradient GROWS (2.4× at k=8), explaining skip-8 offset selection.
drop Sparse diff viewer v5: five iterations built, then partially lost in file deletion incident.
Phase II: Total Interpretation & Weight Construction (Feb 10–12)
Feb 10 · Narrative + event arithmetic
paper Narrative paper (25pp): 11-phase retelling of Jan 31–Feb 9. Second of three retellings.
paper Event arithmetic: E → N via prime powers. Pattern matching = divisibility. Compression = factoring.
drop 8 prime encoding experiments: demonstrate sparsity but no compression gain. Used later in ring structure.
viz Sparse diff viewer v8: oscilloscope, attribution arcs, lambda=8 folding, SN view, pattern derivation.
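Event arithmetic maps an event's feature exponents to a single integer via prime powers, Gödel-style, so "pattern matches event" becomes "pattern's number divides event's number". A toy sketch under that reading (the fixed prime list and function names are illustrative):

```python
PRIMES = [2, 3, 5, 7, 11, 13]  # one prime per feature slot

def encode(features):
    """E -> N: map a vector of small non-negative exponents to an integer
    as the product of prime powers (Goedel numbering)."""
    n = 1
    for p, e in zip(PRIMES, features):
        n *= p ** e
    return n

def matches(pattern, event):
    """A pattern matches an event iff encode(pattern) divides encode(event),
    i.e. every required feature is present at least as strongly."""
    return encode(event) % encode(pattern) == 0

ok = matches([1, 1, 0, 1], [2, 1, 0, 3])   # componentwise <=, so it divides
bad = matches([0, 2, 0, 0], [2, 1, 0, 3])  # needs 3^2, event only has 3^1
```

Compression-as-factoring then falls out: recovering the exponent vector from the integer is exactly prime factorization.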
Feb 11 · Total interpretation + weight construction (LARGEST archive) · 0.59/0.40 bpc
result All 82,304 weights from data stats: shift-register W_x/W_h, analytic W_y. 1.89 bpc (zero optimization).
result Optimized W_y: 0.59 bpc (all), 0.40 bpc (test). Both beat trained 4.97. 39,800× cheaper.
result Boolean automaton: 98.9% sign-determined. Sign-only BETTER by 0.031 bpc. Mantissa is noise.
result Q1–Q7 answers: h28 = 99.7% of gap. 20/128 neurons suffice (+0.15 bpc better). 108 neurons = noise.
paper Narrative paper v3 (6pp): third retelling of the research arc.
paper 12 theory papers: entropy bridge, E-onto-N, microstate/macrostate, ES-isomorphism, quotient chain, temporal bi-embedding, h32, cost analysis.
tool 25 write_weights iterations (write_weights1–25): mostly single-parameter changes.
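The sign-determination result says the mantissas of the recurrent weights are noise: replacing each weight by sign(w) times a shared scale preserves the next hidden state's sign pattern on saturated (±1) states. A tiny contrived illustration of that quantization, not the archive's actual matrices:

```python
import math

def sign(x):
    return 1.0 if x >= 0 else -1.0

# Replace every weight by sign(w) * c, c = mean |w|, and check that the
# sign pattern of the next hidden state is unchanged on a saturated state.
W = [[0.9, -0.4], [-0.7, 1.2]]   # toy recurrent weights
h = [1.0, -1.0]                  # saturated +/-1 hidden state
c = sum(abs(w) for row in W for w in row) / 4
W_sign = [[sign(w) * c for w in row] for row in W]

def step(M, v):
    """One tanh recurrence step h' = tanh(M h)."""
    return [math.tanh(sum(m * x for m, x in zip(row, v))) for row in M]

signs_full = [sign(x) for x in step(W, h)]
signs_bool = [sign(x) for x in step(W_sign, h)]
```

When the sign patterns agree at every step, the RNN is behaviorally a Boolean automaton, which is what made the 98.9% sign-determination finding actionable.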
Feb 11.2 · Scaling to enwik9
result R² = 0.83 is architectural: same on 1024-byte model (0.837) and enwik9 model (0.830).
result RNN is Boolean from first checkpoint. Margins grow monotonically: 2.78 (10M) → 61.3 (990M).
dead P1 prediction (margins decrease with scale): wrong, they increase 6.5×.
result Three training phases: learning (10–110M), stable (110–400M), collapse (450–990M). R² cliff at 450M.
Feb 12 · KN scaling + 20 math papers · 1.784 bpc
result KN-6i at 1B: 1.784 bpc. Zero structure, pure counting. Establishes the "counting floor."
dead Skip offsets at byte level: d > 6 adds zero value. Class-based output: fails at all scales.
dead 256M hash table: OOM-killed. Memory is the bottleneck.
paper ~20 theory papers: counting monad, algebraic semantics, category ES, information geometry, renormalization, expressiveness, fixed-point, forcing-pumping, wants, carrier signal, tropical GCD, Wittgenstein tractatus × 3.
tool 10 more write_weights (26–35) + 9 scale_kn variants. Single-parameter changes.
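The "counting floor" comes from interpolated Kneser-Ney, which is nothing but discounted counts plus continuation counts. A minimal bigram sketch of the scheme (KN-6 extends the same recursion to order 6; the discount value and names here are illustrative):

```python
from collections import Counter, defaultdict

def kn_bigram(tokens, D=0.75):
    """Minimal interpolated Kneser-Ney over bigrams.
    Returns a closure p(w, prev) = P(w | prev)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])              # context occurrence counts
    followers = defaultdict(set)                 # distinct continuations per context
    preceders = defaultdict(set)                 # distinct contexts per word
    for a, b in bigrams:
        followers[a].add(b)
        preceders[b].add(a)
    n_types = len(bigrams)

    def p(w, prev):
        # continuation prob: in how many contexts does w appear as a continuation?
        p_cont = len(preceders[w]) / n_types
        c_prev = contexts[prev]
        if c_prev == 0:
            return p_cont
        disc = max(bigrams[(prev, w)] - D, 0.0) / c_prev
        lam = D * len(followers[prev]) / c_prev  # mass reserved for backoff
        return disc + lam * p_cont
    return p

p = kn_bigram("a b a b a c".split())
```

The discounted mass D per seen bigram is exactly what the interpolation weight λ redistributes, so the distribution stays normalized: for the toy corpus, p("b","a") + p("c","a") + p("a","a") = 1.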
Phase III: Lexical Injection (Feb 13–15)
Feb 13 · Tock protocol design
paper Tock protocol (10pp): systematic word injection procedure. Frequency-ordered, MI-measured. Theory only.
Feb 14 · 31 injection experiments → KN dominance
result Neutrality verified exactly (0 errors / 786K cells). Causal+onset beats oracle (4.57 vs 4.62 bpc).
result KN-6 dominates RNN by 5× (1.24 vs 6.43 bpc at 1M). RNN contributes only 2.4% at 10M.
dead 31 iterations of "mix external predictor with RNN" — the entire premise was wrong.
dead RNN hidden-state centroid onset (+0.085 worse). Entropy-adaptive alpha (zero improvement). Word bigram KN (sparse).
Feb 15 · Extended ES theory (correct architecture)
paper Nested model: H_ext = I' × H_inner × O'. Self-similar, conservative. Resolves I/O extension paradox.
paper Tokenization loss: provably loses information (≥0.05 bpc). Strawberry impossibility theorem.
paper P-programs: position counter, letter accumulator, bag-of-letters, graded word support. Concrete SN syntax.
drop P-programs never implemented. Safe-combination conjecture never tested. Abductive learning never built.
Phase IV: Hutter Prize Compression (Feb 16–18)
Feb 16 · Hutter scoring + ring structure + UM runner · 1.588 bpc
result 1.588 bpc = 189.3 MB (1.79× record). KN-6 + sparse ctx (16M HT) + extended match.
result Sparse contexts: +0.089 bpc. Extended match: +0.005 bpc. Combined: +0.094 bpc total.
result Key: separate HTs prevent contention (v15 shared=bad, v16 separate=good).
dead 8 negative results: KN-8/dual HT, recency, shared HT, momentum, softmax, indirect bigrams, word bigrams, larger HT.
paper KN-quotient v1–v2: KN mapped to integer ring operations. Discount = subtraction in Z.
result GCD empirical: g=1 in 98.3%. D*=0.85 optimal. GCD discount negative (+0.138 bpc worse).
result Exact AC via GMP: zero decode errors (unigram + KN-6 at 1024 bytes).
dead P-program features as KN context: all negative (+0.12–0.18 bpc each). Not coprime to byte context.
tool UM Runner (umr): KN-6 + sparse + match as P-programs. Exact match with hutter_score16.
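Exact arithmetic coding means narrowing the coding interval with exact rationals, so there is no rounding drift and decode is bit-exact. A sketch using Python's `Fraction` in place of GMP (symbol ordering and names are illustrative):

```python
from fractions import Fraction
from math import log2

def encode_interval(symbols, probs):
    """Exact AC interval narrowing: probs maps symbol -> Fraction, summing
    to 1. Returns (low, width) of the final interval; no rounding anywhere."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        # cumulative probability of all symbols ordered before s
        cum = sum((p for t, p in sorted(probs.items()) if t < s), Fraction(0))
        low += width * cum
        width *= probs[s]
    return low, width

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, width = encode_interval("abca", probs)
bits = -log2(width)  # ideal code length in bits
```

The final width is exactly the product of the symbol probabilities, so the ideal code length −log2(width) is exact too, which is the property the GMP coder verifies with zero decode errors.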
Feb 18 · Context events + marginal dominance + multi-frequency
paper Context events conjecture: missing prior = missing context event. Generalized oversupport at any ES.
paper Surprise mechanism: attribution first (startle/settling), not interpretation.
paper Connectome layers: DAG ⇒ topological sort. Cycles require synchronization. Sparse LPP storage.
dead Marginal dominance theorem: max-min with absolute log-counts ⇒ marginal always wins. Pure UM bigram = 5.3 bpc.
result Multi-frequency LPPs: word-onset + tag-onset = +0.184 bpc. Structural LPPs > skip-bigrams.
drop Ring pattern empirical: r=0.047 between raw s(2) and p(2). Nearly uncorrelated. Unclear path forward.
result 91% from-the-left disagreement: KN interpolation = conflict resolution, not just smoothing.
viz 4 viewers: context-events-explainer, um-connectome, um-viewer (SN-loading), um-explorer (neuron-first).
paper Total interpretability gap analysis: found A pattern subset, not THE subset. Sparse diff = proposed path forward.
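The marginal-dominance problem can be seen in a toy setting, under the assumed semantics that a pattern's strength is its absolute log-count: a higher-order pattern counts a subset of its marginal's occurrences, so under max-min it can never out-score the marginal.

```python
import math
from collections import Counter

text = "abracadabra"
unigrams = Counter(text)
bigrams = Counter(zip(text, text[1:]))

def strength(count):
    """Assumed scoring: absolute log-count (undiscounted)."""
    return math.log(count) if count > 0 else float("-inf")

# Every bigram count is <= the count of its first symbol, so the marginal's
# log-count always dominates -- max-min never prefers the bigram's evidence.
dominated = all(
    strength(c) <= strength(unigrams[a]) for (a, b), c in bigrams.items()
)
```

This is why the pure-UM bigram stalls at 5.3 bpc: without something like KN's discounting expressed in UM terms, higher-order evidence is structurally inert.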

Summary

Main Trunk (8 steps)

Doubled-E (Jan 31) → Sat-RNN 0.079 (Feb 6) → Pattern chains 0.067 (Feb 7) → Factor map 92.5% (Feb 8–9) → Weight construction 0.40 (Feb 11) → KN floor 1.784 (Feb 12) → Best score 1.588 (Feb 16) → Multi-freq +0.184 (Feb 18)

11 Dead Ends

53% claim • P2 spectral • word identity • factor write-in • P1 margins • 31 injection experiments • oracle absorption • geometric mean • GCD discount • P-program context • marginal dominance

12 Dropped Threads

Memory depth • LSTM forget gate • SVD injection • P-programs impl • viewer features • 256M HT • transformer comparison • safe-combination • abductive learning • interpretability gap • DSS ladder • ~20 math papers

6 Instances of Double Work

3 narrative papers • duplicate summary.tex • 5+ E→N→Q formalizations • 35 write_weights iterations • overlapping factor map papers • 9 scale_kn variants

Critical Blocker

Marginal dominance: the UM max-min forward pass cannot use higher-order patterns. Until solved, the gap from 1.588 to 0.845 bpc (the Hutter Prize record) cannot close through UM machinery alone.

Top 3 Priorities

1. Solve marginal dominance (discount in UM terms)
2. Implement P-programs + integrate multi-frequency LPPs
3. Stop formalizing, start implementing
