Hutter Prize UM Research Timeline

January 31 – February 18, 2026 · 21 archive dates · Claude and MJC

~100 papers · ~150 experiments · ~50 viewers · best score 1.588 bpc
Legend: Main trunk · Breakthrough · Dead end · Dropped thread · Double work
Phase I: The Sat-RNN (Jan 31 – Feb 9)
Jan 31 · Six sessions: discovery, self-correction, unification · 5.69 bpc
paper Doubled-E isomorphism: tanh(x) = 2σ(2x)−1 maps RNN exactly to UM. 0.000% bpc difference.
dead 53%/59% compression claim: false baseline comparison (barely-trained models). Corrected to ≤15.9% same day.
paper Q = λ unification: quotient IS luck. Unifies Bayes, thermo, quotient layers, AC, factor maps.
paper Pattern injection via SVD: ~1 bit/char head start by writing bigram stats into RNN weights. 4.47 bpc.
dead P2 prediction (spectral radius ~1): measured 2.52. tanh provides stability, not eigenvalues.
drop Memory depth prediction (d_max = 24/H ≈ 12): not confirmed (flat to k=30). Never revisited.
paper 6 theory papers: memory traces, embeddings, ω-infinity (sky-hook), perfect hashing, factor maps, pattern injection.
viz 26 HTML pages: AC/RNN trace, spectral analysis, Bayes tables, pattern rings. Interactive tooling established.
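The Doubled-E isomorphism rests on the exact identity tanh(x) = 2σ(2x) − 1; a minimal numeric check (variable names are illustrative, not from the archive):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh(x) = 2*sigmoid(2x) - 1 -- the exact map that carries a tanh RNN
# onto its "doubled" sigmoid counterpart with 0.000% bpc difference.
xs = [-5.0, -1.0, 0.0, 0.5, 3.0]
max_err = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in xs)
```

The identity is algebraic, so the error is at floating-point noise level, which is why the mapped model reproduces the original's bpc exactly.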
Feb 1 · SN Visibility: Tock 1 infrastructure
infra 768 events, 302 significant patterns, SN viewer, dashboard. Infrastructure for all subsequent work.
Feb 4 · Path to Lexicon
result ES1 (h2): word boundary detector, 99.6% accuracy. ES2 (h35): syllable/position momentum.
dead Word identity encoding: 4.9–6.4% accuracy (near random). Words NOT explicitly represented.
result Coverage gap: 31% events touched, <5% bpc explained. ~18 natural ESs from neuron clustering.
Feb 6 · Synthesis + saturation experiment · 0.079 bpc
paper Synthesis paper (11pp): reviews Jan 31–Feb 4. First of three retellings of the same arc.
tool SIMD optimization: 13.3× speedup (99h → 7.4h for forward pass).
result Sat-RNN: 0.079 bpc on 1024 bytes (4000 epochs). N-gram UM matches in one pass (0.081 at order 11).
Feb 7 · Export gap, pattern chains, skip-patterns · 0.043 bpc
dead SN export (8-bit quantization of W_h): chaotic, 0.09–2.1 bpc. Recurrent error amplification.
result Pattern-chain UM surpasses sat-rnn at order 10 (0.076 vs 0.079). Order 12: 0.067 bpc, 6180 patterns.
result Skip-4 [1,8,20,3]: 0.069 bpc with 712 patterns (9× sparser than contiguous). Skip-8: 0.043 bpc.
tool Backward trie: discovers skip-patterns by MI per offset. Offset 8 chosen before 2 (complementary MI).
result Write-back construction: MLP-256 readout 0.137 bpc without BPTT. Generalizes better (5.43 vs 8.22 test).
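A skip-pattern context replaces the contiguous previous k bytes with bytes at a few backward offsets (here the Skip-4 offsets [1, 8, 20, 3] from the entry above); a minimal sketch, with a hypothetical function name:

```python
def skip_context(data: bytes, pos: int, offsets=(1, 8, 20, 3)):
    """Build a sparse context for predicting data[pos]: take the bytes at
    the given backward offsets instead of a contiguous window.
    Returns None near the start, where an offset would fall off the buffer."""
    if pos < max(offsets):
        return None
    return tuple(data[pos - d] for d in offsets)

ctx = skip_context(b"the cat sat on the mat", 21)
# context for the final 't': bytes at offsets 1, 8, 20, 3 back
```

Only 4 bytes are keyed per position, which is why the skip trie stays ~9× sparser than a contiguous order-12 context while matching its compression.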
Feb 8–9 · Factor map + sparse differentiation · 0.107 bpc (reverse iso)
result Factor map: every neuron is a 2-offset conjunction detector. Mean R²=0.837, 120/128 ≥ 0.80.
result 2-offset + word_len + in_tag = 92.5% of RNN's gain (0.43 bpc vs actual 0.08).
result Reverse isomorphism: log-prob features + W_y only = 0.107 bpc (99.4% of trained quality, no W_x/W_h).
dead Jacobian traces: chaotic amplification gives wrong attribution (0/128 neurons show offset 1 in gradient top-2).
dead Factor map write-in/subtract-out: all catastrophic (+7.3 bpc step-by-step, +5.6 oracle, +5.9 weight-level).
paper One-way property: φ: H → features is readable not writable. The RNN is ONE dynamical system.
tool Sparse diff: backward gradient GROWS (2.4× at k=8), explaining skip-8 offset selection.
drop Sparse diff viewer v5: five iterations built, then partially lost in file deletion incident.
Phase II: Total Interpretation & Weight Construction (Feb 10–12)
Feb 10 · Narrative + event arithmetic
paper Narrative paper (25pp): 11-phase retelling of Jan 31–Feb 9. Second of three retellings.
paper Event arithmetic: E → N via prime powers. Pattern matching = divisibility. Compression = factoring.
drop 8 prime encoding experiments: demonstrate sparsity but no compression gain. Used later in ring structure.
viz Sparse diff viewer v8: oscilloscope, attribution arcs, lambda=8 folding, SN view, pattern derivation.
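Event arithmetic maps an event's feature exponents to a single integer via prime powers, Gödel-style, so "pattern matches event" becomes "pattern's number divides event's number". A toy sketch under that reading (the fixed prime list and function names are illustrative):

```python
PRIMES = [2, 3, 5, 7, 11, 13]  # one prime per feature slot

def encode(features):
    """E -> N: map a vector of small non-negative exponents to an integer
    as the product of prime powers (Goedel numbering)."""
    n = 1
    for p, e in zip(PRIMES, features):
        n *= p ** e
    return n

def matches(pattern, event):
    """A pattern matches an event iff encode(pattern) divides encode(event),
    i.e. every required feature is present at least as strongly."""
    return encode(event) % encode(pattern) == 0

ok = matches([1, 1, 0, 1], [2, 1, 0, 3])   # componentwise <=, so it divides
bad = matches([0, 2, 0, 0], [2, 1, 0, 3])  # needs 3^2, event only has 3^1
```

Compression-as-factoring then falls out: recovering the exponent vector from the integer is exactly prime factorization.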
Feb 11 · Total interpretation + weight construction (LARGEST archive) · 0.59/0.40 bpc
result All 82,304 weights from data stats: shift-register W_x/W_h, analytic W_y. 1.89 bpc (zero optimization).
result Optimized W_y: 0.59 bpc (all), 0.40 bpc (test). Both beat trained 4.97. 39,800× cheaper.
result Boolean automaton: 98.9% sign-determined. Sign-only BETTER by 0.031 bpc. Mantissa is noise.
result Q1–Q7 answers: h28 = 99.7% of gap. 20/128 neurons suffice (+0.15 bpc better). 108 neurons = noise.
paper Narrative paper v3 (6pp): third retelling of the research arc.
paper 12 theory papers: entropy bridge, E-onto-N, microstate/macrostate, ES-isomorphism, quotient chain, temporal bi-embedding, h32, cost analysis.
tool 25 write_weights iterations (write_weights1–25): mostly single-parameter changes.
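The sign-determination result says the mantissas of the recurrent weights are noise: replacing each weight by sign(w) times a shared scale preserves the next hidden state's sign pattern on saturated (±1) states. A tiny contrived illustration of that quantization, not the archive's actual matrices:

```python
import math

def sign(x):
    return 1.0 if x >= 0 else -1.0

# Replace every weight by sign(w) * c, c = mean |w|, and check that the
# sign pattern of the next hidden state is unchanged on a saturated state.
W = [[0.9, -0.4], [-0.7, 1.2]]   # toy recurrent weights
h = [1.0, -1.0]                  # saturated +/-1 hidden state
c = sum(abs(w) for row in W for w in row) / 4
W_sign = [[sign(w) * c for w in row] for row in W]

def step(M, v):
    """One tanh recurrence step h' = tanh(M h)."""
    return [math.tanh(sum(m * x for m, x in zip(row, v))) for row in M]

signs_full = [sign(x) for x in step(W, h)]
signs_bool = [sign(x) for x in step(W_sign, h)]
```

When the sign patterns agree at every step, the RNN is behaviorally a Boolean automaton, which is what made the 98.9% sign-determination finding actionable.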
Feb 11.2 · Scaling to enwik9
result R² = 0.83 is architectural: same on 1024-byte model (0.837) and enwik9 model (0.830).
result RNN is Boolean from first checkpoint. Margins grow monotonically: 2.78 (10M) → 61.3 (990M).
dead P1 prediction (margins decrease with scale): wrong, they increase 6.5×.
result Three training phases: learning (10–110M), stable (110–400M), collapse (450–990M). R² cliff at 450M.
Feb 12 · KN scaling + 20 math papers · 1.784 bpc
result KN-6i at 1B: 1.784 bpc. Zero structure, pure counting. Establishes the "counting floor."
dead Skip offsets at byte level: d > 6 adds zero value. Class-based output: fails at all scales.
dead 256M hash table: OOM-killed. Memory is the bottleneck.
paper ~20 theory papers: counting monad, algebraic semantics, category ES, information geometry, renormalization, expressiveness, fixed-point, forcing-pumping, wants, carrier signal, tropical GCD, Wittgenstein tractatus × 3.
tool 10 more write_weights (26–35) + 9 scale_kn variants. Single-parameter changes.
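The "counting floor" comes from interpolated Kneser-Ney, which is nothing but discounted counts plus continuation counts. A minimal bigram sketch of the scheme (KN-6 extends the same recursion to order 6; the discount value and names here are illustrative):

```python
from collections import Counter, defaultdict

def kn_bigram(tokens, D=0.75):
    """Minimal interpolated Kneser-Ney over bigrams.
    Returns a closure p(w, prev) = P(w | prev)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])              # context occurrence counts
    followers = defaultdict(set)                 # distinct continuations per context
    preceders = defaultdict(set)                 # distinct contexts per word
    for a, b in bigrams:
        followers[a].add(b)
        preceders[b].add(a)
    n_types = len(bigrams)

    def p(w, prev):
        # continuation prob: in how many contexts does w appear as a continuation?
        p_cont = len(preceders[w]) / n_types
        c_prev = contexts[prev]
        if c_prev == 0:
            return p_cont
        disc = max(bigrams[(prev, w)] - D, 0.0) / c_prev
        lam = D * len(followers[prev]) / c_prev  # mass reserved for backoff
        return disc + lam * p_cont
    return p

p = kn_bigram("a b a b a c".split())
```

The discounted mass D per seen bigram is exactly what the interpolation weight λ redistributes, so the distribution stays normalized: for the toy corpus, p("b","a") + p("c","a") + p("a","a") = 1.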
Phase III: Lexical Injection (Feb 13–15)
Feb 13 · Tock protocol design
paper Tock protocol (10pp): systematic word injection procedure. Frequency-ordered, MI-measured. Theory only.
Feb 14 · 31 injection experiments → KN dominance
result Neutrality verified exactly (0 errors / 786K cells). Causal+onset beats oracle (4.57 vs 4.62 bpc).
result KN-6 dominates RNN by 5× (1.24 vs 6.43 bpc at 1M). RNN contributes only 2.4% at 10M.
dead 31 iterations of "mix external predictor with RNN" — the entire premise was wrong.
dead RNN hidden-state centroid onset (+0.085 worse). Entropy-adaptive alpha (zero improvement). Word bigram KN (sparse).
Feb 15 · Extended ES theory (correct architecture)
paper Nested model: H_ext = I' × H_inner × O'. Self-similar, conservative. Resolves I/O extension paradox.
paper Tokenization loss: provably loses information (≥0.05 bpc). Strawberry impossibility theorem.
paper P-programs: position counter, letter accumulator, bag-of-letters, graded word support. Concrete SN syntax.
drop P-programs never implemented. Safe-combination conjecture never tested. Abductive learning never built.
Phase IV: Hutter Prize Compression (Feb 16–18)
Feb 16 · Hutter scoring + ring structure + UM runner · 1.588 bpc
result 1.588 bpc = 189.3 MB (1.79× record). KN-6 + sparse ctx (16M HT) + extended match.
result Sparse contexts: +0.089 bpc. Extended match: +0.005 bpc. Combined: +0.094 bpc total.
result Key: separate HTs prevent contention (v15 shared=bad, v16 separate=good).
dead 8 negative results: KN-8/dual HT, recency, shared HT, momentum, softmax, indirect bigrams, word bigrams, larger HT.
paper KN-quotient v1–v2: KN mapped to integer ring operations. Discount = subtraction in Z.
result GCD empirical: g=1 in 98.3%. D*=0.85 optimal. GCD discount negative (+0.138 bpc worse).
result Exact AC via GMP: zero decode errors (unigram + KN-6 at 1024 bytes).
dead P-program features as KN context: all negative (+0.12–0.18 bpc each). Not coprime to byte context.
tool UM Runner (umr): KN-6 + sparse + match as P-programs. Exact match with hutter_score16.
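Exact arithmetic coding means narrowing the coding interval with exact rationals, so there is no rounding drift and decode is bit-exact. A sketch using Python's `Fraction` in place of GMP (symbol ordering and names are illustrative):

```python
from fractions import Fraction
from math import log2

def encode_interval(symbols, probs):
    """Exact AC interval narrowing: probs maps symbol -> Fraction, summing
    to 1. Returns (low, width) of the final interval; no rounding anywhere."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        # cumulative probability of all symbols ordered before s
        cum = sum((p for t, p in sorted(probs.items()) if t < s), Fraction(0))
        low += width * cum
        width *= probs[s]
    return low, width

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, width = encode_interval("abca", probs)
bits = -log2(width)  # ideal code length in bits
```

The final width is exactly the product of the symbol probabilities, so the ideal code length −log2(width) is exact too, which is the property the GMP coder verifies with zero decode errors.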
Feb 18 · Context events + marginal dominance + multi-frequency
paper Context events conjecture: missing prior = missing context event. Generalized oversupport at any ES.
paper Surprise mechanism: attribution first (startle/settling), not interpretation.
paper Connectome layers: DAG ⇒ topological sort. Cycles require synchronization. Sparse LPP storage.
dead Marginal dominance theorem: max-min with absolute log-counts ⇒ marginal always wins. Pure UM bigram = 5.3 bpc.
result Multi-frequency LPPs: word-onset + tag-onset = +0.184 bpc. Structural LPPs > skip-bigrams.
drop Ring pattern empirical: r=0.047 between raw s(2) and p(2). Nearly uncorrelated. Unclear path forward.
result 91% from-the-left disagreement: KN interpolation = conflict resolution, not just smoothing.
viz 4 viewers: context-events-explainer, um-connectome, um-viewer (SN-loading), um-explorer (neuron-first).
paper Total interpretability gap analysis: found A pattern subset, not THE subset. Sparse diff = proposed path forward.
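The marginal-dominance problem can be seen in a toy setting, under the assumed semantics that a pattern's strength is its absolute log-count: a higher-order pattern counts a subset of its marginal's occurrences, so under max-min it can never out-score the marginal.

```python
import math
from collections import Counter

text = "abracadabra"
unigrams = Counter(text)
bigrams = Counter(zip(text, text[1:]))

def strength(count):
    """Assumed scoring: absolute log-count (undiscounted)."""
    return math.log(count) if count > 0 else float("-inf")

# Every bigram count is <= the count of its first symbol, so the marginal's
# log-count always dominates -- max-min never prefers the bigram's evidence.
dominated = all(
    strength(c) <= strength(unigrams[a]) for (a, b), c in bigrams.items()
)
```

This is why the pure-UM bigram stalls at 5.3 bpc: without something like KN's discounting expressed in UM terms, higher-order evidence is structurally inert.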

Summary

Main Trunk (8 steps)

Doubled-E (Jan 31) → Sat-RNN 0.079 (Feb 6) → Pattern chains 0.067 (Feb 7) → Factor map 92.5% (Feb 8–9) → Weight construction 0.40 (Feb 11) → KN floor 1.784 (Feb 12) → Best score 1.588 (Feb 16) → Multi-freq +0.184 (Feb 18)

11 Dead Ends

53% claim • P2 spectral • word identity • factor write-in • P1 margins • 31 injection experiments • oracle absorption • geometric mean • GCD discount • P-program context • marginal dominance

12 Dropped Threads

Memory depth • LSTM forget gate • SVD injection • P-programs impl • viewer features • 256M HT • transformer comparison • safe-combination • abductive learning • interpretability gap • DSS ladder • ~20 math papers

6 Instances of Double Work

3 narrative papers • duplicate summary.tex • 5+ E→N→Q formalizations • 35 write_weights iterations • overlapping factor map papers • 9 scale_kn variants

Critical Blocker

Marginal dominance: the UM max-min forward pass cannot use higher-order patterns. Until solved, the gap from 1.588 to 0.845 bpc (the Hutter Prize record) cannot close through UM machinery alone.

Top 3 Priorities

1. Solve marginal dominance (discount in UM terms)
2. Implement P-programs + integrate multi-frequency LPPs
3. Stop formalizing, start implementing
