Compressing enwik9 via the Universal Model. Based on the CMP paper (Clement, 2026).
Start here. The current research front, plus the foundational paper. See #paper_index for the full catalog (~150 papers).
The Lexicon Embedding — From bytes to words: synthesis, experimental plan, predictions (Latest)
Complete picture: agent model, orthographic bijection, two-level trace (word events + spelling residual), MDL lexicon, factor tower, quotient ring algebra, P-programming, formal tokenization contrast. 100-word results (−0.552 bpc). Experimental plan for full lexicon. 14pp.
2026-03-12
MCP Explainers, Combination Problem, Ablation, and Conviction Analysis: 42 papers, 6 interactive explainers, and versioned LATD snapshots. Key new results: count-augmented LPPs (2.766 bpc at 100K, 3.049 bpc at 1M), the oracle-correlation conviction-vs-accuracy tradeoff, the conviction-depth asymmetry (fewer wins, but bigger ones), and H3 normalized conviction as the leading higher-order hybrid. LATD three-regime decomposition with versioned snapshots and prior-art lineage back to the February trace viewers. Plus quotient/fiber, combination-problem, blend-analysis, and response-cluster papers.
2026-03-06
Trigram Embedding & Consolidation: P-program for the orthographic bijection (7pp). Consolidation v2: GCD decomposition (Bayes from Counting), ring tower (KN-quotient v2), experimental findings (English context results). OBSERVE LPP RNG coupling, log-stochastic quantization, L∞ vs L1 gap, two-level trace, CRT word extension (11pp). Suffix collision analysis: depth 1 = 12.6%, depth 5 needed for >90%.
2026-03-03
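The suffix-collision idea above (how deep a suffix must be before words stop colliding) can be sketched with a few lines; the collision definition and toy word list here are illustrative assumptions, not the paper's corpus or exact metric.

```python
from collections import Counter

def suffix_collision_rate(words, depth):
    """Fraction of words whose last `depth` characters are shared
    with at least one other word in the list (a suffix collision)."""
    suffixes = Counter(w[-depth:] for w in words)
    colliding = sum(1 for w in words if suffixes[w[-depth:]] > 1)
    return colliding / len(words)

words = ["cat", "hat", "bat", "dog", "log", "run"]
# depth 1: five of six words share their final letter with another
print(suffix_collision_rate(words, 1))
# depth 3: all six suffixes (the full words) are distinct
print(suffix_collision_rate(words, 3))
```

Deepening the suffix can only shrink each collision class, so the rate is monotone non-increasing in depth, which is why some depth eventually pushes uniqueness past 90%.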
The Lexicon Embedding: synthesis paper. Agent model, working memory, orthographic bijection, two-level trace, MDL lexicon, factor tower, quotient ring algebra, P-programming, formal tokenization contrast. 100-word results (−0.552 bpc). Experimental plan for full lexicon. 1 paper (14pp).
2026-02-24
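The MDL lexicon criterion can be sketched in the usual two-part-code form: a word earns its place only if the bits it saves in the trace exceed the bits spent spelling it once. This is a simplified illustration with an assumed 8-bits-per-character spelling cost, not the paper's exact cost model.

```python
import math
from collections import Counter

def code_len(counts):
    """Idealized code length in bits: empirical entropy of the
    symbol counts times the stream length."""
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())

def mdl_gain(text, word):
    """Bits saved by adding `word` to the lexicon: code the text
    with the word collapsed to a single event, plus a one-time
    spelling cost (assumed 8 bits/char), versus coding the raw
    text. Positive gain -> the word pays for itself."""
    base = code_len(Counter(text))
    collapsed = text.replace(word, "\x00")        # word as one event
    with_word = code_len(Counter(collapsed)) + 8 * len(word)
    return base - with_word

# A word repeated 40 times easily repays its 24-bit spelling cost:
print(mdl_gain("abc" * 40, "abc") > 0)   # True
# A word seen only once does not:
print(mdl_gain("abcdefgh", "abc") < 0)   # True
```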
The Path to the Lexicon + English Context Neuron: bag-of-letters embeddings, first change of base. Eight experiments (Exp 2, 3/6, 4, 5.2). Context neurons help when sharper than baseline (−0.12 bpc at unigram level). LSA primitives (⊕, ⊖) implemented; surgery fails at integer precision. 4 papers.
2026-02-22
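The bag-of-letters embedding and the LSA primitives ⊕/⊖ can be sketched as letter-count multisets with componentwise add and subtract; this is an illustrative reconstruction, not the paper's implementation.

```python
from collections import Counter

def bag(word):
    """Bag-of-letters embedding: a multiset of letter counts."""
    return Counter(word)

def oplus(a, b):
    """⊕: componentwise addition of letter counts."""
    out = Counter(a)
    out.update(b)
    return out

def ominus(a, b):
    """⊖: componentwise subtraction. At integer precision,
    subtracting a bag not contained in `a` silently truncates
    negative counts to zero -- one way 'surgery' can fail."""
    out = Counter(a)
    out.subtract(b)
    return +out  # unary + drops zero and negative counts

assert oplus(bag("cat"), bag("hat")) == bag("cathat")
assert ominus(bag("cathat"), bag("hat")) == bag("cat")
```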
Working Memory & Memory Traces: agent model (O=pred+input), shift chain, ES clearing, DYNAMICS. The Memory Trace paper: LATD (Look At The Data), boolean sentences, clearing as write trigger, replay/Tick-Tock, embeddings as change of base. Unigram memory trace (256 bytes, Tenv=30, LSI gap = generalization cost). 8 trace viewers, learning rate viewer. 3 papers, 8 viewers.
2026-02-20
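A 256-byte unigram memory trace can be sketched as one saturating 8-bit counter per byte value; the saturation rule is an assumption here, and the paper's trace dynamics (Tenv, ES clearing, replay) are not modeled.

```python
class UnigramTrace:
    """Minimal sketch: 256 one-byte counters, one per byte value."""

    def __init__(self):
        self.counts = bytearray(256)  # the whole trace is 256 bytes

    def observe(self, byte):
        if self.counts[byte] < 255:   # saturate at 8 bits
            self.counts[byte] += 1

    def predict(self):
        """Most frequently observed byte so far (argmax of trace)."""
        return max(range(256), key=lambda b: self.counts[b])

t = UnigramTrace()
for b in b"aabba":
    t.observe(b)
print(chr(t.predict()))  # prints 'a'
```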
UMR Mathematical Specification (7pp): maps every CMP formula to code (f0, softmax, ω0, LPPs as shorthand for P, threshold creation, support gap, envelope, SN format). Sharpest-LPP Scoring: support gap (s1−s2) selects the most confident LPP. Threshold-4 creation (part of ω) filters sparse noise. Frozen rerun order 4 = 4.407 bpc at 64K (−0.41 vs order 2). Fixed softmax bug (s=0 → 2^0 = 1, not ε). Three scoring modes, genesis viewer, SN export. 2 papers, 1 viewer, 5 SN models.
2026-02-18
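The sharpest-LPP rule (pick the LPP with the widest support gap s1−s2) and the base-2 softmax fix (a zero score must contribute 2^0 = 1, not a vanishing ε) can be sketched as follows; representing an LPP as a bare list of support scores is an illustrative assumption.

```python
def base2_softmax(scores):
    """Base-2 softmax with the s=0 fix: each weight is 2**s, so a
    zero score contributes 2**0 = 1 rather than epsilon."""
    weights = [2.0 ** s for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def support_gap(scores):
    """Gap s1 - s2 between the two largest supports."""
    top = sorted(scores, reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)

def sharpest(lpps):
    """Select the most confident LPP: the one with the widest gap."""
    return max(lpps, key=support_gap)

lpps = [[5, 4, 1], [7, 2, 0], [3, 3, 3]]
print(sharpest(lpps))  # -> [7, 2, 0], the gap-5 candidate
```

Note that the gap rule ranks by absolute separation, not by total support, which is what makes it a confidence selector rather than a frequency selector.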
Context events & surprise: generalized oversupport, ring pattern (91% from-the-left), three-frequency model (+0.184 bpc). Marginal Dominance Theorem: pure UM bigram at ~5.3 bpc (lower-order always wins under max-min). Generic UM runner (#umr_core). UM Viewer (SN-loading, 3D rings, JS forward pass). UM Connectome (pure UM 3D viewer). 8 papers, 3 viewers, 9 experiments.
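All bpc figures in this catalog are bits per character; for reference, a minimal sketch of the standard computation from the probabilities a model assigned to the characters that actually occurred:

```python
import math

def bits_per_char(probs):
    """bpc = -(1/N) * sum(log2 p_i), where p_i is the probability
    the model assigned to the i-th observed character."""
    return -sum(math.log2(p) for p in probs) / len(probs)

# A uniform model over 256 byte values costs exactly 8 bpc:
print(bits_per_char([1 / 256] * 100))  # -> 8.0
```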