Dates in this guide follow Pacific (San Francisco) time as requested. These summaries map each paper back to the CMP/MCP story: the Universal Model, SN format, and event-space algebra.
In the CMP note, every SN file encodes ES definitions, LPP entries, and the count table. Multi-retro simply replays retroactive passes, multiplying the counts stored in each LPP without touching the structure: think of it as multiplying the factorial term in the max-min update while keeping the UM's ES layout fixed. The routine adds 5–10 ms/byte with diminishing returns, and it never closes the frozen-scoring gap because the UM still lacks the long-tail contexts.
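The count-only nature of the replay can be sketched in a few lines. This is an illustrative toy, not the papers' implementation: `multi_retro` and `lpp_counts` are hypothetical names, and the point is only that repeated passes rescale counts (the factorial term) while the set of entries, i.e. the ES layout, never changes.

```python
# Hypothetical sketch of multi-retro replay: repeated retroactive passes
# scale the stored counts but add or remove no entries.

def multi_retro(lpp_counts: dict, passes: int) -> dict:
    """Replay `passes` retroactive passes over a count table.

    Pass 1 is the original forward pass; each later pass k multiplies
    every stored count by k, so after `passes` passes an original count
    c has become c * passes! -- the factorial term in the max-min update.
    The keys (the ES layout) are never touched.
    """
    out = dict(lpp_counts)
    for k in range(2, passes + 1):
        out = {ctx: c * k for ctx, c in out.items()}
    return out

counts = {"th": 3, "he": 5}
after = multi_retro(counts, passes=3)
assert set(after) == set(counts)   # structure untouched
assert after["th"] == 3 * 6        # c * 3!
```

Because only magnitudes change, any context absent from the table stays absent, which is why the replay cannot help with the long-tail contexts the UM never created.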
The core CMP reminder is that the universal model is defined by the SN forward pass operating on the entire dataset. Frozen scoring is valid only for models that never reset between rounds, while online scoring is the training-time protocol. This correction sidesteps cross-protocol claims by honoring MCP’s insistence on consistent benchmarking: compare UM frozen to KN frozen, or UM online to KN online, but never cross-protocol.
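The two protocols differ in one line: whether the model keeps updating while it is being scored. A minimal sketch with a toy add-one unigram model (the model is deliberately trivial; only the protocol matters, and all names here are illustrative):

```python
# Frozen vs. online scoring: same model, different evaluation protocol.
import math
from collections import Counter

def score(counts, total, sym, vocab=256):
    # add-one smoothed code length for one byte, in bits
    p = (counts[sym] + 1) / (total + vocab)
    return -math.log2(p)

def frozen_bits(train, test):
    counts, total = Counter(train), len(train)
    return sum(score(counts, total, s) for s in test)

def online_bits(train, test):
    counts, total = Counter(train), len(train)
    bits = 0.0
    for s in test:
        bits += score(counts, total, s)
        counts[s] += 1      # model keeps learning during evaluation
        total += 1
    return bits

data = b"abababab"
# On repetitive data the online protocol keeps sharpening its estimates,
# so the two protocols are not comparable bit-for-bit:
assert online_bits(data, b"abab") < frozen_bits(data, b"abab")
```

The inequality is the whole point: a frozen number and an online number measure different things, so only like-for-like comparisons (UM frozen vs. KN frozen, UM online vs. KN online) are meaningful.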
The MCP organization of event spaces surfaces here. Each UM neuron lives inside an ES (e.g., bigram_prev), and authoritative creation requires its parent to exist and fire. At 100K bytes this cascade saturates up to trigrams, but orders 5 and 6 arrive so late that only 10% and 4.6% of the corresponding KN contexts ever unlock. CMP identifies these ESs as the structured coordinates of the UM, so the coverage gap is a geometry problem: there simply aren’t enough activated neurons to cover the dense KN hash table.
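The lag of the creation cascade is easy to see in a toy model. The sketch below is an assumption-laden illustration (names like `grow_contexts` are mine, not the papers'): an order-n context can only be created once its order-(n-1) parent already existed before the current position and fires there, so each higher order has to wait for at least one more traversal of the pattern.

```python
# Parent-gated creation: higher-order contexts unlock one order at a time.

def grow_contexts(data: bytes, max_order: int) -> set:
    known = {b""}                          # the empty context always exists
    for i in range(len(data)):
        snapshot = set(known)              # parents must predate this byte
        for n in range(1, max_order + 1):
            if i + 1 < n:
                break
            ctx = data[i + 1 - n : i + 1]
            if ctx[1:] in snapshot:        # parent exists and just fired
                known.add(ctx)
            else:
                break                      # higher orders must wait
    known.discard(b"")
    return known

# After two repetitions of "abc" no trigram exists yet; a third pass
# finally unlocks them -- the mechanism behind the coverage bottleneck:
assert all(len(c) < 3 for c in grow_contexts(b"abcabc", 3))
assert b"abc" in grow_contexts(b"abcabcabc", 3)
```

Scaled up, this is why orders 5 and 6 cover so little of the KN context set at 100K bytes: the cascade has simply not had enough occurrences to climb that high.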
CMP teaches us to see the UM as algebra over events and their frequencies. Dropping the threshold to τ=1 floods the SN count table with 5- and 6-grams that appear only once, so the discount steps in the KN chain steal their mass and leave the model under-confident. In SN terms, the ratio problem surfaces because the numerator (context counts) stays near zero while the denominator (total mass) grows. MCP’s emphasis on clean abstraction says: new ES entries must earn their precision, and τ=1 fails that exam at scale.
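The mass-stealing effect can be made concrete with absolute discounting, the step at the core of the KN chain. The function and values below are illustrative, not taken from the papers:

```python
# Why τ=1 hurts: a singleton keeps only (1 - d) of its mass after the
# KN-style absolute discount, so flooding the table with once-seen
# 5/6-grams hands almost everything to the backoff distribution.

def discounted_prob(count: int, context_total: int, d: float = 0.75) -> float:
    """Mass this n-gram keeps after the discount (backoff gets the rest)."""
    return max(count - d, 0.0) / context_total

# A well-attested n-gram keeps most of its evidence:
assert abs(discounted_prob(40, 50) - 39.25 / 50) < 1e-12
# A τ=1 singleton keeps almost nothing:
assert discounted_prob(1, 1) == 0.25
```

With a typical discount of d = 0.75, every singleton admitted by τ=1 surrenders three quarters of its mass, which is exactly the under-confidence the paragraph describes: the numerator stays near zero while the denominator keeps growing.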
This paper is an MCP-style synthesis: it refuses to treat the UM as an isolated optimizer and insists on the CMP demand that every new ES be grounded in the same event-count algebra that the SN already encodes. The four “negative” findings—coverage bottleneck, τ=1 scaling degradation, multi-retro diminishing returns, and frozen-vs-online protocol mismatch—combine to show that the UM’s locked-in ES lattice cannot match the density of KN-6. The critical datapoint: KN-6 hits 1.925 bpc on 100M bytes and projects 1.74 bpc at 1B, leaving a 0.46 bpc gap to the Hutter Prize baseline even when extrapolating mode-2 compression.
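The scaling numbers above are worth checking as arithmetic. Assuming bpc falls linearly in the log of the data size (an assumption the projection implies but does not state), the 100M → 1B figures fix a slope, and the stated gap places the baseline:

```python
# Arithmetic check of the quoted datapoints, assuming log-linear scaling.
import math

bpc_100m, bpc_1b = 1.925, 1.74
slope = (bpc_1b - bpc_100m) / (math.log10(1e9) - math.log10(1e8))
assert abs(slope + 0.185) < 1e-9      # about -0.185 bpc per decade of data

# The stated 0.46 bpc gap puts the Hutter Prize baseline near:
baseline = bpc_1b - 0.46
assert abs(baseline - 1.28) < 1e-9
```

The implied slope makes the conclusion vivid: closing a 0.46 bpc gap at -0.185 bpc per decade would take over two more orders of magnitude of data, which is why the ES-lattice bottleneck, not data volume, is the binding constraint.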
MCP chemistry says “start with the smallest ES’es you can explain.” Word events live over lexical quotients, so the word-mixture supplement factors in the new event space without rewriting the UM: it just adds conjunctions that defer to the underlying word structure. Word-KN6 buys back 0.111 bpc at 100K and 0.060 bpc at 1M by explaining the same contexts that KN-6 already saw, especially at the start of words where compression gains concentrate.
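The "supplement without rewriting" shape can be sketched as a simple interpolation. The function, weight, and fallback rule here are assumptions of mine for illustration, not the papers' actual mixture:

```python
# Word-mixture sketch: blend the byte-level prediction with a word-level
# one wherever the word ES has an opinion; otherwise fall back to bytes.

def mixed_prob(p_byte: float, p_word, weight: float = 0.5) -> float:
    """Interpolate byte- and word-level probabilities for one symbol."""
    if p_word is None:                 # word ES has nothing for this context
        return p_byte
    return (1 - weight) * p_byte + weight * p_word

# At the start of a known word the word model is confident, so the
# mixture sharpens the prediction without touching the byte-level UM:
assert abs(mixed_prob(0.20, 0.90) - 0.55) < 1e-12
assert mixed_prob(0.20, None) == 0.20
```

The fallback branch is the key design point: the byte-level UM is never rewritten, so the word ES can only add information, which matches why the gains concentrate at word starts where the word model is most confident.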
This guide is a companion to the archive index. Each section is a bridge from the dense math of the papers to the CMP/MCP worldview: events, counts, quotients, and tick-tock replay. Let me know if you’d like a short textbook-style chapter next.