Dates in this guide follow Pacific (San Francisco) time as requested. These summaries map each paper back to the CMP/MCP story: the Universal Model, SN format, and event-space algebra.
In the CMP note, every SN file encodes ES definitions, LPP entries, and the count table. Multi-retro simply replays retroactive passes, multiplying the counts stored in each LPP without touching the structure: think of it as multiplying the factorial term in the max-min update while keeping the UM's ES layout fixed. The routine adds 5–10 ms/byte with diminishing returns, and it never closes the frozen-scoring gap because the UM still lacks the long-tail contexts.
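The count-only nature of the replay can be sketched in a few lines. This is an illustrative toy, not the papers' implementation: `multi_retro` and `lpp_counts` are hypothetical names, and the point is only that repeated passes rescale counts (the factorial term) while the set of entries, i.e. the ES layout, never changes.

```python
# Hypothetical sketch of multi-retro replay: repeated retroactive passes
# scale the stored counts but add or remove no entries.

def multi_retro(lpp_counts: dict, passes: int) -> dict:
    """Replay `passes` retroactive passes over a count table.

    Pass 1 is the original forward pass; each later pass k multiplies
    every stored count by k, so after `passes` passes an original count
    c has become c * passes! -- the factorial term in the max-min update.
    The keys (the ES layout) are never touched.
    """
    out = dict(lpp_counts)
    for k in range(2, passes + 1):
        out = {ctx: c * k for ctx, c in out.items()}
    return out

counts = {"th": 3, "he": 5}
after = multi_retro(counts, passes=3)
assert set(after) == set(counts)   # structure untouched
assert after["th"] == 3 * 6        # c * 3!
```

Because only magnitudes change, any context absent from the table stays absent, which is why the replay cannot help with the long-tail contexts the UM never created.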
The core CMP reminder is that the universal model is defined by the SN forward pass operating on the entire dataset. Frozen scoring is valid only for models that never reset between rounds, while online scoring is the training-time protocol. This correction sidesteps cross-protocol claims by honoring MCP’s insistence on consistent benchmarking: compare UM frozen to KN frozen, or UM online to KN online, but never cross-protocol.
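The two protocols differ in one line: whether the model keeps updating while it is being scored. A minimal sketch with a toy add-one unigram model (the model is deliberately trivial; only the protocol matters, and all names here are illustrative):

```python
# Frozen vs. online scoring: same model, different evaluation protocol.
import math
from collections import Counter

def score(counts, total, sym, vocab=256):
    # add-one smoothed code length for one byte, in bits
    p = (counts[sym] + 1) / (total + vocab)
    return -math.log2(p)

def frozen_bits(train, test):
    counts, total = Counter(train), len(train)
    return sum(score(counts, total, s) for s in test)

def online_bits(train, test):
    counts, total = Counter(train), len(train)
    bits = 0.0
    for s in test:
        bits += score(counts, total, s)
        counts[s] += 1      # model keeps learning during evaluation
        total += 1
    return bits

data = b"abababab"
# On repetitive data the online protocol keeps sharpening its estimates,
# so the two protocols are not comparable bit-for-bit:
assert online_bits(data, b"abab") < frozen_bits(data, b"abab")
```

The inequality is the whole point: a frozen number and an online number measure different things, so only like-for-like comparisons (UM frozen vs. KN frozen, UM online vs. KN online) are meaningful.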
The MCP organization of event spaces surfaces here. Each UM neuron lives inside an ES (e.g., bigram_prev), and authoritative creation requires its parent to exist and fire. At 100K bytes this cascade saturates up to trigrams, but orders 5 and 6 arrive so late that only 10% and 4.6% of the corresponding KN contexts ever unlock. CMP identifies these ESs as the structured coordinates of the UM, so the coverage gap is a geometry problem: there simply aren’t enough activated neurons to cover the dense KN hash table.
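The lag of the creation cascade is easy to see in a toy model. The sketch below is an assumption-laden illustration (names like `grow_contexts` are mine, not the papers'): an order-n context can only be created once its order-(n-1) parent already existed before the current position and fires there, so each higher order has to wait for at least one more traversal of the pattern.

```python
# Parent-gated creation: higher-order contexts unlock one order at a time.

def grow_contexts(data: bytes, max_order: int) -> set:
    known = {b""}                          # the empty context always exists
    for i in range(len(data)):
        snapshot = set(known)              # parents must predate this byte
        for n in range(1, max_order + 1):
            if i + 1 < n:
                break
            ctx = data[i + 1 - n : i + 1]
            if ctx[1:] in snapshot:        # parent exists and just fired
                known.add(ctx)
            else:
                break                      # higher orders must wait
    known.discard(b"")
    return known

# After two repetitions of "abc" no trigram exists yet; a third pass
# finally unlocks them -- the mechanism behind the coverage bottleneck:
assert all(len(c) < 3 for c in grow_contexts(b"abcabc", 3))
assert b"abc" in grow_contexts(b"abcabcabc", 3)
```

Scaled up, this is why orders 5 and 6 cover so little of the KN context set at 100K bytes: the cascade has simply not had enough occurrences to climb that high.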
CMP teaches us to see the UM as algebra over events and their frequencies. Dropping the threshold to τ=1 floods the SN count table with 5- and 6-grams that appear only once, so the discount steps in the KN chain steal their mass and leave the model under-confident. In SN terms, the ratio problem surfaces because the numerator (context counts) stays near zero while the denominator (total mass) grows. MCP’s emphasis on clean abstraction says: new ES entries must earn their precision, and τ=1 fails that exam at scale.
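The mass-stealing effect can be made concrete with absolute discounting, the step at the core of the KN chain. The function and values below are illustrative, not taken from the papers:

```python
# Why τ=1 hurts: a singleton keeps only (1 - d) of its mass after the
# KN-style absolute discount, so flooding the table with once-seen
# 5/6-grams hands almost everything to the backoff distribution.

def discounted_prob(count: int, context_total: int, d: float = 0.75) -> float:
    """Mass this n-gram keeps after the discount (backoff gets the rest)."""
    return max(count - d, 0.0) / context_total

# A well-attested n-gram keeps most of its evidence:
assert abs(discounted_prob(40, 50) - 39.25 / 50) < 1e-12
# A τ=1 singleton keeps almost nothing:
assert discounted_prob(1, 1) == 0.25
```

With a typical discount of d = 0.75, every singleton admitted by τ=1 surrenders three quarters of its mass, which is exactly the under-confidence the paragraph describes: the numerator stays near zero while the denominator keeps growing.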
This paper is an MCP-style synthesis: it refuses to treat the UM as an isolated optimizer and insists on the CMP demand that every new ES be grounded in the same event-count algebra that the SN already encodes. The four “negative” findings—coverage bottleneck, τ=1 scaling degradation, multi-retro diminishing returns, and frozen-vs-online protocol mismatch—combine to show that the UM’s locked-in ES lattice cannot match the density of KN-6. The critical datapoint: KN-6 hits 1.925 bpc on 100M bytes and projects 1.74 bpc at 1B, leaving a 0.46 bpc gap to the Hutter Prize baseline even when extrapolating mode-2 compression.
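The scaling numbers above are worth checking as arithmetic. Assuming bpc falls linearly in the log of the data size (an assumption the projection implies but does not state), the 100M → 1B figures fix a slope, and the stated gap places the baseline:

```python
# Arithmetic check of the quoted datapoints, assuming log-linear scaling.
import math

bpc_100m, bpc_1b = 1.925, 1.74
slope = (bpc_1b - bpc_100m) / (math.log10(1e9) - math.log10(1e8))
assert abs(slope + 0.185) < 1e-9      # about -0.185 bpc per decade of data

# The stated 0.46 bpc gap puts the Hutter Prize baseline near:
baseline = bpc_1b - 0.46
assert abs(baseline - 1.28) < 1e-9
```

The implied slope makes the conclusion vivid: closing a 0.46 bpc gap at -0.185 bpc per decade would take over two more orders of magnitude of data, which is why the ES-lattice bottleneck, not data volume, is the binding constraint.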
MCP chemistry says “start with the smallest ES’es you can explain.” Word events live over lexical quotients, so the word-mixture supplement factors in the new event space without rewriting the UM: it just adds conjunctions that defer to the underlying word structure. Word-KN6 buys back 0.111 bpc at 100K and 0.060 bpc at 1M by explaining the same contexts that KN-6 already saw, especially at the start of words where compression gains concentrate.
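The "supplement without rewriting" shape can be sketched as a simple interpolation. The function, weight, and fallback rule here are assumptions of mine for illustration, not the papers' actual mixture:

```python
# Word-mixture sketch: blend the byte-level prediction with a word-level
# one wherever the word ES has an opinion; otherwise fall back to bytes.

def mixed_prob(p_byte: float, p_word, weight: float = 0.5) -> float:
    """Interpolate byte- and word-level probabilities for one symbol."""
    if p_word is None:                 # word ES has nothing for this context
        return p_byte
    return (1 - weight) * p_byte + weight * p_word

# At the start of a known word the word model is confident, so the
# mixture sharpens the prediction without touching the byte-level UM:
assert abs(mixed_prob(0.20, 0.90) - 0.55) < 1e-12
assert mixed_prob(0.20, None) == 0.20
```

The fallback branch is the key design point: the byte-level UM is never rewritten, so the word ES can only add information, which matches why the gains concentrate at word starts where the word model is most confident.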
This guide is a companion to the archive index. Each section is a bridge from the dense math of the papers to the CMP/MCP worldview: events, counts, quotients, and tick-tock replay. Let me know if you’d like a short textbook-style chapter next.