Tock Phase: Empirical Validation of the Extended Event Space

Claude and MJC, February 2026

1. Word Injection Curve

Absorption Rule

We inject word-level conditional distributions into a byte-level KN model by absorbing the word evidence: after seeing a known word boundary, we replace the byte-level prediction with the word-conditional next-byte distribution for one byte, then resume KN.

Key finding: The injection curve is strongly concave — the first 200 words capture most of the gain (0.019 bpc), with diminishing returns thereafter. The "knee" at ~200 words marks the transition from high-value function words to lower-value content words.
Warning: Naive geometric mean of word and byte distributions worsens performance by 0.9 bpc. Only absorption works. This IS the shared-offset catastrophe — word and byte evidence are correlated, so mixing them double-counts.
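The absorption rule, and the failing geometric-mean alternative, can be sketched as follows. This is a toy illustration that assumes distributions are dicts over candidate next bytes; the names `absorb` and `geometric_mix` and the probabilities are invented for the example, not taken from the real model.

```python
import math

def absorb(kn_dist, word_dist, at_word_boundary):
    """Absorption: at a known word boundary, the word-conditional
    distribution replaces the byte-level KN prediction for one byte;
    otherwise the KN prediction is used unchanged."""
    return word_dist if at_word_boundary else kn_dist

def geometric_mix(kn_dist, word_dist):
    """Naive geometric mean of the two distributions. Because word and
    byte evidence are correlated, this double-counts shared evidence."""
    keys = set(kn_dist) | set(word_dist)
    mixed = {b: math.sqrt(kn_dist.get(b, 1e-12) * word_dist.get(b, 1e-12))
             for b in keys}
    z = sum(mixed.values())
    return {b: p / z for b, p in mixed.items()}

# Toy distributions over two candidate next bytes.
kn = {"t": 0.6, "a": 0.4}
wd = {"t": 0.9, "a": 0.1}
print(absorb(kn, wd, at_word_boundary=True))  # word evidence wins outright
```

Absorption is a replacement, not a blend: the word evidence is used once, for exactly one byte, which is what avoids the double-counting that sinks the geometric mean.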

Per-Word Contributions (Top 10)

Word | H(next|w) | KN bpc | Saved (bits)

2. Word Bigram Grammar Discovery

SVD + k-means

We build a word bigram matrix from the first 100M bytes, apply SVD, then k-means cluster the left singular vectors. Syntactic categories emerge purely from co-occurrence counting — no labels, no parsing, no neural network.
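On a toy vocabulary, this pipeline can be sketched like so. The bigram counts, k=3, and the deterministic initialization are illustrative; the real run used the first 100M bytes and k=8.

```python
import numpy as np

words = ["the", "a", "cat", "dog", "runs", "sleeps"]

# Toy bigram counts: C[i, j] = count of words[i] followed by words[j].
C = np.array([
    [0, 0, 9, 8, 0, 0],   # "the"    -> nouns
    [0, 0, 7, 9, 0, 0],   # "a"      -> nouns
    [0, 0, 0, 0, 6, 5],   # "cat"    -> verbs
    [0, 0, 0, 0, 5, 7],   # "dog"    -> verbs
    [9, 6, 0, 0, 0, 0],   # "runs"   -> determiners
    [7, 8, 0, 0, 0, 0],   # "sleeps" -> determiners
], dtype=float)

# SVD of the row-normalized bigram matrix.
P = C / C.sum(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(P, full_matrices=False)

# k-means (hand-rolled, for self-containment) on the top left singular
# vectors; one seed per expected group keeps the toy run deterministic.
k = 3
X = U[:, :k]
centers = X[[0, 2, 4]].copy()
for _ in range(20):
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])

for c in range(k):
    print(c, [wrd for wrd, lab in zip(words, labels) if lab == c])
```

Even on six words, the three clusters recover the determiner/noun/verb split from co-occurrence counts alone, echoing the finding that syntactic categories emerge without labels or parsing.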

SVD Singular Value Spectrum

Key finding: 14 components capture 80% of variance. The steep initial drop (sigma_1 = 72.3 to sigma_5 = 35.7) reflects the dominant syntactic regularities; the long tail captures finer semantic distinctions.
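The "m components capture X% of variance" figure can be computed as the cumulative sum of squared singular values over their total. A sketch with an invented toy spectrum (not the measured sigma values):

```python
import numpy as np

# Toy singular value spectrum (illustrative only, not the measured one).
s = np.array([72.3, 55.0, 45.0, 40.0, 35.7, 20.0, 12.0, 8.0, 5.0, 3.0])

# Fraction of total variance captured by the top-m components.
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
n_components = int(np.searchsorted(explained, 0.80) + 1)
print(explained.round(3), n_components)
```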

Discovered Word Clusters (k-means, k=8)

Key finding: 2.845 bits/transition mutual information; 0.1036 bpc compression contribution. Grammar emerges from counting alone — the syntactic structure of English is a statistical regularity, not a hidden variable.
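The bits-per-transition figure is the mutual information between consecutive cluster labels. A minimal sketch with a toy 2x2 transition count matrix (the real measurement used k=8 clusters):

```python
import math

# Toy transition counts T[i][j] between cluster i and cluster j.
T = [[90, 10],
     [10, 90]]
total = sum(sum(row) for row in T)

p_left = [sum(row) / total for row in T]                # P(cluster_t)
p_right = [sum(T[i][j] for i in range(len(T))) / total  # P(cluster_{t+1})
           for j in range(len(T[0]))]

# MI in bits: sum over cells of p(i,j) * log2(p(i,j) / (p(i) * p(j))).
mi = sum((T[i][j] / total)
         * math.log2((T[i][j] / total) / (p_left[i] * p_right[j]))
         for i in range(len(T)) for j in range(len(T[0])) if T[i][j])
print(round(mi, 3))
```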

3. P-Program Evaluation

Accumulator States

P-programs are small deterministic state machines that maintain an accumulator updated at each byte. They define features that the byte-level model conditions on. We evaluate each P-program's contribution to compression.
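A P-program can be sketched as a generator that exposes its accumulator before each byte; the vowel-run counter below is an invented example for illustration, not one of the evaluated P-programs.

```python
def vowel_run_counter(data: bytes):
    """A toy P-program: a deterministic state machine whose accumulator
    counts the length of the current vowel run. The value is yielded
    BEFORE each byte, since the byte model conditions on it when
    predicting that byte."""
    acc = 0
    for b in data:
        yield acc
        acc = acc + 1 if b in b"aeiou" else 0

states = list(vowel_run_counter(b"queue"))
print(states)  # [0, 0, 1, 2, 3]
```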

P-Program BPC Contributions

Warning: P1 (position counter) makes compression worse (5.201 bpc vs the 5.133 bpc marginal baseline). Position is not a useful feature at the byte level — it fragments the counts without providing compressible structure.

Accumulator State Counts by Position

Key finding: States peak at position 5 (8,739 distinct states) then decrease. The theoretical maximum grows as 26^n, but the actual count peaks and falls — vanishing sparsity confirmed. Real text converges to a small set of reachable states, which is precisely what makes these features compressible.

States: Observed vs Theoretical

Position | Observed States | Theoretical (26^n) | Ratio
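The observed-vs-theoretical comparison can be sketched by treating the letter prefix as the accumulator state (an illustrative assumption) and counting distinct states per within-word position over a toy word list:

```python
from collections import defaultdict

# Toy word list (illustrative); the real measurement used the corpus.
words = ["the", "then", "there", "that", "this", "cat", "can", "car"]

# states[n] = set of distinct accumulator states observed at position n,
# where the state is taken to be the first n letters of the word.
states = defaultdict(set)
for wrd in words:
    for n in range(1, len(wrd) + 1):
        states[n].add(wrd[:n])

for n in sorted(states):
    observed = len(states[n])
    print(n, observed, 26 ** n, observed / 26 ** n)
```

On this toy list the observed count peaks at position 3 and then falls while 26^n explodes, the same vanishing-sparsity shape as the measured peak at position 5.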