← Back to Hutter
Archive 2026-02-16
Tock phase empirical validation, Hutter Prize alignment, match model combination, sparse context experiments, embedding conjecture, arithmetic coding series (v1–v4), KN-quotient series (v1–v2: ring structure), and GCD empirical validation. Word injection, bigram grammar, P-program evaluation, SN model series (M0–M5), 19 Hutter Prize scoring experiments (8 negative results), GCD distribution experiments, sparse contexts, word models, indirect bigrams, and multi-model combination. UM Runner (umr): KN-6 as UM P-program, exact match validated.
Sparse contexts + match + KN-6 combination: Non-contiguous byte patterns ({1,2,4}, {1,2,4,8}, etc.) in a separate 16M-entry HT add +0.089 bpc over KN-6 alone. Combined with extended match: 1.588 bpc = 189.3 MB (1.79× record). Total +0.094 bpc over KN-6 (11.2 MB saved). The sparse HT saturates at 30M bytes but the model remains effective.
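The sparse-context lookup can be sketched roughly as below; the offset set and the FNV-1a-style hash are illustrative assumptions, not the exact code in hutter_score16.c. The point is that the pattern {1,2,4} reads only the bytes at those offsets behind the current position, skipping offset 3 entirely.

```c
#include <stdint.h>

/* Illustrative sketch (not the actual hutter_score16.c code): hash only
   the history bytes at fixed offsets, e.g. {1,2,4}, into a dedicated
   sparse table so the pattern skips intermediate bytes. */

#define SPARSE_BITS 24                    /* 16M entries, as in the text */
#define SPARSE_SIZE (1u << SPARSE_BITS)

/* FNV-1a over the selected history bytes; the offset set IS the pattern. */
static uint32_t sparse_hash(const uint8_t *hist, long pos,
                            const int *offsets, int n) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < n; i++) {
        h ^= hist[pos - offsets[i]];
        h *= 16777619u;
    }
    return h & (SPARSE_SIZE - 1);
}
```

Because only the listed offsets are read, changing the byte at offset 3 leaves the {1,2,4} bucket unchanged — that is what makes the context non-contiguous.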
Match model combination (9 strategies): Same base models produce −1.1 to +0.33 bpc depending on strategy. Extended match (4–64 bytes) + logistic mixer = +0.048 bpc at full enwik9. Len-64: 97.6% accuracy. Five negative results: momentum SGD, multi-model softmax, KN-8/small HT, 256M HT (OOM), recency model.
Word injection via absorption: +0.038 bpc (2000 words). Naïve geometric mean destroys performance (−0.9 bpc) — the shared-offset catastrophe.
Grammar from counting: SVD of the word bigram PPMI matrix discovers syntactic categories. 14 components capture 80% of variance. K-means (k=12) produces determiners, prepositions, verbs, modals, pronouns, nouns. Word bigrams provide 2.845 bits/transition MI = 0.10 bpc compression. No parsing, no annotation.
P-programs work: Trie/accumulator (P2) dominates mid-word prediction (+1.27 bpc over marginal). Accumulator states peak at position 5 (8,739 states vs theoretical 12M), confirming vanishing sparsity.
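A count-bearing prefix trie for mid-word prediction can be sketched as below, assuming lowercase words; the actual P2 trie/accumulator in p_program_eval.c may differ in detail.

```c
#include <stdlib.h>

/* Sketch of a count-bearing prefix trie (in the spirit of P2, details
   may differ): mid-word, the highest-count child of the prefix node
   is the predicted next letter. Lowercase a-z only for brevity. */

typedef struct Node {
    struct Node *child[26];
    int count[26];           /* next-letter counts at this prefix */
} Node;

static Node *node_new(void) { return calloc(1, sizeof(Node)); }

static void trie_add(Node *root, const char *word) {
    Node *n = root;
    for (const char *p = word; *p; p++) {
        int c = *p - 'a';
        n->count[c]++;
        if (!n->child[c]) n->child[c] = node_new();
        n = n->child[c];
    }
}

/* Most likely next letter after prefix, or -1 if the prefix is unseen. */
static int trie_predict(Node *root, const char *prefix) {
    Node *n = root;
    for (const char *p = prefix; *p; p++) {
        int c = *p - 'a';
        if (!n->child[c]) return -1;
        n = n->child[c];
    }
    int best = -1, bc = 0;
    for (int c = 0; c < 26; c++)
        if (n->count[c] > bc) { bc = n->count[c]; best = c; }
    return best;
}
```

After inserting "the", "the", "they", "that", the prefix "th" predicts 'e' (count 3) over 'a' (count 1) — the trie beats the marginal exactly where word-internal structure dominates.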
Ring structure validated + separation result: KN count table GCD: g=1 for 98.3% of contexts. Optimal D=0.85 at 100M, D=0.87 at 1B (shifts with HT saturation). Per-row GCD discount negative: D=g(c) is +0.138 bpc worse (mean g=4585 when g>1). The algebra is exact; the optimal operation is fractional. Exact AC with KN-6 via GMP: zero decode errors at 1024 bytes, 40,949-bit integers.
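Computing the row GCD g(c) is elementary; the sketch below shows why g=1 rows (98.3% of contexts) make division-based discounting a no-op, leaving the fractional D to do the real work.

```c
/* Sketch of the per-row GCD from the kn_gcd experiments: the GCD of a
   context row's counts. When g(c)=1, dividing out common evidence
   changes nothing, so only the fractional discount D matters. */

static long gcd(long a, long b) {
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

static long row_gcd(const long *counts, int n) {
    long g = 0;                          /* gcd(0, x) = x */
    for (int i = 0; i < n; i++)
        if (counts[i]) g = gcd(g, counts[i]);
    return g;                            /* 0 for an empty row */
}
```

A row like {6, 9, 12} has g=3 and could be divided through; a row like {3, 5, 7} has g=1 and is untouched by any integer quotient, which is the typical case.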
Hutter Prize position: Online KN-6 on enwik9: 1.682 bpc = 200.5 MB. KN-6 + sparse + match: 1.588 bpc = 189.3 MB (1.79× record). 11.2 MB saved over KN-6 alone. fx2-cmix record: 110.8 MB (0.70 bpc gap remaining).
Papers
- match-model.pdf — Match Models, Sparse Contexts, and the Combination Problem: From −1.1 to +0.09 bpc on enwik9 (19 Experiments, 8 Negative Results). Comprehensive scoring study. KN-6 + sparse + ext match: 1.588 bpc = 189.3 MB (1.79× record). Sparse contexts (+0.089 bpc), match (+0.048), combined (+0.094 bpc, 11.2 MB saved). Eight negative results: indirect bigrams, word models, recency, KN-8, shared HT, momentum SGD, multi-model softmax, larger sparse HT. Key finding: models succeed only when complementary to KN-6. (7 pages)
(source)
- bias-conjecture.pdf — The Bias Conjecture: Support Asymmetry in Binary Event Spaces. Mathematical treatment of the doubled-E support asymmetry. Sign-flip isomorphism theorem (exact, 0.000 bpc delta). Original sat-rnn has t(h+)/t(h-)=0.832 (more negative!); sign-canonical form: 2.21. Binary ES entropy tax: 41 bits/position. Anti-correlated pairs (r=−0.98) are ES-mate candidates; pairing captures +8 bits. Conjecture confirmed in refined form. (9 pages)
(source)
- tock-empirical.pdf — Tock Phase Empirical Results: Word Injection, Bigram Grammar, and P-Program Evaluation. Three experiments validating extended ES predictions. Injection curve confirms concave shape with +0.038 bpc at 2000 words via absorption. Word bigrams reveal syntactic categories (14 SVD components, 12 k-means clusters). P-program trie dominates mid-word. Accumulator states biject onto memory traces (8,398 vs 11.9M at pos 4). Naïve combination catastrophe demonstrated. (6 pages)
(source)
- sn-models.pdf — SN Model Series and Hutter Prize Alignment. Six models M0–M5 (unigram to KN-6+words). Online KN-6 = 1.683 bpc = 210.4 MB on enwik9 vs fx2-cmix record 110.8 MB (1.9× gap). Gap: 0.08 + 0.10 + 0.62 bpc. SN format examples. Description length trade-off. Roadmap. (4 pages)
(source)
- kn-quotient.pdf — Kneser–Ney as Quotient: Pulling $n$-gram Smoothing into the Universal Model. (v1.) KN discount as GCD-based common evidence removal. Continuation count as column-side type quotient. Interpolation recursion as hierarchical combination over order-projection tower. Unified pattern table $P$ dissolves the combination problem. Discount–GCD gap identified as central open question. Research agenda: per-row GCD discount, log-stochastic forward pass, word event extension. (9 pages)
(source)
- kn-quotient-v2.pdf — Kneser–Ney on the Integers: The Ring Structure. (v2.) Restates KN on the integers using the um-arithmetic-v4 framework. Context $c \in \mathbb{Z}/256^{k-1}\mathbb{Z}$, order projection = $c \bmod 256^{k-2}$ (literal mod). Discount = subtraction in $\mathbb{Z}$ (approximates division by GCD). Continuation count = fiber support size under output projection. Interpolation = weighted tower of ring surjections. Discount–GCD gap resolved: subtraction vs division, small because $g=1$ almost everywhere. Combination problem dissolves via CRT: choose $|W|$ coprime to 256. Full history = 256-adic integer $\in \mathbb{Z}_{256}$. (9 pages)
(source)
- embedding-conjecture.pdf — The Embedding Conjecture: Word Representations from Byte-Level Universal Models. Word embeddings arise from byte-level UMs via three mechanisms: causal capture (compress spelling), forward inference (predict from word identity), backward surprise response (reverse compression when surprised). Words as binary ES in 2ℓ. Spelling as arithmetic-coded residual. Evidence from sat-RNN factor map (word_len reset, 5.31 norm at boundaries). Five testable predictions including the “adding `the'” break-even test. (6 pages)
(source)
- um-arithmetic.pdf — Arithmetic Coding via Universal Models: Any UM is a Compressor. (v1, for MJC approval.) Formalizes the UM→compressor construction via arithmetic coding. Luck interpretation (quantile = Q = λ). ES preservation in the code. Practical considerations: 500-line decompressor, online vs two-pass, comparison to gzip/bzip2/cmix. v16 = 189 MB score via AC. Gap decomposition: higher-order, bit-level, more models, better mixing. (5 pages)
(source)
- um-arithmetic-v2.pdf — Arithmetic Coding, Quotients, and Prime Powers: The $E \to \mathbb{N} \to Q$ Bridge. (v2.) Connects AC to the quotient $Q = e/|E|$ and prime power encoding. The quotient IS the arithmetic coding interval position. Angle view: $\theta = 2\pi Q$ on the unit circle. Prime factorization = event readout (fundamental theorem of arithmetic). Number theory of compression: GCD detects shared state, Möbius inversion corrects pattern overlap, Euler’s totient gives sparsity. Factored phases, phase walk, product vs sum encoding. Bridge theorem unifying all five views. (8 pages)
(source)
- um-arithmetic-v3.pdf — The Sequence as an Integer: Direct Construction from $256^t$ to Fourier on Words. (v3.) Extends $E$ through time: enwik9 = $\sum b_t \cdot 256^{t-1}$, a 2.4-billion-digit integer. $Q = e/256^n$ IS the file as a base-256 fraction. AC precision window = KN context window (dual sliding views into $Q$). $\lambda$-quotient: $\lambda=1$ (bytes), $\lambda=8$ (bits), $\lambda=1/\bar\ell$ (words). Fixed word clock = rhythm in speech. Fourier of per-position code lengths: DC = total score, word-frequency peak = word-level compression, harmonics = phrase structure. Quotient hierarchy: bits/bytes/words/phrases are different bases for reading the same $Q$. (9 pages)
(source)
- um-arithmetic-v4.pdf — Integer Factorization of Events: Every Integer Is an Event, Every Quotient Is Division. (v4.) The algebra is literal, not analogical. With exact AC (GMP): events ARE integers, quotient spaces ARE $\mathbb{Z}/d\mathbb{Z}$, projection IS $e \bmod d$, remainder IS discarded information. Divisor lattice of $|E|$ = lattice of all quotient event spaces. Independence = coprimality. CRT = event space factorization. Decoding = integer factoring. Equivalence classes carry ring structure. Dirichlet characters = Fourier basis on the unit group. Compression difficulty = factoring difficulty. (9 pages)
(source)
- integer-events-empirical.pdf — Integer Events Made Concrete: Exact Arithmetic Coding and the Divisor Lattice. GMP implementation of um-arithmetic-v4. 1024-byte segment = 2,466-digit integer. Quotient projections extract bytes. CRT verified. Exact AC: zero errors at 1024 bytes, <0.2 bit overhead (8,450-bit integers). Per-row GCD discount negative result: +0.138 bpc. Separation: algebra is exact, statistics require fractional prior. (4 pages)
(source)
- kn-gcd-empirical.pdf — The GCD Is Almost Always One: Empirical Validation of the KN Ring Structure. Tests the kn-quotient-v2 prediction: $g(c)=1$ for 98.3% of contexts (4,395/4,473 sampled rows, orders 2–6). Discount sweep: optimal $D^*=0.85$ (1.9537 bpc), $D=0.9$ is +0.003 bpc, $D=1.0$ is +0.058 bpc. Discount–GCD gap decomposed: fractional vs integer (0.058 bpc, dominant), global vs per-row (<0.003), subtraction vs division (<0.002). Per-row GCD discount is a negative result: $D=g(c)$ loses +0.138 bpc (mean $g=4585$ when $g>1$). The ring structure is algebraically correct but the optimal operation is fractional. (5 pages)
(source)
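The integer-event claims from um-arithmetic-v4 and integer-events-empirical can be miniaturized without GMP: for short sequences a 64-bit integer suffices to show that the sequence is the integer, projection is mod, and byte readout is division. This is a toy sketch, not the GMP implementation.

```c
#include <stdint.h>

/* Toy miniature of integer_events.c using uint64_t instead of GMP:
   a short byte sequence IS the integer e = sum b_t * 256^(t-1);
   order projection is e mod 256^k; byte extraction is division. */

static uint64_t seq_to_int(const uint8_t *b, int n) {
    uint64_t e = 0;
    for (int t = n - 1; t >= 0; t--)   /* b[0] is the least digit */
        e = e * 256u + b[t];
    return e;
}

static uint8_t byte_at(uint64_t e, int t) {   /* read digit t back out */
    for (int i = 0; i < t; i++) e /= 256u;
    return (uint8_t)(e % 256u);
}
```

For "ABC" (0x41 0x42 0x43) the integer is 0x434241; e mod 256^2 keeps exactly the first two bytes, which is the literal-mod order projection from kn-quotient-v2.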
Experiments
- bias_conjecture.c — Bias conjecture exp 1: pre-activation stats, support ratio, redistribution sweep, timing signal.
- bias_conjecture2.c — Bias conjecture exp 2: sign-flip (buggy), 4-way ES, support budget, ES-mate complementarity.
- bias_conjecture3.c — Bias conjecture exp 3: corrected sign-flip isomorphism, entropy analysis, pairing.
- injection_curve.c — First attempt: geometric-mean combination. Shows naïve mixing loses 0.9 bpc.
- injection_curve2.c — Corrected: absorption rule. +0.038 bpc at 2000 words.
- word_bigrams.c — Word bigram SVD analysis. 500 words, 100M bytes, k-means clustering.
- p_program_eval.c — P-program evaluation. Position, accumulator, BoL, graded support.
- sn_models.c — SN model series M0–M5. Builds, evaluates, exports SN files. Hutter Prize alignment.
- hutter_score.c — Online KN-6 baseline on enwik9. 1.683 bpc = 210.4 MB.
- hutter_score2.c — KN-8 with per-order discounts. Negative: HT saturation makes it worse than KN-6.
- hutter_score3.c — KN-6 + match, geometric mean. Catastrophic: 3.06 bpc (+1.1 worse).
- hutter_score4.c — KN-6 + match, oracle absorption. 1.625 bpc (+0.33), but invalid for real compression.
- hutter_score5.c — KN-6 + match, full distribution combinations. Too slow (infeasible).
- hutter_score6.c — KN-6 + match, valid combinations. Three strategies tested. Adaptive linear: +0.036 bpc at 100M, +0.028 bpc at 1B.
- hutter_score7.c — KN-6 + match, logistic mixer (INVALID). Uses P(actual) as feature. 1.628 bpc at 100M.
- hutter_score8.c — KN-6 + match, valid logistic mixer. Per-bucket weights with EMA features. +0.042 bpc at 100M, +0.035 bpc at 1B = 196.3 MB (best valid).
- hutter_score9.c — KN-6 + match, momentum logistic mixer. Negative: momentum harmful (+0.011 bpc only).
- hutter_score10.c — KN-6 + match + o1 + word cache, multi-model softmax. Negative: unstable weights (+0.013 bpc only).
- hutter_score11.c — KN-6 + match, 256M HT (3 GB). Negative: OOM killed on full enwik9 (needs ~4.5 GB).
- hutter_score12.c — KN-6 + extended match (4–64 bytes), chain-8, logistic mixer. +0.051 bpc at 100M, +0.048 bpc at 1B = 194.7 MB (1.84×). len-64: 97.6% accuracy.
- hutter_score13.c — KN-6 + ext match + recency model, 3-way mixer. Negative: recency adds +0.002 bpc at 100M, dilutes match weight.
- hutter_score14.c — KN-8 dual HT (128M primary + 32M secondary). Negative: secondary HT 100% saturated at 100M, KN-8 worse than KN-6.
- hutter_score15.c — KN-6 + sparse contexts (shared HT). Sparse adds +0.154 bpc at 100M but degrades KN-6 via HT contention (99.8% full).
- hutter_score16.c — KN-6 + sparse (separate 16M HT) + ext match. Sparse adds +0.089 bpc at 1B. Combined: 1.588 bpc = 189.3 MB (1.79×). Best result.
- hutter_score17.c — KN-6 + sparse (32M HT) + ext match. Marginal: +0.003 bpc over v16 at 100M.
- hutter_score18.c — KN-6 + sparse + match + indirect bigrams (d=16,32,64,128,256) + class bigrams. Negative: indirect=4.83 bpc, class=4.35 bpc, combined −0.004 bpc (slight harm). Dominated by KN-6.
- hutter_score19.c — KN-6 + sparse + match + word-level bigrams (32M HT). Negative: word model=2.80 bpc, adds only +0.0005 bpc. HT 87% full at 16M. KN-6+sparse already captures word structure.
- kn_gcd.c — GCD distribution experiment 1: KN-6 count table GCD sampling (every 1M), D=0.9 vs D=1.0. Result: g=1 in 97.8%, D=1.0 is +0.055 bpc worse.
- kn_gcd2.c — GCD distribution experiment 2: discount sweep (D=0.50–1.00, 11 values), GCD sampling every 100K, type count distribution. Result: g=1 in 98.3%, optimal D=0.85 (1.9537 bpc).
- kn_gcd3.c — Per-row GCD discount experiment: 5 strategies (D=0.85, D=0.90, D=1.0, D=g(c), hybrid). Negative: D=g(c) is +0.138 bpc worse (mean g=4585 when g>1). Singleton tracker for fast-path optimization.
- integer_events.c — GMP demonstration of um-arithmetic-v4: sequence as integer, quotient projections, coprime quotients, CRT reconstruction, AC position Q, byte extraction via division. 32–1024 bytes.
- exact_ac.c — Exact arithmetic coding with GMP rationals. Unigram model, encode+decode, zero errors at 1024 bytes. 8450-bit integers, <0.2 bit delta from theoretical. The AC state IS exact integer arithmetic.
- exact_ac_kn.c — UM runner: exact AC with KN-6 model. Online KN-6 predictions drive GMP exact arithmetic coding. 1024 bytes: 2.78 bpc, zero decode errors, 0.01 bit delta, 40,949-bit integers. The AC state IS the integer quotient.
- p_program_kn.c — P-program features through KN-6. Negative: conditioning on word_position (+0.145 bpc), in_word (+0.117), word_len (+0.145), position_bucket (+0.184). Features fragment data without adding independent info. Confirms CRT: features must be coprime to byte context.
- kn_dsweep.c — Fine-grained discount sweep (D=0.70–0.95, step 0.01) on full enwik9. D* tracks HT saturation: 0.85→0.83→0.82→0.87 as HT fills 8%→37%→92%→100%. At 1B: D*=0.87 (1.6817 bpc), only 0.0003 bpc better than D=0.90.
- umr_built.c — UM Runner (umr): the single executable for all experiments. KN-6 + sparse + match as UM P-programs: 8 event spaces, 6 LPPs, max-min forward pass = KN interpolation chain, omega_0 = online counting. Exact match on ALL THREE scores at 1M: KN-6=2.3976, +sparse=2.1853, +match=2.1430 bpc (identical to hutter_score16). 645 lines, 6 blocks (Hutter:UMR. marker). All future experiments through this runner.
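The KN interpolation chain that the runner's forward pass reproduces can be sketched with toy rows; the hedged version below uses the standard absolute-discount recursion and invented counts, not the runner's hash tables or event-space machinery.

```c
/* Sketch of the KN interpolation chain mirrored by the UM runner's
   forward pass: at each order, discount the count by D and back off
   the freed mass to the next-lower order, bottoming out at the
   uniform 1/256. Counts here are toy values, not from a real HT. */

#define AB 256

typedef struct {
    int count[AB];   /* c(x) for this context row */
    int total;       /* sum of counts             */
    int types;       /* number of nonzero counts  */
} Row;

/* rows[0] is the highest order; n rows chain down to uniform. */
static double kn_prob(const Row *rows, int n, int x, double D) {
    double p = 1.0 / AB;                 /* order-0 base case */
    for (int k = n - 1; k >= 0; k--) {
        const Row *r = &rows[k];
        if (r->total == 0) continue;     /* unseen context: pure backoff */
        double num = r->count[x] - D;
        if (num < 0) num = 0;
        double lambda = D * r->types / r->total;
        p = num / r->total + lambda * p;
    }
    return p;
}
```

The freed mass D·types/total exactly balances the discounted numerators, so the distribution stays normalized at every order — the property the exact-AC runs above depend on.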
SN Exports
Interactive
- combination-explainer.html — Combination strategies explainer: why mixing matters more than model quality. Interactive demos of geometric vs linear failure, match accuracy chart, adaptive weight dynamics, Hutter Prize scoreboard.
- tock-explainer.html — Tock explainer with SVG charts: injection curve, singular value spectrum, cluster visualization, accumulator state analysis.
- sn-explainer.html — SN model explainer with interactive charts: model hierarchy, Hutter Prize dashboard, gap decomposition, description length trade-off slider.
- lpp-viz.html — LPP 3D visualization: two-ring (purple Ea / blue Eb) torus with sparse connections showing byte bigram joint events from first 1024 bytes of enwik9. Drag to rotate, scroll to zoom, hover for detail.
- kn-explainer.html — KN-quotient concrete explainer: interactive bigram table, row decomposition (discount as GCD approximation), column types (continuation counts), interpolation tower, sliding window demo. Real data from enwik9.
Navigation
← Previous: 20260215
Extended event space formalism. Pattern space P = E² × T. P-Programs. 6 papers.
Next: 20260218 →
Context events and P-programming methodology. Missing prior = missing context event.