← Back to Hutter
Archive 2026-02-16
Tock phase empirical validation, Hutter Prize alignment, match model combination, sparse context experiments, embedding conjecture, arithmetic coding series (v1–v4), KN-quotient series (v1–v2: ring structure), and GCD empirical validation. Word injection, bigram grammar, P-program evaluation, SN model series (M0–M5), 19 Hutter Prize scoring experiments (8 negative results), GCD distribution experiments, sparse contexts, word models, indirect bigrams, and multi-model combination. UM Runner (umr): KN-6 as UM P-program, exact match validated.
Sparse contexts + match + KN-6 combination: Non-contiguous byte patterns ({1,2,4}, {1,2,4,8}, etc.) in a separate 16M-entry HT add +0.089 bpc over KN-6 alone. Combined with extended match: 1.588 bpc = 189.3 MB (1.79× record). Total +0.094 bpc over KN-6 (11.2 MB saved). The sparse HT saturates at 30M bytes but the model remains effective.
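The sparse-context lookup can be sketched roughly as below; the offset set and the FNV-1a-style hash are illustrative assumptions, not the exact code in hutter_score16.c. The point is that the pattern {1,2,4} reads only the bytes at those offsets behind the current position, skipping offset 3 entirely.

```c
#include <stdint.h>

/* Illustrative sketch (not the actual hutter_score16.c code): hash only
   the history bytes at fixed offsets, e.g. {1,2,4}, into a dedicated
   sparse table so the pattern skips intermediate bytes. */

#define SPARSE_BITS 24                    /* 16M entries, as in the text */
#define SPARSE_SIZE (1u << SPARSE_BITS)

/* FNV-1a over the selected history bytes; the offset set IS the pattern. */
static uint32_t sparse_hash(const uint8_t *hist, long pos,
                            const int *offsets, int n) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < n; i++) {
        h ^= hist[pos - offsets[i]];
        h *= 16777619u;
    }
    return h & (SPARSE_SIZE - 1);
}
```

Because only the listed offsets are read, changing the byte at offset 3 leaves the {1,2,4} bucket unchanged — that is what makes the context non-contiguous.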
Match model combination (9 strategies): Same base models produce −1.1 to +0.33 bpc depending on strategy. Extended match (4–64 bytes) + logistic mixer = +0.048 bpc at full enwik9. Len-64: 97.6% accuracy. Five negative results: momentum SGD, multi-model softmax, KN-8/small HT, 256M HT (OOM), recency model.
Word injection via absorption: +0.038 bpc (2000 words). Naïve geometric mean destroys performance (−0.9 bpc) — the shared-offset catastrophe.
Grammar from counting: SVD of the word bigram PPMI matrix discovers syntactic categories. 14 components capture 80% of variance. K-means (k=12) produces determiners, prepositions, verbs, modals, pronouns, nouns. Word bigrams provide 2.845 bits/transition MI = 0.10 bpc compression. No parsing, no annotation.
P-programs work: Trie/accumulator (P2) dominates mid-word prediction (+1.27 bpc over marginal). Accumulator states peak at position 5 (8,739 states vs theoretical 12M), confirming vanishing sparsity.
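A count-bearing prefix trie for mid-word prediction can be sketched as below, assuming lowercase words; the actual P2 trie/accumulator in p_program_eval.c may differ in detail.

```c
#include <stdlib.h>

/* Sketch of a count-bearing prefix trie (in the spirit of P2, details
   may differ): mid-word, the highest-count child of the prefix node
   is the predicted next letter. Lowercase a-z only for brevity. */

typedef struct Node {
    struct Node *child[26];
    int count[26];           /* next-letter counts at this prefix */
} Node;

static Node *node_new(void) { return calloc(1, sizeof(Node)); }

static void trie_add(Node *root, const char *word) {
    Node *n = root;
    for (const char *p = word; *p; p++) {
        int c = *p - 'a';
        n->count[c]++;
        if (!n->child[c]) n->child[c] = node_new();
        n = n->child[c];
    }
}

/* Most likely next letter after prefix, or -1 if the prefix is unseen. */
static int trie_predict(Node *root, const char *prefix) {
    Node *n = root;
    for (const char *p = prefix; *p; p++) {
        int c = *p - 'a';
        if (!n->child[c]) return -1;
        n = n->child[c];
    }
    int best = -1, bc = 0;
    for (int c = 0; c < 26; c++)
        if (n->count[c] > bc) { bc = n->count[c]; best = c; }
    return best;
}
```

After inserting "the", "the", "they", "that", the prefix "th" predicts 'e' (count 3) over 'a' (count 1) — the trie beats the marginal exactly where word-internal structure dominates.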
Ring structure validated + separation result: KN count table GCD: g=1 for 98.3% of contexts. Optimal D=0.85 at 100M, D=0.87 at 1B (shifts with HT saturation). Per-row GCD discount negative: D=g(c) is +0.138 bpc worse (mean g=4585 when g>1). The algebra is exact; the optimal operation is fractional. Exact AC with KN-6 via GMP: zero decode errors at 1024 bytes, 40,949-bit integers.
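Computing the row GCD g(c) is elementary; the sketch below shows why g=1 rows (98.3% of contexts) make division-based discounting a no-op, leaving the fractional D to do the real work.

```c
/* Sketch of the per-row GCD from the kn_gcd experiments: the GCD of a
   context row's counts. When g(c)=1, dividing out common evidence
   changes nothing, so only the fractional discount D matters. */

static long gcd(long a, long b) {
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

static long row_gcd(const long *counts, int n) {
    long g = 0;                          /* gcd(0, x) = x */
    for (int i = 0; i < n; i++)
        if (counts[i]) g = gcd(g, counts[i]);
    return g;                            /* 0 for an empty row */
}
```

A row like {6, 9, 12} has g=3 and could be divided through; a row like {3, 5, 7} has g=1 and is untouched by any integer quotient, which is the typical case.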
Hutter Prize position: Online KN-6 on enwik9: 1.682 bpc = 200.5 MB. KN-6 + sparse + match: 1.588 bpc = 189.3 MB (1.79× record). 11.2 MB saved over KN-6 alone. fx2-cmix record: 110.8 MB (0.70 bpc gap remaining).
Papers
- match-model.pdf — Match Models, Sparse Contexts, and the Combination Problem: From −1.1 to +0.09 bpc on enwik9 (19 Experiments, 8 Negative Results). Comprehensive scoring study. KN-6 + sparse + ext match: 1.588 bpc = 189.3 MB (1.79× record). Sparse contexts (+0.089 bpc), match (+0.048), combined (+0.094 bpc, 11.2 MB saved). Eight negative results: indirect bigrams, word models, recency, KN-8, shared HT, momentum SGD, multi-model softmax, larger sparse HT. Key finding: models succeed only when complementary to KN-6. (7 pages)
(source)
- bias-conjecture.pdf — The Bias Conjecture: Support Asymmetry in Binary Event Spaces. Mathematical treatment of the doubled-E support asymmetry. Sign-flip isomorphism theorem (exact, 0.000 bpc delta). Original sat-rnn has t(h+)/t(h-)=0.832 (more negative!); sign-canonical form: 2.21. Binary ES entropy tax: 41 bits/position. Anti-correlated pairs (r=−0.98) are ES-mate candidates; pairing captures +8 bits. Conjecture confirmed in refined form. (9 pages)
(source)
- tock-empirical.pdf — Tock Phase Empirical Results: Word Injection, Bigram Grammar, and P-Program Evaluation. Three experiments validating extended ES predictions. Injection curve confirms concave shape with +0.038 bpc at 2000 words via absorption. Word bigrams reveal syntactic categories (14 SVD components, 12 k-means clusters). P-program trie dominates mid-word. Accumulator states biject onto memory traces (8,398 vs 11.9M at pos 4). Naïve combination catastrophe demonstrated. (6 pages)
(source)
- sn-models.pdf — SN Model Series and Hutter Prize Alignment. Six models M0–M5 (unigram to KN-6+words). Online KN-6 = 1.683 bpc = 210.4 MB on enwik9 vs fx2-cmix record 110.8 MB (1.9× gap). Gap: 0.08 + 0.10 + 0.62 bpc. SN format examples. Description length trade-off. Roadmap. (4 pages)
(source)
- kn-quotient.pdf — Kneser–Ney as Quotient: Pulling $n$-gram Smoothing into the Universal Model. (v1.) KN discount as GCD-based common evidence removal. Continuation count as column-side type quotient. Interpolation recursion as hierarchical combination over order-projection tower. Unified pattern table $P$ dissolves the combination problem. Discount–GCD gap identified as central open question. Research agenda: per-row GCD discount, log-stochastic forward pass, word event extension. (9 pages)
(source)
- kn-quotient-v2.pdf — Kneser–Ney on the Integers: The Ring Structure. (v2.) Restates KN on the integers using the um-arithmetic-v4 framework. Context $c \in \mathbb{Z}/256^{k-1}\mathbb{Z}$, order projection = $c \bmod 256^{k-2}$ (literal mod). Discount = subtraction in $\mathbb{Z}$ (approximates division by GCD). Continuation count = fiber support size under output projection. Interpolation = weighted tower of ring surjections. Discount–GCD gap resolved: subtraction vs division, small because $g=1$ almost everywhere. Combination problem dissolves via CRT: choose $|W|$ coprime to 256. Full history = 256-adic integer $\in \mathbb{Z}_{256}$. (9 pages)
(source)
- embedding-conjecture.pdf — The Embedding Conjecture: Word Representations from Byte-Level Universal Models. Word embeddings arise from byte-level UMs via three mechanisms: causal capture (compress spelling), forward inference (predict from word identity), backward surprise response (reverse compression when surprised). Words as binary ES in 2ℓ. Spelling as arithmetic-coded residual. Evidence from sat-RNN factor map (word_len reset, 5.31 norm at boundaries). Five testable predictions including the “adding `the'” break-even test. (6 pages)
(source)
- um-arithmetic.pdf — Arithmetic Coding via Universal Models: Any UM is a Compressor. (v1, for MJC approval.) Formalizes the UM→compressor construction via arithmetic coding. Luck interpretation (quantile = Q = λ). ES preservation in the code. Practical considerations: 500-line decompressor, online vs two-pass, comparison to gzip/bzip2/cmix. v16 = 189 MB score via AC. Gap decomposition: higher-order, bit-level, more models, better mixing. (5 pages)
(source)
- um-arithmetic-v2.pdf — Arithmetic Coding, Quotients, and Prime Powers: The $E \to \mathbb{N} \to Q$ Bridge. (v2.) Connects AC to the quotient $Q = e/|E|$ and prime power encoding. The quotient IS the arithmetic coding interval position. Angle view: $\theta = 2\pi Q$ on the unit circle. Prime factorization = event readout (fundamental theorem of arithmetic). Number theory of compression: GCD detects shared state, Möbius inversion corrects pattern overlap, Euler’s totient gives sparsity. Factored phases, phase walk, product vs sum encoding. Bridge theorem unifying all five views. (8 pages)
(source)
- um-arithmetic-v3.pdf — The Sequence as an Integer: Direct Construction from $256^t$ to Fourier on Words. (v3.) Extends $E$ through time: enwik9 = $\sum b_t \cdot 256^{t-1}$, a 2.4-billion-digit integer. $Q = e/256^n$ IS the file as a base-256 fraction. AC precision window = KN context window (dual sliding views into $Q$). $\lambda$-quotient: $\lambda=1$ (bytes), $\lambda=8$ (bits), $\lambda=1/\bar\ell$ (words). Fixed word clock = rhythm in speech. Fourier of per-position code lengths: DC = total score, word-frequency peak = word-level compression, harmonics = phrase structure. Quotient hierarchy: bits/bytes/words/phrases are different bases for reading the same $Q$. (9 pages)
(source)
- um-arithmetic-v4.pdf — Integer Factorization of Events: Every Integer Is an Event, Every Quotient Is Division. (v4.) The algebra is literal, not analogical. With exact AC (GMP): events ARE integers, quotient spaces ARE $\mathbb{Z}/d\mathbb{Z}$, projection IS $e \bmod d$, remainder IS discarded information. Divisor lattice of $|E|$ = lattice of all quotient event spaces. Independence = coprimality. CRT = event space factorization. Decoding = integer factoring. Equivalence classes carry ring structure. Dirichlet characters = Fourier basis on the unit group. Compression difficulty = factoring difficulty. (9 pages)
(source)
- integer-events-empirical.pdf — Integer Events Made Concrete: Exact Arithmetic Coding and the Divisor Lattice. GMP implementation of um-arithmetic-v4. 1024-byte segment = 2,466-digit integer. Quotient projections extract bytes. CRT verified. Exact AC: zero errors at 1024 bytes, <0.2 bit overhead (8,450-bit integers). Per-row GCD discount negative result: +0.138 bpc. Separation: algebra is exact, statistics require fractional prior. (4 pages)
(source)
- kn-gcd-empirical.pdf — The GCD Is Almost Always One: Empirical Validation of the KN Ring Structure. Tests the kn-quotient-v2 prediction: $g(c)=1$ for 98.3% of contexts (4,395/4,473 sampled rows, orders 2–6). Discount sweep: optimal $D^*=0.85$ (1.9537 bpc), $D=0.9$ is +0.003 bpc, $D=1.0$ is +0.058 bpc. Discount–GCD gap decomposed: fractional vs integer (0.058 bpc, dominant), global vs per-row (<0.003), subtraction vs division (<0.002). Per-row GCD discount is a negative result: $D=g(c)$ loses +0.138 bpc (mean $g=4585$ when $g>1$). The ring structure is algebraically correct but the optimal operation is fractional. (5 pages)
(source)
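The integer-event claims from um-arithmetic-v4 and integer-events-empirical can be miniaturized without GMP: for short sequences a 64-bit integer suffices to show that the sequence is the integer, projection is mod, and byte readout is division. This is a toy sketch, not the GMP implementation.

```c
#include <stdint.h>

/* Toy miniature of integer_events.c using uint64_t instead of GMP:
   a short byte sequence IS the integer e = sum b_t * 256^(t-1);
   order projection is e mod 256^k; byte extraction is division. */

static uint64_t seq_to_int(const uint8_t *b, int n) {
    uint64_t e = 0;
    for (int t = n - 1; t >= 0; t--)   /* b[0] is the least digit */
        e = e * 256u + b[t];
    return e;
}

static uint8_t byte_at(uint64_t e, int t) {   /* read digit t back out */
    for (int i = 0; i < t; i++) e /= 256u;
    return (uint8_t)(e % 256u);
}
```

For "ABC" (0x41 0x42 0x43) the integer is 0x434241; e mod 256^2 keeps exactly the first two bytes, which is the literal-mod order projection from kn-quotient-v2.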
Experiments
- bias_conjecture.c — Bias conjecture exp 1: pre-activation stats, support ratio, redistribution sweep, timing signal.
- bias_conjecture2.c — Bias conjecture exp 2: sign-flip (buggy), 4-way ES, support budget, ES-mate complementarity.
- bias_conjecture3.c — Bias conjecture exp 3: corrected sign-flip isomorphism, entropy analysis, pairing.
- injection_curve.c — First attempt: geometric-mean combination. Shows naïve mixing loses 0.9 bpc.
- injection_curve2.c — Corrected: absorption rule. +0.038 bpc at 2000 words.
- word_bigrams.c — Word bigram SVD analysis. 500 words, 100M bytes, k-means clustering.
- p_program_eval.c — P-program evaluation. Position, accumulator, BoL, graded support.
- sn_models.c — SN model series M0–M5. Builds, evaluates, exports SN files. Hutter Prize alignment.
- hutter_score.c — Online KN-6 baseline on enwik9. 1.683 bpc = 210.4 MB.
- hutter_score2.c — KN-8 with per-order discounts. Negative: HT saturation makes it worse than KN-6.
- hutter_score3.c — KN-6 + match, geometric mean. Catastrophic: 3.06 bpc (+1.1 worse).
- hutter_score4.c — KN-6 + match, oracle absorption. 1.625 bpc (+0.33), but invalid for real compression.
- hutter_score5.c — KN-6 + match, full distribution combinations. Too slow (infeasible).
- hutter_score6.c — KN-6 + match, valid combinations. Three strategies tested. Adaptive linear: +0.036 bpc at 100M, +0.028 bpc at 1B.
- hutter_score7.c — KN-6 + match, logistic mixer (INVALID). Uses P(actual) as feature. 1.628 bpc at 100M.
- hutter_score8.c — KN-6 + match, valid logistic mixer. Per-bucket weights with EMA features. +0.042 bpc at 100M, +0.035 bpc at 1B = 196.3 MB (best valid).
- hutter_score9.c — KN-6 + match, momentum logistic mixer. Negative: momentum harmful (+0.011 bpc only).
- hutter_score10.c — KN-6 + match + o1 + word cache, multi-model softmax. Negative: unstable weights (+0.013 bpc only).
- hutter_score11.c — KN-6 + match, 256M HT (3 GB). Negative: OOM killed on full enwik9 (needs ~4.5 GB).
- hutter_score12.c — KN-6 + extended match (4–64 bytes), chain-8, logistic mixer. +0.051 bpc at 100M, +0.048 bpc at 1B = 194.7 MB (1.84×). len-64: 97.6% accuracy.
- hutter_score13.c — KN-6 + ext match + recency model, 3-way mixer. Negative: recency adds +0.002 bpc at 100M, dilutes match weight.
- hutter_score14.c — KN-8 dual HT (128M primary + 32M secondary). Negative: secondary HT 100% saturated at 100M, KN-8 worse than KN-6.
- hutter_score15.c — KN-6 + sparse contexts (shared HT). Sparse adds +0.154 bpc at 100M but degrades KN-6 via HT contention (99.8% full).
- hutter_score16.c — KN-6 + sparse (separate 16M HT) + ext match. Sparse adds +0.089 bpc at 1B. Combined: 1.588 bpc = 189.3 MB (1.79×). Best result.
- hutter_score17.c — KN-6 + sparse (32M HT) + ext match. Marginal: +0.003 bpc over v16 at 100M.
- hutter_score18.c — KN-6 + sparse + match + indirect bigrams (d=16,32,64,128,256) + class bigrams. Negative: indirect=4.83 bpc, class=4.35 bpc, combined −0.004 bpc (slight harm). Dominated by KN-6.
- hutter_score19.c — KN-6 + sparse + match + word-level bigrams (32M HT). Negative: word model=2.80 bpc, adds only +0.0005 bpc. HT 87% full at 16M. KN-6+sparse already captures word structure.
- kn_gcd.c — GCD distribution experiment 1: KN-6 count table GCD sampling (every 1M), D=0.9 vs D=1.0. Result: g=1 in 97.8%, D=1.0 is +0.055 bpc worse.
- kn_gcd2.c — GCD distribution experiment 2: discount sweep (D=0.50–1.00, 11 values), GCD sampling every 100K, type count distribution. Result: g=1 in 98.3%, optimal D=0.85 (1.9537 bpc).
- kn_gcd3.c — Per-row GCD discount experiment: 5 strategies (D=0.85, D=0.90, D=1.0, D=g(c), hybrid). Negative: D=g(c) is +0.138 bpc worse (mean g=4585 when g>1). Singleton tracker for fast-path optimization.
- integer_events.c — GMP demonstration of um-arithmetic-v4: sequence as integer, quotient projections, coprime quotients, CRT reconstruction, AC position Q, byte extraction via division. 32–1024 bytes.
- exact_ac.c — Exact arithmetic coding with GMP rationals. Unigram model, encode+decode, zero errors at 1024 bytes. 8450-bit integers, <0.2 bit delta from theoretical. The AC state IS exact integer arithmetic.
- exact_ac_kn.c — UM runner: exact AC with KN-6 model. Online KN-6 predictions drive GMP exact arithmetic coding. 1024 bytes: 2.78 bpc, zero decode errors, 0.01 bit delta, 40,949-bit integers. The AC state IS the integer quotient.
- p_program_kn.c — P-program features through KN-6. Negative: conditioning on word_position (+0.145 bpc), in_word (+0.117), word_len (+0.145), position_bucket (+0.184). Features fragment data without adding independent info. Confirms CRT: features must be coprime to byte context.
- kn_dsweep.c — Fine-grained discount sweep (D=0.70–0.95, step 0.01) on full enwik9. D* tracks HT saturation: 0.85→0.83→0.82→0.87 as HT fills 8%→37%→92%→100%. At 1B: D*=0.87 (1.6817 bpc), only 0.0003 bpc better than D=0.90.
- umr_built.c — UM Runner (umr): the single executable for all experiments. KN-6 + sparse + match as UM P-programs: 8 event spaces, 6 LPPs, max-min forward pass = KN interpolation chain, omega_0 = online counting. Exact match on ALL THREE scores at 1M: KN-6=2.3976, +sparse=2.1853, +match=2.1430 bpc (identical to hutter_score16). 645 lines, 6 blocks (Hutter:UMR. marker). All future experiments through this runner.
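The KN interpolation chain that the runner's forward pass reproduces can be sketched with toy rows; the hedged version below uses the standard absolute-discount recursion and invented counts, not the runner's hash tables or event-space machinery.

```c
/* Sketch of the KN interpolation chain mirrored by the UM runner's
   forward pass: at each order, discount the count by D and back off
   the freed mass to the next-lower order, bottoming out at the
   uniform 1/256. Counts here are toy values, not from a real HT. */

#define AB 256

typedef struct {
    int count[AB];   /* c(x) for this context row */
    int total;       /* sum of counts             */
    int types;       /* number of nonzero counts  */
} Row;

/* rows[0] is the highest order; n rows chain down to uniform. */
static double kn_prob(const Row *rows, int n, int x, double D) {
    double p = 1.0 / AB;                 /* order-0 base case */
    for (int k = n - 1; k >= 0; k--) {
        const Row *r = &rows[k];
        if (r->total == 0) continue;     /* unseen context: pure backoff */
        double num = r->count[x] - D;
        if (num < 0) num = 0;
        double lambda = D * r->types / r->total;
        p = num / r->total + lambda * p;
    }
    return p;
}
```

The freed mass D·types/total exactly balances the discounted numerators, so the distribution stays normalized at every order — the property the exact-AC runs above depend on.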
SN Exports
Interactive
- combination-explainer.html — Combination strategies explainer: why mixing matters more than model quality. Interactive demos of geometric vs linear failure, match accuracy chart, adaptive weight dynamics, Hutter Prize scoreboard.
- tock-explainer.html — Tock explainer with SVG charts: injection curve, singular value spectrum, cluster visualization, accumulator state analysis.
- sn-explainer.html — SN model explainer with interactive charts: model hierarchy, Hutter Prize dashboard, gap decomposition, description length trade-off slider.
- lpp-viz.html — LPP 3D visualization: two-ring (purple Ea / blue Eb) torus with sparse connections showing byte bigram joint events from first 1024 bytes of enwik9. Drag to rotate, scroll to zoom, hover for detail.
- kn-explainer.html — KN-quotient concrete explainer: interactive bigram table, row decomposition (discount as GCD approximation), column types (continuation counts), interpolation tower, sliding window demo. Real data from enwik9.
Navigation
← Previous: 20260215
Extended event space formalism. Pattern space P = E² × T. P-Programs. 6 papers.
Next: 20260218 →
Context events and P-programming methodology. Missing prior = missing context event.