Archive 2026-02-15
The extended event space: injecting lexical structure into H. Formalizing the correct tock for lexicon injection.
Key insight: The the_inject experiments went wrong by mixing an external predictor with the RNN instead of extending the event space E. KN dominated because the lexical trie was compensating for the RNN’s weakness (6.43 bpc on new data), not adding genuine structure. The correct tock adds lexical events to H—position, accumulator, bag-of-letters, word identity—as first-class events participating in UM patterns.
The hourglass: Bytes → words → bytes (Prop. 16 of the tock protocol). Extended: I′ = bytes × position × accumulator. H′ = hidden × bag-of-letters × graded word support. O′ = future bytes at multiple offsets. Full connectivity within a bounded word context subsumes both RNN-like and transformer-like patterns.
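A minimal sketch of the extended input side I′, assuming whitespace as the word boundary; the names `ExtendedInput` and `extend_stream` are illustrative, not from the papers:

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class ExtendedInput:
    byte: int                    # the raw byte event
    position: int                # offset since the last word boundary
    accumulator: FrozenSet[int]  # bag of letters seen earlier in this word

def extend_stream(data: bytes) -> List[ExtendedInput]:
    """Lift a byte stream into I' = bytes x position x accumulator."""
    events, pos, bag = [], 0, frozenset()
    for b in data:
        events.append(ExtendedInput(b, pos, bag))
        if b in b" \n\t":        # word boundary: deterministic reset
            pos, bag = 0, frozenset()
        else:
            pos, bag = pos + 1, bag | {b}
    return events
```

Each event carries its intra-word position and letter bag alongside the byte, so downstream patterns can condition on lexical context without bolting on a separate predictor.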
Papers
- extended-es.pdf — The Extended Event Space: Injecting Lexical Structure into H. Formalizes I′ (byte + position + accumulator), H′ (hidden + bag-of-letters + graded support), O′ (symmetric future bytes). Explains why mixing ≠ extending. The carrier signal IS the position event made explicit. RNN proved rotational embeddings; now cut to the capability. Connection to word embeddings via σ(t) vector. Research agenda: neutral injection → intra-lexical patterns → iterate vocabulary → scale toward Hutter Prize. (9 pages)
- es-commentary.pdf — The Model Within the Model: Commentary on the Extended Event Space. Self-similar architecture: model within a model at every scale. Inner/outer I/H/O distinction. Ev bijection at both I and O sides. Why letter events are not redundant (typos, strawberry problem). Tokenization as attractive nuisance. U × U perspective split. Includes MJC source commentary as Appendix A.
- nested-model.pdf — The Nested Model: Self-Similar Architecture in the Extended Event Space. Formalizes Hext = I′ × Hinner × O′. Inner model is a complete UM (Theorem 2). Iterated nesting tower with telescoping factorization. Ev bijection formalized at both I-side (recognition) and O-side (prediction). Natural transformation guarantees consistency between levels. Letter events proved necessary (spelling variants, strawberry impossibility, information-theoretic optimality). Word-start carrier via SN programming. Log-stochastic and witness implementation forms. (7 pages)
- tokenization-loss.pdf — Tokenization as Information Loss. Tokenization as surjective coarsening in the lattice of event spaces. Three information losses proved: spelling variants, position, letter accumulation (Theorem 2). Strawberry impossibility under tokenization (Theorem 3). Rate-distortion gap formal bound. BPE on enwik8: ≥0.05 bpc loss, worst case ≥0.12 bpc. Information loss bound: hb ≤ hτ/L̄ + H(bytes|tokens)/N (Theorem 8). Extended ES achieves all tokenization benefits without information loss. (6 pages)
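The Theorem 8 bound is easy to evaluate numerically. A hedged sketch with made-up placeholder figures (not enwik8 measurements), treating the conditional term as already normalized per byte:

```python
def byte_entropy_bound(h_tau: float, mean_token_len: float,
                       cond_per_byte: float) -> float:
    """Upper bound h_b <= h_tau / L_bar + H(bytes|tokens)/N, in bits per byte.

    h_tau: token-level entropy rate, bits per token (placeholder below).
    mean_token_len: L_bar, mean token length in bytes.
    cond_per_byte: H(bytes|tokens)/N, the residual in bits per byte.
    """
    return h_tau / mean_token_len + cond_per_byte

# e.g. 8 bits/token over 4-byte tokens plus a 0.1 bpc residual
print(byte_entropy_bound(8.0, 4.0, 0.1))  # 2.1
```

The residual term is exactly what tokenization throws away; the extended ES keeps it at the byte level instead of paying it as a gap.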
- pattern-space.pdf — Patterns in the Extended Event Space: Independence, Correlation, and the New Synapses. Formalizes the atomic pattern P = E² × T (source, target, weight). New events without new patterns are inert (Prop 1). Four pattern families: recognition (I → H′), prediction (H′ → O), language-model (H′ → H′), structural (I′ → I′). Interior events are deterministic functions of bytes ⇒ naïve Bayesian combination fails (shared-offset catastrophe generalized). Three correct combination rules: absorption, residual, hierarchical. Hourglass bottleneck enforces independence. Three independence classes. Six testable predictions including injection curve knee at ~200 words and independence barrier at ~1 bpc. (10 pages)
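The atomic pattern P = E² × T can be sketched as a typed triple, with the four families read off from which spaces the endpoints live in. The string event-id convention and the `family` classifier are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    source: str    # event id, prefixed by its space: "I:...", "H:...", "O:..."
    target: str
    weight: float  # element of the weight set T

FAMILIES = {
    ("I", "H"): "recognition",     # I  -> H'
    ("H", "O"): "prediction",      # H' -> O
    ("H", "H"): "language-model",  # H' -> H'
    ("I", "I"): "structural",      # I' -> I'
}

def family(p: Pattern) -> str:
    """Classify a pattern by the spaces of its source and target events."""
    space = lambda event_id: event_id.split(":", 1)[0]
    return FAMILIES[(space(p.source), space(p.target))]
```

For example, `family(Pattern("I:pos=3", "H:bag=ab", 0.7))` classifies as a recognition pattern.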
- p-programs.pdf — P-Programs: Explicit Pattern Programs for the Extended Event Space. Makes the extended ES operational via P-programs (pattern compositions in SN syntax). Four explicit programs: position counter (deterministic reset at word boundaries), letter accumulator (latch + reset, tiny memory whose states biject onto memory traces), bag-of-letters recognition (conjunction of accumulator events, circuit depth 3), graded word support (first learned patterns via ω0). P-programs are inner UM instances (self-similarity at program level). Abductive learning: surprise detection → typo hypothesis → commit T=255 → record for ω-processing. Safe-combination conjecture: every unsafe evidence pair has a P-program refinement guided by E → N → Q. Toward a grammar of English: word bigrams as first grammar fragment, syntactic categories from factorization tower. (10 pages)
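Two of the four programs above, sketched as plain state machines. The latch + reset behavior follows the description; the scoring rule in `graded_support` is an assumption for illustration (the papers learn these weights via ω0):

```python
class LetterAccumulator:
    """Latch + reset: letter events set bits; a word boundary clears them."""
    def __init__(self):
        self.bag = set()

    def step(self, byte: int) -> frozenset:
        if byte in b" \n\t":      # deterministic reset at a word boundary
            self.bag.clear()
        else:
            self.bag.add(byte)    # latch the letter
        return frozenset(self.bag)

def graded_support(bag: frozenset, lexicon: list) -> dict:
    """Score each candidate word by letter overlap with the accumulated bag.

    Illustrative rule only: fraction of the word's letters present in the bag.
    """
    return {w: len(bag & frozenset(w)) / len(frozenset(w)) for w in lexicon}
```

Against the bag accumulated from `cat`, the word `cat` scores 1.0 and `cab` scores 2/3, giving the graded (rather than all-or-nothing) word support the recognition patterns feed on.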
Navigation
← Previous: 20260214
Lexemes as binary event spaces. Neutral “the” factorization. 31 injection experiments.
Next: 20260216 →
Tock phase empirical validation. Word injection curve, bigram grammar, P-program evaluation.