Archive 2026-02-12

Carrier signals, factor permutations, CMP review, Bayes from counting, epistemology of zero, logic from counting, the tock step, Wittgenstein's Tractatus, and the mathematical foundations: counting monad, renormalization, expressiveness, compression-prediction duality, category of event spaces, fixed-point theorem, information geometry, algebraic semantics

Papers

Key finding: All top-8 offset pairs share d=1, making naïve Bayesian combination catastrophic (27 bpc with 8 pairs, worse than uniform 8 bpc). The fix: disjoint offset pairs so each contributes independent information.

Key Results

Factor map pairs match data MI
(1,7) MI=4.48 bits at 1024B, (1,8) at 4.46 — exactly matching the RNN factor map. The RNN discovers the same structure that exists in the data.
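The MI figures above come from co-occurrence counts of byte pairs at given offsets. A minimal sketch of the plug-in estimator (not the paper's pipeline; the toy matrices are illustrative):

```python
import numpy as np

def mutual_information(joint_counts):
    """Plug-in MI in bits from a 2-D co-occurrence count matrix."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of the first byte
    py = p.sum(axis=0, keepdims=True)   # marginal of the second byte
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / (px * py)), 0.0)
    return float(terms.sum())

# Perfectly coupled pair → 1 bit; independent pair → 0 bits
print(mutual_information(np.array([[5.0, 0.0], [0.0, 5.0]])))  # → 1.0
print(mutual_information(np.array([[1.0, 1.0], [1.0, 1.0]])))  # → 0.0
```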
Bayesian combination is catastrophic
2 pairs: 8.17 bpc. 4 pairs: 14.3. 8 pairs: 27.2 bpc. Shared d=1 is raised to the Kth power.
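Why repeating shared evidence is catastrophic can be seen in a two-outcome toy (numbers illustrative, not the paper's): naive Bayes multiplies likelihood ratios, so K copies of the same conditional raise its ratio to the Kth power and drive the posterior toward certainty.

```python
import math

def naive_combine(prior, conds):
    """Naive Bayes: posterior ∝ prior · Π cond/prior (assumes independence)."""
    post = [prior[o] * math.prod(c[o] / prior[o] for c in conds)
            for o in range(len(prior))]
    z = sum(post)
    return [x / z for x in post]

prior = [0.5, 0.5]
cond = [0.6, 0.4]                          # one mildly informative conditional
for k in (1, 2, 8):
    p = naive_combine(prior, [cond] * k)   # same evidence repeated k times
    print(k, round(p[0], 4))               # 0.6 → 0.6923 → 0.9625
```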
Byte KN-5 at 10M: 2.29 bpc
Already better than the sat-rnn (2.81 bpc at 110M). No structure, just exact counting with KN smoothing.
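A toy sketch of what "exact counting with smoothing" means: bigram byte counts with absolute discounting and uniform backoff. This is far simpler than the order-5 KN in the entry (real KN backs off to continuation counts), but the shape is the same.

```python
from collections import Counter, defaultdict
import math

def train(data, order=2):
    """Exact n-gram counts: context bytes → Counter of next bytes."""
    ctx_counts = defaultdict(Counter)
    for i in range(len(data) - order + 1):
        ctx_counts[data[i:i + order - 1]][data[i + order - 1]] += 1
    return ctx_counts

def prob(ctx_counts, ctx, byte, d=0.5, vocab=256):
    """Absolute discounting with uniform backoff (toy stand-in for KN)."""
    c = ctx_counts.get(ctx)
    if not c:
        return 1.0 / vocab
    total = sum(c.values())
    backoff = d * len(c) / total        # mass freed by discounting
    return max(c[byte] - d, 0) / total + backoff / vocab

data = b"abracadabra"
m = train(data)
bits = -sum(math.log2(prob(m, data[i:i + 1], data[i + 1]))
            for i in range(len(data) - 1))
print(bits / (len(data) - 1), "bpc")
```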
SVD-16 events at 1024B: 1.64 bpc
SVD of skip-bigram matrix reveals natural event spaces. 16 events per offset with KN order-12.
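The SVD step can be sketched as follows (random counts stand in for the real skip-bigram statistics; the log1p transform and rank-16 cut are my assumptions, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy skip-bigram matrix: counts[a, b] = #times byte b occurs at offset d after byte a
counts = rng.poisson(2.0, size=(256, 256)).astype(float)

# Rank-16 factorization: bytes with similar left-singular profiles behave alike,
# so clustering the rows of `embed` groups the 256 bytes into 16 candidate events.
U, s, Vt = np.linalg.svd(np.log1p(counts), full_matrices=False)
embed = U[:, :16] * s[:16]
print(embed.shape)  # (256, 16)
```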
K=4 events capture 90%+ of neuron MI
Doubled-E (K=2, sign) wastes half the information (41–52%). K=4 is the sweet spot for ES absorption.
Bayes from GCD consistency
Row GCD gI and column GCD gO decompose the same joint counts. Equating gives rI/rO = gO/gI, which forces P(o|i) = P(i|o)·P(o)/P(i). No prior needed—just counting.
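The punchline, that Bayes' rule is an identity of counts, can be checked exactly in rationals on a toy joint table (the values are made up for illustration):

```python
from fractions import Fraction

n = [[3, 1], [2, 6]]                   # joint counts n(i, o)
N = sum(map(sum, n))
for i in range(2):
    for o in range(2):
        ni = sum(n[i])                 # row total n(i)
        no = n[0][o] + n[1][o]         # column total n(o)
        p_o_given_i = Fraction(n[i][o], ni)
        # P(i|o)·P(o)/P(i), all read straight off the counts
        bayes = Fraction(n[i][o], no) * Fraction(no, N) / Fraction(ni, N)
        assert p_o_given_i == bayes    # holds exactly — no prior needed
print("Bayes holds exactly on counts")
```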
Luck = marginal × conditional
Q(i,o) decomposes as Qmarginal × Qconditional. Conditional luck depends only on reduced counts (GCD removed). Correlation in counting IS causation in CMP.
R²≈0.83 is a fixed point from order statistics
The conjunction R² = erf²(2√(log H)/√H). At H=128: 0.83. Scale-invariant: σw cancels. Xavier = 0.89 (i.i.d. baseline), training correlations reduce to 0.83. Predicts H=256 → R²≈0.78.
Offset graph: matching = independence
Two conditionals are independent given output iff their offset edges are vertex-disjoint. Star graph (all share d=1) → catastrophic. Corrected formula subtracts (deg-1) copies of shared offsets.
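The star-vs-disjoint distinction reduces to degree counting on the offset graph. A sketch (function name mine) of finding the (deg−1) overcounted copies the corrected formula subtracts:

```python
from collections import Counter

def overcounted(pairs):
    """Offsets with degree > 1 in the offset graph; a naive product counts
    each such offset deg times, so (deg − 1) copies must be subtracted."""
    deg = Counter(d for pair in pairs for d in pair)
    return {d: k - 1 for d, k in deg.items() if k > 1}

star = [(1, 7), (1, 8), (1, 9)]      # all edges share vertex 1 → star graph
disjoint = [(1, 7), (2, 8), (3, 9)]  # vertex-disjoint edges → independence
print(overcounted(star))      # {1: 2}
print(overcounted(disjoint))  # {}
```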
Tropical GCD = exact conditionals
The UM's min (tropical GCD) and the integer GCD give the SAME conditionals. The gap affects only prior decomposition. The UM is a MAP predictor; softmax replaces max for full Bayes.
No support ≠ certainly false
SN strength 0 = ignorance (open-world), not disbelief. Certainty of falsehood requires positive support for the ES-mate. The min operation propagates ignorance correctly: no support in → no support out.
Formal logic derived from counting
The forward pass (maxᵢ min(tᵢ, pᵢⱼ)) is an existentially quantified probabilistic syllogism. All classical inference rules (modus ponens, tollens, hypothetical & disjunctive syllogism) and Aristotle's valid figures follow. Classical logic = binary, closed-world limit. The UM's native logic is Gödel fuzzy logic.
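Both entries above can be exercised in a few lines: the max-min forward pass performs modus ponens when the antecedent is supported, and propagates ignorance (zero support in, zero support out) when it is not. Event names and values are illustrative.

```python
def forward(t, p):
    """UM forward pass: out[j] = max_i min(t[i], p[i][j])."""
    return [max(min(ti, row[j]) for ti, row in zip(t, p))
            for j in range(len(p[0]))]

# Events: [A, B]; one pattern A→B with full support
p = [[0.0, 1.0],   # from A: supports B
     [0.0, 0.0]]   # from B: supports nothing
print(forward([1.0, 0.0], p))  # modus ponens: A supported → B supported
print(forward([0.0, 0.0], p))  # open world: no support in → no support out
```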
Tock = architecture from evidence
Learning the architecture = discovering next Ei + maps via MI. The factorization tower supports backtracking (fine ← coarse). The “strawberry theorem”: character-level tasks are provably unsolvable at token-level event spaces. NAS is belief; tock is evidence.
Tractatus: X → E → T (not D = world)
Wittgenstein’s “world” is X (reality), not D (data stream). E is the map. “Die Welt ist die Gesamtheit der Tatsachen, nicht der Dinge” (“The world is the totality of facts, not of things”) = the projection X→E. Proposition 7: what lies to the left of E is unavailable to language. Scientific shifts = factor maps.
Counting is a monad
ω0 is the free commutative monoid construction. Forward pass = Kleisli morphism in the tropical variant. The five-tuple u = (e,t,p,f,ω) IS the monad unpacked. E→N→Z→Q = monad unit → Grothendieck group → field of fractions.
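A minimal Counter-based sketch of the free-commutative-monoid (multiset) monad; the names `unit` and `bind` are standard monad vocabulary, not the paper's, and the law checks below are just sanity tests.

```python
from collections import Counter

def unit(x):
    """Monad unit: a single observation, counted once."""
    return Counter({x: 1})

def bind(m, f):
    """Kleisli bind: push counts through f, multiplying multiplicities."""
    out = Counter()
    for x, n in m.items():
        for y, k in f(x).items():
            out[y] += n * k
    return out

f = lambda x: Counter({x + 1: 2, x: 1})
g = lambda x: Counter({x * 2: 3})
m = Counter({1: 2, 4: 1})
assert bind(unit(3), f) == f(3)                                 # left identity
assert bind(bind(m, f), g) == bind(m, lambda x: bind(f(x), g))  # associativity
print("monad laws hold")
```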
Factorization tower = RG flow
Natural event spaces are fixed points of the discrete renormalization group. Universality: different models find the same ESes because they’re RG fixed points of the data. The strawberry theorem = RG irreversibility.
Forward pass = monotone DNF (binary) / tropical polynomial (graded)
ES-mates provide negation for full Boolean expressiveness. Pattern chains give depth hierarchy (circuit complexity). Parity requires 2^(n−1) patterns. The UM’s expressiveness matches the data’s complexity.
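Assuming the flat-DNF reading of the parity bound (one term per odd-weight assignment), the 2^(n−1) count can be checked by enumeration:

```python
from itertools import product

# Parity as a flat DNF needs one minterm per odd-weight assignment: 2**(n-1).
for n in range(1, 8):
    odd = sum(1 for x in product((0, 1), repeat=n) if sum(x) % 2)
    assert odd == 2 ** (n - 1)
print("parity needs 2^(n-1) minterms")
```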
Tick-tock converges to unique fixed point
The count-then-optimize-ES iteration converges by MI monotonicity on a finite lattice. Unique for ergodic sources. Connection to EM algorithm. 2–5 iterations empirically.
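The convergence argument is the generic one for monotone improvement on a finite lattice, sketched here with a toy closure operator standing in for the count-then-optimize-ES step (the example set is illustrative, not from the paper):

```python
def iterate_to_fixpoint(f, x0, max_iter=100):
    """Monotone improvement on a finite lattice must reach a fixed point."""
    x = x0
    for i in range(max_iter):
        nx = f(x)
        if nx == x:
            return x, i
        x = nx
    raise RuntimeError("no fixed point within max_iter")

# Toy monotone map: close a set under addition mod 8 (f(S) ⊇ S always).
f = lambda S: S | {(a + b) % 8 for a in S for b in S}
fp, iters = iterate_to_fixpoint(f, frozenset({2}))
print(fp, "after", iters, "iterations")  # the even residues, quickly
```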
Forward pass algebraically forced
maxᵢ min(tᵢ, pᵢⱼ) is the UNIQUE operation that is join-preserving, residuated, cautious, and faithful. Gödel fuzzy logic = equational theory of the support lattice. The UM’s logic is not a design choice—it’s forced by algebra.
Byte KN scaling: 1.784 bpc at full 1B enwik9 (order 6)
Full scaling curve: 100M=2.001, 200M=1.927, 400M=1.889, 800M=1.859, 1B=1.784. Zero structure, just byte counting + KN smoothing. HT at 99.9% saturation—bigger table would improve further. Improvement rate: ~0.07/doubling at small scale, slowing to ~0.03 at large.
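The per-doubling improvement rates quoted above follow directly from the listed curve:

```python
# Scaling points from the entry above (corpus size in MB → bpc)
curve = {100: 2.001, 200: 1.927, 400: 1.889, 800: 1.859}
sizes = sorted(curve)
deltas = [curve[a] - curve[b] for a, b in zip(sizes, sizes[1:])]
for (a, b), d in zip(zip(sizes, sizes[1:]), deltas):
    print(f"{a}M → {b}M: {d:.3f} bpc per doubling")  # 0.074, 0.038, 0.030
```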
Skip offsets add zero value at byte level
Skip-offset KN (d=7..12) gives 4.6 bpc alone. Product-of-experts combination with byte KN: best α=1.0 (byte KN alone wins). Sequential context dominates at byte level.
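The product-of-experts combination referred to here is the standard log-linear mixture; a sketch (distributions are hypothetical placeholders, not the measured ones) showing that α = 1.0 reduces it to the first expert alone:

```python
import numpy as np

def poe(p1, p2, alpha):
    """Log-linear product of experts: p ∝ p1**alpha · p2**(1 − alpha)."""
    logp = alpha * np.log(p1) + (1 - alpha) * np.log(p2)
    p = np.exp(logp - logp.max())       # stabilize before normalizing
    return p / p.sum()

byte_kn = np.array([0.7, 0.2, 0.1])    # hypothetical byte-KN distribution
skip_kn = np.array([0.4, 0.4, 0.2])    # hypothetical skip-offset distribution
print(poe(byte_kn, skip_kn, 1.0))      # α = 1.0 → byte KN alone wins
```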
Factored luck ⇒ subgoals
A want is an output event. Energy = −log P (Shannon surprise). When luck factors (λ = ∏λi), each factor is a subgoal: satisfy independently. Decision states (TRACKED→OWNED) = incremental UM five-tuple. Greedy satisfaction within factor e of optimal.
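The additivity behind "each factor is a subgoal" is just −log of a product (factor values illustrative): energies of independent factors sum, so each can be driven down separately.

```python
import math

# If luck factors, λ = Π λ_i, energy (−log P) is additive across factors,
# so each factor is an independent subgoal.
factors = [0.5, 0.25, 0.8]
subgoal_energies = [-math.log2(f) for f in factors]
total = -math.log2(math.prod(factors))
assert abs(total - sum(subgoal_energies)) < 1e-9
print(subgoal_energies, "sum to", round(total, 4), "bits of surprise")
```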

Navigation

← Previous: 20260211_2
Scaling to full enwik9. R²=0.83 invariant. Checkpoint trajectory. 4 papers.
Next: 20260213 →
Tock protocol: systematic lexicon injection into the isomorphic UM.