Archive 2026-02-12

Carrier signals, factor permutations, CMP review, Bayes from counting, epistemology of zero, logic from counting, the tock step, Wittgenstein's Tractatus, and the mathematical foundations: counting monad, renormalization, expressiveness, compression-prediction duality, category of event spaces, fixed-point theorem, information geometry, algebraic semantics

Papers

Key finding: All top-8 offset pairs share d=1, making naïve Bayesian combination catastrophic (27 bpc with 8 pairs, worse than uniform 8 bpc). The fix: disjoint offset pairs so each contributes independent information.

Key Results

Factor map pairs match data MI
(1,7) MI=4.48 bits at 1024B, (1,8) at 4.46 — exactly matching the RNN factor map. The RNN discovers the same structure that exists in the data.
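The MI figures above come from co-occurrence counts of byte pairs at given offsets. A minimal sketch of the plug-in estimator (not the paper's pipeline; the toy matrices are illustrative):

```python
import numpy as np

def mutual_information(joint_counts):
    """Plug-in MI in bits from a 2-D co-occurrence count matrix."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of the first byte
    py = p.sum(axis=0, keepdims=True)   # marginal of the second byte
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / (px * py)), 0.0)
    return float(terms.sum())

# Perfectly coupled pair → 1 bit; independent pair → 0 bits
print(mutual_information(np.array([[5.0, 0.0], [0.0, 5.0]])))  # → 1.0
print(mutual_information(np.array([[1.0, 1.0], [1.0, 1.0]])))  # → 0.0
```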
Bayesian combination is catastrophic
2 pairs: 8.17 bpc. 4 pairs: 14.3. 8 pairs: 27.2 bpc. Shared d=1 is raised to the Kth power.
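Why repeating shared evidence is catastrophic can be seen in a two-outcome toy (numbers illustrative, not the paper's): naive Bayes multiplies likelihood ratios, so K copies of the same conditional raise its ratio to the Kth power and drive the posterior toward certainty.

```python
import math

def naive_combine(prior, conds):
    """Naive Bayes: posterior ∝ prior · Π cond/prior (assumes independence)."""
    post = [prior[o] * math.prod(c[o] / prior[o] for c in conds)
            for o in range(len(prior))]
    z = sum(post)
    return [x / z for x in post]

prior = [0.5, 0.5]
cond = [0.6, 0.4]                          # one mildly informative conditional
for k in (1, 2, 8):
    p = naive_combine(prior, [cond] * k)   # same evidence repeated k times
    print(k, round(p[0], 4))               # 0.6 → 0.6923 → 0.9625
```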
Byte KN-5 at 10M: 2.29 bpc
Already better than the sat-rnn (2.81 bpc at 110M). No structure, just exact counting with KN smoothing.
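A toy sketch of what "exact counting with smoothing" means: bigram byte counts with absolute discounting and uniform backoff. This is far simpler than the order-5 KN in the entry (real KN backs off to continuation counts), but the shape is the same.

```python
from collections import Counter, defaultdict
import math

def train(data, order=2):
    """Exact n-gram counts: context bytes → Counter of next bytes."""
    ctx_counts = defaultdict(Counter)
    for i in range(len(data) - order + 1):
        ctx_counts[data[i:i + order - 1]][data[i + order - 1]] += 1
    return ctx_counts

def prob(ctx_counts, ctx, byte, d=0.5, vocab=256):
    """Absolute discounting with uniform backoff (toy stand-in for KN)."""
    c = ctx_counts.get(ctx)
    if not c:
        return 1.0 / vocab
    total = sum(c.values())
    backoff = d * len(c) / total        # mass freed by discounting
    return max(c[byte] - d, 0) / total + backoff / vocab

data = b"abracadabra"
m = train(data)
bits = -sum(math.log2(prob(m, data[i:i + 1], data[i + 1]))
            for i in range(len(data) - 1))
print(bits / (len(data) - 1), "bpc")
```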
SVD-16 events at 1024B: 1.64 bpc
SVD of skip-bigram matrix reveals natural event spaces. 16 events per offset with KN order-12.
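The SVD step can be sketched as follows (random counts stand in for the real skip-bigram statistics; the log1p transform and rank-16 cut are my assumptions, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy skip-bigram matrix: counts[a, b] = #times byte b occurs at offset d after byte a
counts = rng.poisson(2.0, size=(256, 256)).astype(float)

# Rank-16 factorization: bytes with similar left-singular profiles behave alike,
# so clustering the rows of `embed` groups the 256 bytes into 16 candidate events.
U, s, Vt = np.linalg.svd(np.log1p(counts), full_matrices=False)
embed = U[:, :16] * s[:16]
print(embed.shape)  # (256, 16)
```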
K=4 events capture 90%+ of neuron MI
Doubled-E (K=2, sign) wastes half the information (41–52%). K=4 is the sweet spot for ES absorption.
Bayes from GCD consistency
Row GCD gI and column GCD gO decompose the same joint counts. Equating gives rI/rO = gO/gI, which forces P(o|i) = P(i|o)·P(o)/P(i). No prior needed—just counting.
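The punchline, that Bayes' rule is an identity of counts, can be checked exactly in rationals on a toy joint table (the values are made up for illustration):

```python
from fractions import Fraction

n = [[3, 1], [2, 6]]                   # joint counts n(i, o)
N = sum(map(sum, n))
for i in range(2):
    for o in range(2):
        ni = sum(n[i])                 # row total n(i)
        no = n[0][o] + n[1][o]         # column total n(o)
        p_o_given_i = Fraction(n[i][o], ni)
        # P(i|o)·P(o)/P(i), all read straight off the counts
        bayes = Fraction(n[i][o], no) * Fraction(no, N) / Fraction(ni, N)
        assert p_o_given_i == bayes    # holds exactly — no prior needed
print("Bayes holds exactly on counts")
```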
Luck = marginal × conditional
Q(i,o) decomposes as Qmarginal × Qconditional. Conditional luck depends only on reduced counts (GCD removed). Correlation in counting IS causation in CMP.
R²≈0.83 is a fixed point from order statistics
The conjunction R² = erf²(2√(log H)/√H). At H=128: 0.83. Scale-invariant: σw cancels. Xavier = 0.89 (i.i.d. baseline), training correlations reduce to 0.83. Predicts H=256 → R²≈0.78.
Offset graph: matching = independence
Two conditionals are independent given output iff their offset edges are vertex-disjoint. Star graph (all share d=1) → catastrophic. Corrected formula subtracts (deg-1) copies of shared offsets.
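The star-vs-disjoint distinction reduces to degree counting on the offset graph. A sketch (function name mine) of finding the (deg−1) overcounted copies the corrected formula subtracts:

```python
from collections import Counter

def overcounted(pairs):
    """Offsets with degree > 1 in the offset graph; a naive product counts
    each such offset deg times, so (deg − 1) copies must be subtracted."""
    deg = Counter(d for pair in pairs for d in pair)
    return {d: k - 1 for d, k in deg.items() if k > 1}

star = [(1, 7), (1, 8), (1, 9)]      # all edges share vertex 1 → star graph
disjoint = [(1, 7), (2, 8), (3, 9)]  # vertex-disjoint edges → independence
print(overcounted(star))      # {1: 2}
print(overcounted(disjoint))  # {}
```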
Tropical GCD = exact conditionals
The UM's min (tropical GCD) and the integer GCD give the SAME conditionals. The gap affects only prior decomposition. The UM is a MAP predictor; softmax replaces max for full Bayes.
No support ≠ certainly false
SN strength 0 = ignorance (open-world), not disbelief. Certainty of falsehood requires positive support for the ES-mate. The min operation propagates ignorance correctly: no support in → no support out.
Formal logic derived from counting
The forward pass (maxᵢ min(tᵢ, pᵢⱼ)) is an existentially quantified probabilistic syllogism. All classical inference rules (modus ponens, tollens, hypothetical & disjunctive syllogism) and Aristotle's valid figures follow. Classical logic = binary, closed-world limit. The UM's native logic is Gödel fuzzy logic.
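Both entries above can be exercised in a few lines: the max-min forward pass performs modus ponens when the antecedent is supported, and propagates ignorance (zero support in, zero support out) when it is not. Event names and values are illustrative.

```python
def forward(t, p):
    """UM forward pass: out[j] = max_i min(t[i], p[i][j])."""
    return [max(min(ti, row[j]) for ti, row in zip(t, p))
            for j in range(len(p[0]))]

# Events: [A, B]; one pattern A→B with full support
p = [[0.0, 1.0],   # from A: supports B
     [0.0, 0.0]]   # from B: supports nothing
print(forward([1.0, 0.0], p))  # modus ponens: A supported → B supported
print(forward([0.0, 0.0], p))  # open world: no support in → no support out
```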
Tock = architecture from evidence
Learning the architecture = discovering next Ei + maps via MI. The factorization tower supports backtracking (fine ← coarse). The “strawberry theorem”: character-level tasks are provably unsolvable at token-level event spaces. NAS is belief; tock is evidence.
Tractatus: X → E → T (not D = world)
Wittgenstein’s “world” is X (reality), not D (data stream). E is the map. “Die Welt ist die Gesamtheit der Tatsachen, nicht der Dinge” (“The world is the totality of facts, not of things”) = the projection X→E. Proposition 7: what lies to the left of E is unavailable to language. Scientific shifts = factor maps.
Counting is a monad
ω0 is the free commutative monoid construction. Forward pass = Kleisli morphism in the tropical variant. The five-tuple u = (e,t,p,f,ω) IS the monad unpacked. E→N→Z→Q = monad unit → Grothendieck group → field of fractions.
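A minimal Counter-based sketch of the free-commutative-monoid (multiset) monad; the names `unit` and `bind` are standard monad vocabulary, not the paper's, and the law checks below are just sanity tests.

```python
from collections import Counter

def unit(x):
    """Monad unit: a single observation, counted once."""
    return Counter({x: 1})

def bind(m, f):
    """Kleisli bind: push counts through f, multiplying multiplicities."""
    out = Counter()
    for x, n in m.items():
        for y, k in f(x).items():
            out[y] += n * k
    return out

f = lambda x: Counter({x + 1: 2, x: 1})
g = lambda x: Counter({x * 2: 3})
m = Counter({1: 2, 4: 1})
assert bind(unit(3), f) == f(3)                                 # left identity
assert bind(bind(m, f), g) == bind(m, lambda x: bind(f(x), g))  # associativity
print("monad laws hold")
```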
Factorization tower = RG flow
Natural event spaces are fixed points of the discrete renormalization group. Universality: different models find the same ESes because they’re RG fixed points of the data. The strawberry theorem = RG irreversibility.
Forward pass = monotone DNF (binary) / tropical polynomial (graded)
ES-mates provide negation for full Boolean expressiveness. Pattern chains give depth hierarchy (circuit complexity). Parity requires 2^(n−1) patterns. The UM’s expressiveness matches the data’s complexity.
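Assuming the flat-DNF reading of the parity bound (one term per odd-weight assignment), the 2^(n−1) count can be checked by enumeration:

```python
from itertools import product

# Parity as a flat DNF needs one minterm per odd-weight assignment: 2**(n-1).
for n in range(1, 8):
    odd = sum(1 for x in product((0, 1), repeat=n) if sum(x) % 2)
    assert odd == 2 ** (n - 1)
print("parity needs 2^(n-1) minterms")
```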
Tick-tock converges to unique fixed point
The count-then-optimize-ES iteration converges by MI monotonicity on a finite lattice. Unique for ergodic sources. Connection to EM algorithm. 2–5 iterations empirically.
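The convergence argument is the generic one for monotone improvement on a finite lattice, sketched here with a toy closure operator standing in for the count-then-optimize-ES step (the example set is illustrative, not from the paper):

```python
def iterate_to_fixpoint(f, x0, max_iter=100):
    """Monotone improvement on a finite lattice must reach a fixed point."""
    x = x0
    for i in range(max_iter):
        nx = f(x)
        if nx == x:
            return x, i
        x = nx
    raise RuntimeError("no fixed point within max_iter")

# Toy monotone map: close a set under addition mod 8 (f(S) ⊇ S always).
f = lambda S: S | {(a + b) % 8 for a in S for b in S}
fp, iters = iterate_to_fixpoint(f, frozenset({2}))
print(fp, "after", iters, "iterations")  # the even residues, quickly
```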
Forward pass algebraically forced
maxᵢ min(tᵢ, pᵢⱼ) is the UNIQUE operation that is join-preserving, residuated, cautious, and faithful. Gödel fuzzy logic = equational theory of the support lattice. The UM’s logic is not a design choice—it’s forced by algebra.
Byte KN scaling: 1.784 bpc at full 1B enwik9 (order 6)
Full scaling curve: 100M=2.001, 200M=1.927, 400M=1.889, 800M=1.859, 1B=1.784. Zero structure, just byte counting + KN smoothing. HT at 99.9% saturation—bigger table would improve further. Improvement rate: ~0.07/doubling at small scale, slowing to ~0.03 at large.
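The per-doubling improvement rates quoted above follow directly from the listed curve:

```python
# Scaling points from the entry above (corpus size in MB → bpc)
curve = {100: 2.001, 200: 1.927, 400: 1.889, 800: 1.859}
sizes = sorted(curve)
deltas = [curve[a] - curve[b] for a, b in zip(sizes, sizes[1:])]
for (a, b), d in zip(zip(sizes, sizes[1:]), deltas):
    print(f"{a}M → {b}M: {d:.3f} bpc per doubling")  # 0.074, 0.038, 0.030
```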
Skip offsets add zero value at byte level
Skip-offset KN (d=7..12) gives 4.6 bpc alone. Product-of-experts combination with byte KN: best α=1.0 (byte KN alone wins). Sequential context dominates at byte level.
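The product-of-experts combination referred to here is the standard log-linear mixture; a sketch (distributions are hypothetical placeholders, not the measured ones) showing that α = 1.0 reduces it to the first expert alone:

```python
import numpy as np

def poe(p1, p2, alpha):
    """Log-linear product of experts: p ∝ p1**alpha · p2**(1 − alpha)."""
    logp = alpha * np.log(p1) + (1 - alpha) * np.log(p2)
    p = np.exp(logp - logp.max())       # stabilize before normalizing
    return p / p.sum()

byte_kn = np.array([0.7, 0.2, 0.1])    # hypothetical byte-KN distribution
skip_kn = np.array([0.4, 0.4, 0.2])    # hypothetical skip-offset distribution
print(poe(byte_kn, skip_kn, 1.0))      # α = 1.0 → byte KN alone wins
```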
Factored luck ⇒ subgoals
A want is an output event. Energy = −log P (Shannon surprise). When luck factors (λ = ∏λi), each factor is a subgoal: satisfy independently. Decision states (TRACKED→OWNED) = incremental UM five-tuple. Greedy satisfaction within factor e of optimal.
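The additivity behind "each factor is a subgoal" is just −log of a product (factor values illustrative): energies of independent factors sum, so each can be driven down separately.

```python
import math

# If luck factors, λ = Π λ_i, energy (−log P) is additive across factors,
# so each factor is an independent subgoal.
factors = [0.5, 0.25, 0.8]
subgoal_energies = [-math.log2(f) for f in factors]
total = -math.log2(math.prod(factors))
assert abs(total - sum(subgoal_energies)) < 1e-9
print(subgoal_energies, "sum to", round(total, 4), "bits of surprise")
```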

Navigation

← Previous: 20260211_2
Scaling to full enwik9. R²=0.83 invariant. Checkpoint trajectory. 4 papers.
Next: 20260213 →
Tock protocol: systematic lexicon injection into the isomorphic UM.