A Mathematical Review of CMP

Formalizing u = (e, t, p, f, ω), assessing the framework against twenty papers of empirical results.

- 5 components in the Universal Model
- 4/4 core claims confirmed empirically
- 3 extensions beyond original CMP
- 5 open questions identified
- 20 papers in the Feb 7-11 empirical program

1. The Five-Tuple: u = (e, t, p, f, ω)

CMP defines the Universal Model as a five-tuple drawn from a product of finite sets U = E × T × P × F × Ω.

The five components: e (Event), t (Thought), p (Pattern), f (Update), ω (Learning).

e ∈ E — Event (current state of the world)

E = ∏_{i=1}^{k} E_i     I(E) = ∑_{i=1}^{k} log|E_i|

An event space E_i is a finite set of mutually exclusive events, exactly one of which is true at any given time. The total event space is a product, so choosing a factorization is choosing a coordinate system for the state space.

Empirical confirmation: The sat-rnn's 128 hidden neurons map to 128 binary event spaces (|E_i| = 2). The doubled-E construction matches to within 0.00% bpc difference.

Key insight: "Epistemic precommitments to divide reality into distinguishable parts." Philosophically loaded but mathematically precise — the factorization is not unique, and the choice of factorization is the central analytical decision.

t ∈ T — Thought (model's current belief)

T ≅ E   —   t ∈ {0,1}^{|E|} (discrete)  or  t ∈ (0,1)^{|E|} (continuous)

A total thought assigns a belief value to each event. In the concrete SN representation, strengths are in [0, 255] where 0 = "certainly false" and 255 = "certainly true".

SN strength = log₂ Ω, where Ω = number of microstates (dataset positions) supporting the event. This is the Shannon-Boltzmann identity pinned down by the Feb 11 archive.

The isomorphism T ≅ E is key: a thought is an assignment of belief to each possible event.

Discrete is primary: Feb 11 results show sign bits carry 99.7% of compression. The mantissa is noise — the continuous form is an artifact of float representation (for tanh-saturated RNNs).

p ∈ P — Pattern (knowledge structure)

p: T → T   —   Atomic pattern: (e_a, e_b) with strength

A total pattern maps thoughts to thoughts. An atomic pattern (e_a, e_b) with strength s means "when e_a is observed, e_b becomes more likely."

Standard update (tropical): (f_p(t))_j = max_i min(t_i, p_{ij}) — conjunction as min, disjunction as max. A (max, min) tropical semiring computation.

The neural network update h' = tanh(Wh + b) is a different (non-tropical) realization. For the sat-rnn, tanh saturation makes it approximately Boolean: tanh(z) is within 10⁻⁶ of ±1 for 98.9% of neuron-steps (mean margin = 60.5).
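The tropical update can be made concrete. Below is a minimal sketch of (f_p(t))_j = max_i min(t_i, p_{ij}) with beliefs in [0, 1]; the function name and the toy 3-to-2 pattern matrix are illustrative, not the sat-rnn's actual weights.

```python
# Sketch of the standard tropical update (f_p(t))_j = max_i min(t_i, p_ij).
# Conjunction as min, disjunction as max: a (max, min) semiring computation.

def tropical_update(t, p):
    """Apply a total pattern p (rows p[i][j]) to a thought vector t."""
    n_out = len(p[0])
    return [max(min(t[i], p[i][j]) for i in range(len(t)))
            for j in range(n_out)]

# With Boolean inputs this reduces to ordinary OR-of-ANDs.
t = [1.0, 0.0, 1.0]
p = [[1.0, 0.0],   # e_1 supports output event 1
     [0.0, 1.0],   # e_2 supports output event 2
     [0.0, 0.0]]   # e_3 supports neither
print(tropical_update(t, p))  # [1.0, 0.0]
```

With graded (non-Boolean) beliefs the same code computes a fuzzy OR-of-ANDs, which is what distinguishes it from the tanh realization discussed above.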

f ∈ F — Update Function (P × T → T)

f: P × T → T   —   Apply patterns to current thought

The update function takes a pattern and a thought and produces a new thought. CMP defines F as a union of binary strings up to fixed length bounds, with decoding maps into realizable functions.

Open question: The relationship between the tropical update (max-min) and the neural update (matrix multiply + nonlinearity) is established empirically for sat-rnn but not proved in general.

Finiteness convention: By encoding F as bounded binary strings, everything stays finite and computable. |U| < ∞ and all information quantities are well-defined.

ω ∈ Ω — Learning Function (T × E → P)

ω: T × E → P   —   Update patterns from observations

The standard learning function ω₀ maintains a log contingency table via log-stochastic counting: with probability 2^{-s}, set s → s+1. This keeps E[2^s] ≈ count (exactly count + 1 for a counter started at s = 0).

Key property: The log contingency table is a sufficient statistic for the function I → O given the data. Everything statistically learnable from observations is captured by this matrix.
Symmetry: The same matrix serves I → O (forward) and O → I (backward) by transposition. This is the bidirectionality confirmed by the temporal bi-embedding paper.
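The counting rule can be sanity-checked numerically. A minimal sketch, assuming a Morris-style approximate counter as described; names are illustrative.

```python
import random

# Log-stochastic counting: on each observation, increment the stored
# strength s with probability 2^-s, so 2^s tracks the count in expectation.

def log_count(n_observations, rng):
    s = 0
    for _ in range(n_observations):
        if rng.random() < 2.0 ** -s:
            s += 1
    return s

rng = random.Random(0)
true_count = 1000
trials = [2 ** log_count(true_count, rng) for _ in range(2000)]
mean_est = sum(trials) / len(trials)
print(round(mean_est))  # near 1000 (E[2^s] = count + 1)
```

The stored strength s only needs ~log₂(count) bits, which is the point of the log contingency table.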

2. Information Decomposition

Since U is a product, total information is additive:

I(U) = log|E| + log|T| + log|P| + log|F| + log|Ω|

Information Content by Component (sat-rnn, 128 hidden)

- E: 128 bits (128 binary ESes)
- T: 128 bits (T ≅ E, isomorphic)
- P: 49,408 params (W_h 128×128 + W_x 256×128)
- F: ~0 free bits (tanh, fixed)
- ω: 33,024 params (W_y 128×256 + bias + SGD)

Approximate information content of each UM component for the sat-rnn. Total: ~82k parameters (656k bits at f32).

3. Event Spaces and Encodings

CMP defines event spaces as finite sets with product structure. Two concrete encodings map E into the natural numbers.


Coprime Encoding (Chinese Remainder Theorem)

n(e1, ..., ek) = ∑i eij<i |Ej|

The formula above is a mixed-radix positional encoding, decoded by successive division: e_1 = n mod |E_1|, e_2 = ⌊n/|E_1|⌋ mod |E_2|, and so on. When the cardinalities |E_i| are pairwise coprime, the CRT additionally guarantees a residue encoding with direct recovery e_i = n mod |E_i|.

Example: E_1 = {0,1,2} (3 values), E_2 = {0,1,2,3,4} (5 values), E_3 = {0,1,2,3,4,5,6} (7 values). Total |E| = 105. State (2, 3, 4) encodes to n = 2 + 3×3 + 4×15 = 71. Recovery by successive division: 71 mod 3 = 2, ⌊71/3⌋ mod 5 = 3, ⌊71/15⌋ = 4. (Note that 71 mod 5 = 1 ≠ 3: this is the mixed-radix code, not a direct CRT residue code.)
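The mixed-radix round trip can be sketched in a few lines; `encode`/`decode` are illustrative names, not from the archive.

```python
# Mixed-radix encode/decode over event-space cardinalities (bases).

def encode(state, bases):
    n, place = 0, 1
    for e, b in zip(state, bases):
        n += e * place   # each component weighted by the product of
        place *= b       # all earlier cardinalities
    return n

def decode(n, bases):
    state = []
    for b in bases:
        state.append(n % b)  # peel off the lowest-radix digit
        n //= b
    return state

bases = [3, 5, 7]          # |E_1|, |E_2|, |E_3|
print(encode([2, 3, 4], bases))  # 71
print(decode(71, bases))         # [2, 3, 4]
```

Unique recovery holds for any bases here; coprimality only matters for the residue-style CRT variant.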

Prime-Power Encoding (Feb 10 Extension)

N(σ) = ∏_{i=1}^{k} p_i^{e_i(σ)}

For equal-dimension factors (e.g., all binary), assign one prime per event space. The fundamental theorem of arithmetic guarantees unique factorization.

Binary ESes: N(σ) = ∏_{i∈S} p_i where S = set of "on" bits. Square-free products. Every composite N has non-trivial internal structure — factorization IS interpretability.

Example: 4 binary event spaces, assigned primes 2, 3, 5, 7. With all spaces off, N = 1 (the empty product); turning spaces 1 and 3 on gives N = 2 × 5 = 10, and factoring N recovers exactly which spaces are on.

Product Structure of Event Spaces

E = E_1 × E_2 × ... × E_k   ⇒   I(E) = I(E_1) + I(E_2) + ... + I(E_k)

Information decomposes additively because E is a product. The quotient E/E_1 identifies events differing only in E_1, giving ∏_{i≠1} E_i.

Factored vs. unfactored information: k binary event spaces give 2^k total states but only k bits of information content.
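The factored-vs-unfactored point is a one-liner to verify (k = 10 here for readability):

```python
import math

# k binary event spaces: the product space has 2^k joint states, yet
# the information content I(E) = sum_i log2|E_i| is only k bits.

k = 10
sizes = [2] * k
n_states = math.prod(sizes)                   # 2^k joint states
info_bits = sum(math.log2(s) for s in sizes)  # k bits
print(n_states, info_bits)  # 1024 10.0
```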

4. The E → N → Q Chain

The quotient chain traces computation through the universal model, from raw events to the luck of predictions.


E → N: Encoding

Events embed into natural numbers via the prime-power encoding. Each factor E_i corresponds to a prime p_i, and the macrostate integer N = ∏_i p_i^{e_i} is uniquely recoverable.

128 binary ESes → products of up to 128 distinct primes. Hamming weight w gives w prime factors = w independent pieces of interpretive content.

N → Q: Quotient

The quotient Q = λ traces the luck of events. Q = 1/p = inverse probability of the observed sequence under the model. The quotient presupposes the factored perspective.

Q = λ: quotient over dataset positions = luck of events. The quotient chain E → N → Q traces computation at every layer.

5. Multiplication: The Universal Combining Operation

CMP identifies multiplication as universal across all five components.

Factor | Multiplication | Additive form | Example
E | Cartesian product E_1 × E_2 | I(E) = I(E_1) + I(E_2) | coin × weather
T | Independent beliefs t = (t_1, t_2) | log p = log p_1 + log p_2 | joint probability
P | Layer composition P_1 · P_2 | matrix multiplication | multi-layer net
F | Function composition f_1 ∘ f_2 | chained updates | forward pass
ω | Learning composition | chained updates | multi-epoch training

The quotient operation is multiplication's inverse: E/E_1 = ∏_{i≠1} E_i.

6. Three Extensions Beyond CMP

The February 11 archive extends the theory in three directions not present in the original paper.


Extension 1: Equal-Dimension Factor Permutation

Z ≅ Z_0^k  ⇒  S_k acts by permuting coordinates  ⇒  φ: S_k × Z_0^k → T

When all event spaces have the same dimension (as in the 128 binary ESes of the sat-rnn), the symmetric group S_k acts by permuting coordinates. CMP does not discuss this symmetry.

Consequence: The alignment problem between architecture-natural and domain-natural factorizations becomes a permutation search. The low-dimensional inner product structure (d ≈ 20 functional features + 108 gauge dimensions) makes this search tractable.

This gives CMP's abstract φ: H_arch → H_domain a precise algebraic form.
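As a hedged sketch of the permutation-search idea: the feature vectors and the greedy inner-product matcher below are illustrative assumptions, not the archive's actual alignment procedure.

```python
# Align an architecture-natural factorization with a domain-natural one
# by searching over permutations of same-dimension factors.

def greedy_align(arch_feats, domain_feats):
    """Greedily match each arch factor to the unused domain factor with
    the largest inner product; returns the permutation as index list."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    unused = set(range(len(domain_feats)))
    perm = []
    for u in arch_feats:
        best = max(unused, key=lambda j: dot(u, domain_feats[j]))
        unused.remove(best)
        perm.append(best)
    return perm

# Toy case: arch features are a shuffled copy of the domain features.
domain = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
arch = [domain[2], domain[0], domain[1]]
print(greedy_align(arch, domain))  # [2, 0, 1]
```

Greedy matching is O(k²) and can miss the optimum on noisy features; an optimal-assignment solver would do the same job exactly, but the low effective dimension cited above is what makes either search tractable.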

Extension 2: Thermodynamic Identification (Shannon = Boltzmann)

The Feb 11 microstate-macrostate paper identifies the binary-ES softmax with the Boltzmann distribution at β = ln 2. CMP does not make this connection.

CMP | Thermodynamics
Event space E_i | Degree of freedom
SN strength s | log₂ Ω (log microstates)
Pattern strength | Free energy difference
Update f | Partition function evaluation
Quotient Q | Luck = 1/p = inverse probability

Shannon-Boltzmann identity: H = log₂ N − ⟨S_B⟩ bridges information theory and statistical mechanics through the event formalism. Arguably the most important theoretical extension.
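The identity H = log₂ N − ⟨S_B⟩ can be verified numerically; the toy dataset below is an illustration, not the sat-rnn corpus.

```python
import math
from collections import Counter

# For a dataset of N positions where event e is supported by Omega_e
# microstates (positions), the Shannon entropy of the event distribution
# equals log2 N minus the average Boltzmann entropy S_B = log2 Omega.

data = list("aaabbc")   # N = 6 positions
N = len(data)
omega = Counter(data)   # microstate counts per event: a:3, b:2, c:1

H = -sum((w / N) * math.log2(w / N) for w in omega.values())
avg_SB = sum((w / N) * math.log2(w) for w in omega.values())

print(abs(H - (math.log2(N) - avg_SB)) < 1e-12)  # True
```

The identity is exact, not approximate: it is just log(w/N) = log w − log N averaged over the event distribution.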

Extension 3: Bidirectional Construction (Data ↔ Weights)

CMP defines ω₀ (log-stochastic counting) for the unfactored case. The archive applies it to a factored system and shows the result is constructive.

Forward (data → weights): Hebbian covariance, skip-bigram log-ratios, and shift-register design give all 82k parameters from data statistics. Achieves 1.89 bpc with zero gradient descent.
Backward (weights → interpretation): Factor map, backward attribution chains, and Boolean automaton analysis recover the data's statistical structure from trained weights.
Analytic (data-derived) construction vs. SGD training. The analytic approach is 39,800x cheaper and achieves better generalization.

7. The Equivalence Thesis

Central Claim: Interpretability and efficiency in learning systems are identical problems, both resolved by recovering the correct factorization of the event space.

Efficiency ⇒ Interpretability

An efficient model (sparse P, small |E|) must factor E into small event spaces with sparse patterns. If the factorization matches domain structure, it is automatically interpretable: event spaces have domain-natural names, patterns express domain-natural relationships.

Interpretability ⇒ Efficiency

An interpretable model has named event spaces and explicit patterns. This naming IS a factorization, and an interpreted factorization is always at least as efficient as an unfactored one: I(E) = ∑ I(Ei), and patterns between small Ei are sparser.

The failure mode: Architecture-natural ≠ domain-natural. Deep learning provides a factorization (layers, neurons, attention heads) efficient for gradient-based learning but opaque because it doesn't match the domain's natural decomposition. Interpretability seeks the refactoring map φ: H_arch → H_domain.
The equivalence in practice: analytic construction is both 39,800x cheaper AND fully interpretable (every parameter has a data-statistical meaning).

8. CMP Claims vs. Empirical Evidence

Scorecard assessing CMP's claims against the February 7-11 empirical program.

What CMP Gets Right

The five-tuple is exhaustive
Every component of the RNN maps cleanly to one of (e, t, p, f, ω). No sixth component is needed.
CONFIRMED
Factorization is the right abstraction
The entire interpretability program reduces to finding the correct factorization of E (128 binary ESes) and P (sparse patterns between them).
CONFIRMED
The standard learning function works
Log-stochastic counting is exactly what Hebbian construction computes. 50% blend improves trained model by 0.66 bpc.
CONFIRMED
The equivalence thesis holds
Analytic construction: both more interpretable (every param has meaning) and more efficient (39,800x cheaper, 1.89 vs 4.97 bpc trained).
CONFIRMED

What CMP Leaves Open

?
How to find the factorization
CMP proves the correct factorization exists but gives no algorithm. Feb 11 provides one empirical recipe (Boolean analysis → factor map → backward attribution) but generalization unclear.
OPEN
?
Update function beyond standard case
Tropical (max-min) vs. neural (matrix multiply + nonlinearity) equivalence is empirical for sat-rnn (98.9% Boolean) but not proved in general.
OPEN
?
Continuous vs. discrete
Sign bits carry 99.7% for tanh RNNs, but transformers with softmax attention may genuinely use continuous dynamics.
OPEN
?
The role of depth
Dominant patterns involve deep temporal offsets (d = 18-25). CMP accommodates depth via composition (P = P_L ⋯ P_1) but doesn't single it out as primary.
OPEN
?
Scaling beyond H=128
All results concern 128-hidden single-layer RNN on 262k bytes. CMP's claims are universal. Cost analysis (gap widens with H) is encouraging but untested.
OPEN

Extensions from Feb 11 Archive

+
Equal-dimension factor permutation
S_k symmetry on Z_0^k gives φ a precise algebraic form. Low-dimensional (d ≈ 20) inner product structure makes permutation search tractable.
EXTENDED
+
Thermodynamic identification
Shannon = Boltzmann at β = ln 2. Every CMP quantity gets a physical meaning. H = log₂ N − ⟨S_B⟩.
EXTENDED
+
Bidirectional construction
Forward: all 82k params from data stats (1.89 bpc, zero SGD). Backward: factor map recovers data structure from weights. ω0 applied to factored systems is constructive.
EXTENDED

9. Summary Assessment

Distribution of CMP claims across categories: confirmed by empirical evidence, extended by Feb 11 archive, or remaining open questions.
Conclusion: CMP's framework is mathematically precise where it needs to be (event spaces, factorization, the standard learning function) and deliberately open where it should be (the update function, the learning function beyond the standard case). The central claim — that interpretability and efficiency are the same problem, both resolved by correct factorization — is confirmed by the empirical results.
Most significant contribution: Identifying factorization as the primary analytical tool, unifying interpretability, efficiency, information theory, and statistical mechanics under a single algebraic structure. The concrete representation (E → N via prime powers) makes this unification computational, and the Feb 11 archive makes it empirical.


Claude and MJC — February 12, 2026 — A Mathematical Review of CMP