Archive 2026-02-18
Context events, P-programming methodology, and the UM forward pass as Boolean logic. Formalizing the theoretical foundations for moving beyond KN-6 within the UM runner.
Navigation
← Previous: 20260216
Tock phase empirical validation. Match + sparse combination. 13 papers, 35 experiments. UM Runner.
Generalized oversupport. Oversupport is strong support for any two events sharing an ES—contradicting the ES epistemics (mutual exclusivity). Can come from sensory conflict OR internally “from the left” (multiple LPPs). Measured as second-highest support relative to total ES energy flux. Binary: oversupport = contradiction (including argument by contradiction from the left), undersupport = excluded middle. Three interpretations: model error, ES isn’t an ES, or genuinely rare (zeppelin). Ring pattern: P-programmish oversupport detection via e_a → e_b → e_support chains.
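The second-highest-support measurement can be written down directly (a minimal Python sketch; the function name, the support values, and the 0.25 threshold are illustrative assumptions, not values from the runner):

```python
def oversupport(supports, threshold=0.25):
    """Generalized oversupport over one event space (ES).

    `supports` maps each event in the ES to its incoming support.
    Mutual exclusivity says one event should dominate; strong support
    for a second event is a contradiction (oversupport).
    Returns (second-highest support relative to total ES energy, flag).
    """
    total = sum(supports.values())
    if total <= 0 or len(supports) < 2:
        return 0.0, False
    top_two = sorted(supports.values(), reverse=True)[:2]
    s2_fraction = top_two[1] / total  # second-highest vs. total flux
    return s2_fraction, s2_fraction >= threshold

# Decisive ES: one event carries nearly all support -> no oversupport.
print(oversupport({"a": 9.0, "b": 0.5, "c": 0.5}))   # (0.05, False)
# Conflicted ES: two events strongly supported -> oversupport.
print(oversupport({"a": 5.0, "b": 4.0, "c": 1.0}))   # (0.4, True)
```

The same measurement applies whether the conflicting support arrives from sensory input or "from the left" via multiple LPPs; the sketch only sees the aggregated per-event totals.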
Papers
- context-events.pdf (v3) — Context Events and the Induction of Priors (13 sections). New in v3: Generalized oversupport—not just predicted-vs-observed, but any two events with strong support in the same ES. “From the left” = multiple LPPs causing internal contradiction. Second-highest support as measurement. Three interpretations (model error, wrong ES, genuinely rare). Binary internal oversupport = argument by contradiction. Boolean query to find exceptions: LPP over potential-context ES and oversupport event. Prediction loss as special case (output-ES sensory conflict). Ramification no longer requires external observation. (v2, v1)
(source)
- connectome-layers.pdf (v3) — Deriving Layers from the Connectome (13 sections). New in v3: UM Explorer redesigned—neuron-first view (not character-grid), per-ES LPP visualization (like 0216 LPP viewer), Boolean query via clicking events, oversupport AND undersupport both red. Sparse-matrix LPP storage as alternative to hash tables: (e_a, e_b, log-supp) triples. Multi-frequency f: FFT connection from um-arithmetic papers. sat-rnn corrected: 0.079 bpc on 1024 bytes (memorization, 128 hidden). “Connectome as theory of data” section preserved. Start simple, iterate, multiple approaches in parallel. (v2, v1)
(source)
- ring-pattern.pdf (v3) — The Ring Pattern: Measuring Oversupport in the Universal Model (9 sections). v3: First raw s(2) from hash-table counts. Per-order sparsity gradient: order-6 decisive (51% s(2)=0), order-1 always confused (95% s(2)≥64). Raw s(2) and p(2) nearly uncorrelated (r=0.047)—normalization destroys information. The right metric is s(2) at the highest active order. In-tag: raw 1660 vs 3096 (1.87×). Connects to sparse-LPP threshold (s≥4). Still: 91% from-the-left disagreement, 6.8% confidently-wrong, 14.1% confused-but-right. (v2 superseded)
- sparse-lpp.pdf (v2) — Sparse LPP Storage: Joint Events as Neurons (8 sections). v2: Hashes must go—they violate UM universality. Each context (“th”, “the”) should be an actual joint event (neuron) in the UM, connected by min-patterns to constituents. Sparse creation: joint events born only when log-support ≥4 (~16 occurrences), giving ~20K neurons vs 128M hash slots. Layered: bytes → bigrams → trigrams → output emerges from compositional requirement. P-programming challenge: conjunction via min, growing event spaces, backoff as patterns. Connects to layer-derivation paper.
- marginal-dominance.pdf — Marginal Dominance in the UM Forward Pass. Pure UM bigram experiment plateaus at ~5.3 bpc across 3 orders of magnitude in data size. The Marginal Dominance Theorem: in max-min with absolute log-counts, w0(b) ≥ w1(a,b) ≥ w2(β,b) always, because marginal count ≥ joint count. Higher-order patterns never contribute under max. Three proposed solutions: (1) source support encodes specificity (min as discount), (2) conditional log-probability weights, (3) evidence subtraction. Key open question: where does the discount come from in the UM?
- multi-freq.pdf (v3) — Multi-Frequency Recurrence in the Universal Model (8 sections). New in v3: Four negative results—OS-conditional (+0.005), line-position (+0.004), skip-bigrams (+0.002), after-period (negligible). Complementarity principle: gains require independent information, correlated features just redistribute weight. Key finding: structural LPPs are the efficient representation of skip-pattern statistics. Solo skip-bigrams match word-onset (+0.13 bpc) but are completely redundant when word+tag present. Next requires features orthogonal to tag and word structure.
- total-interpretability.pdf — Toward Total Interpretability of a Tiny Elman RNN (9pp, updated 19 Feb). Sparse differentiation succeeds. Carrier/pattern decomposition: 86% of neurons carrier-determined (timing), 14% pattern-sensitive (byte identity). Carrier-only: 0.40 bpc, actual: 0.08 bpc. Jacobian × pattern deviation: signed influence +1.75 (offset 0), +0.78 (offset 1), ~0 beyond. Three failed approaches: backward gradient (flat 2%/offset), ablation (cascade), decisive chains (break at step 1–2). Factor map’s offset-7 = timing correlation, not causal byte propagation. Model is a “timing machine with local pattern matching.”
- surprise-mechanism.pdf (v3) — The Surprise Mechanism: Generalized Oversupport and Attribution (12 sections). New in v3: Generalized oversupport (same as context-events v3). UM self-similarity: every subnet is a UM, every incoming pattern is “sensory input”—observation available everywhere, not equally reliable. Surprise attribution not interpretation: we don’t know what surprise means, response is firstly “why am I surprised?” Startle/settling process. Ring pattern: e_a → e_b → e_support gives second-highest support = oversupport metric, embeds surprise detection into P. Research direction. (v2, v1)
(source)
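The Marginal Dominance Theorem from marginal-dominance.pdf can be checked numerically with made-up counts (a sketch; the counts are hypothetical and `log2` stands in for the runner's log-stochastic counting):

```python
import math

# Toy counts: a marginal count of output byte b can never be smaller
# than any joint count (a, b), since every (a, b) occurrence is also
# a b occurrence. The same holds one order deeper.
counts_b = {"e": 512}                        # marginal: byte 'e'
counts_ab = {("h", "e"): 64}                 # joint: 'e' after 'h'
counts_abc = {("t", "h", "e"): 32}           # 'e' after 't','h'

w0 = math.log2(counts_b["e"])                # 9.0
w1 = math.log2(counts_ab[("h", "e")])        # 6.0
w2 = math.log2(counts_abc[("t", "h", "e")])  # 5.0

# With absolute log-counts, w0(b) >= w1(a,b) >= w2(beta,b) always:
assert w0 >= w1 >= w2
print(max(w0, w1, w2))  # 9.0 -> under max, higher orders never contribute
```

This is why the paper's three proposed fixes all amount to injecting a discount somewhere: without one, max over absolute log-counts always returns the marginal.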
Interactive Viewers
- um-connectome.html — UM Connectome: Pure UM 3D Viewer. Pure UM engine in JavaScript—events are neurons, patterns connect them, forward pass is max-min, learning is counting (omega_0). No hashing, no KN smoothing—everything is SN-representable. 3D ring visualization: purple byte_input ring (256 neurons) and blue byte_output ring (256 neurons). Dormant neurons always visible as dim dots; active neurons glow with support-proportional size. LPP connections as bezier curves. Side panel with SN state export, output distribution, and model statistics. Loads 64K enwik9 chunk. Drag to rotate, scroll to zoom, arrow keys to step, Space to play.
- um-explorer.html — UM Explorer: Neuron-First KN-6 Viewer. Step through 2048 bytes of enwik9 byte-by-byte, seeing the model’s internals in real time. Event space panel shows all 8 ESs with prediction distributions, per-order KN contributions, and LPP flow visualization. Surprise sparkline with green-to-red gradient. Color-coded data ribbon (tag/attr/text/whitespace). Context state indicators. Oscilloscope aesthetic. Arrow keys to step, Space to play, click to jump. Embedded trace data from umr trace.
- context-events-explainer.html — Context Events Interactive Explainer. Six sections: (1) Color-coded enwik9 data grid showing structural roles, (2) KN-6 sliding window limitation with interactive slider, (3) Context event tracks (in_tag, word_len, after_eq) as aligned visualizations, (4) Boolean truth table mapping context events to predictions, (5) Six-step Boolean-to-P-program methodology, (6) Five open P-programming issues (conditional LPPs, discovery, reuse, surprise, tiny memories).
- um-viewer.html — UM Viewer: SN Model Visualization. Loads any .sn model file and raw data bytes. Implements the same UM forward pass as the C runner (#umr_core): fp(t)_j = max_i min(t_i, p_ij), with log-stochastic counting for omega_0. 3D ring visualization (one ring per ES), LPP connections as bezier arcs, byte ribbon, output distribution, live SN state view. Model picker with 6 pre-built models: bigram (1K/4K/16K/64K) and marginal (4K/64K). Data chunks at 1K/4K/16K. Local file picker for custom models.
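The forward pass the viewers share with the C runner, fp(t)_j = max_i min(t_i, p_ij), is small enough to sketch directly (a Python stand-in for the runner's loop; the list-of-lists pattern layout is an assumption for illustration):

```python
def forward(t, P):
    """Max-min forward pass: fp(t)_j = max_i min(t_i, p_ij).

    t: support per input event (length n)
    P: pattern weights, P[i][j] = connection from input event i
       to output event j
    Returns support per output event.
    """
    n, m = len(t), len(P[0])
    return [max(min(t[i], P[i][j]) for i in range(n)) for j in range(m)]

t = [5.0, 2.0]        # input event supports
P = [[3.0, 9.0],      # patterns out of event 0
     [8.0, 1.0]]      # patterns out of event 1
print(forward(t, P))  # [3.0, 5.0]
```

Each output takes the strongest pattern whose source support and pattern weight are both high; min acts as the bottleneck, max as the selection.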
Three-frequency model: KN-6 + word-onset + tag-onset LPPs. 1M: +0.184 bpc (2.398 → 2.214). 10M: +0.133 bpc (2.178 → 2.044). Gain shrinks at scale as KN-6 absorbs structural information with more data. Weights shift: tag 11.0% → 4.4%, word 3.6% → 1.5%, KN 85.3% → 94.1%. But +0.133 at 10M still substantial—structural LPPs provide genuinely complementary information that KN-6 cannot fully capture even with 10× more data.
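The softmax mixing behind these weight shifts can be sketched as follows (illustrative Python; the learning rate and the exact gradient-ascent update are assumptions about the runner, not taken from it):

```python
import math

def softmax(a):
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]

def mix_step(alphas, probs, target, lr=0.05):
    """One online step of softmax-weighted mixing of expert predictions.

    alphas: mixing logits, one per expert (e.g. kn, word, tag)
    probs:  probs[k][b] = expert k's probability for byte b
    target: the byte actually observed
    Returns (mixed probability of target, updated logits).
    """
    w = softmax(alphas)
    p_mix = sum(w[k] * probs[k][target] for k in range(len(w)))
    # Gradient ascent on log p_mix:
    # d log p_mix / d alpha_k = w_k * (p_k(target) - p_mix) / p_mix
    return p_mix, [a + lr * w[k] * (probs[k][target] - p_mix) / p_mix
                   for k, a in enumerate(alphas)]

p, a = mix_step([0.0, 0.0, 0.0],
                [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]], target=0)
# the expert that assigned more mass to the observed byte gains weight
```

Run over the whole stream, the logits drift toward whichever experts carry independent information, which is exactly the weight-shift pattern reported above.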
Ring pattern oversupport (1M bytes, v3): First raw s(2) from HT counts. Per-order sparsity gradient: ord6 decisive (51% s(2)=0), ord1 always confused (95% s(2)≥64). Raw s(2) and p(2) nearly uncorrelated (r=0.047)—normalization destroys information. Right metric: s(2) at highest active order, not max across orders. In-tag raw: 1660 vs 3096 (1.87×). 91% from-the-left disagreement. 6.8% confidently wrong, 14.1% confused but right.
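The "right metric" choice, s(2) at the highest active order rather than the max across orders, reduces to a few lines (sketch; the per-order table is hypothetical):

```python
def s2_at_highest_order(per_order_s2):
    """per_order_s2: {order: s2 count, or None if that order's context
    was not active}. Low orders are almost always confused (high s2);
    the decisive signal is the second-highest count at the deepest
    order that actually fired."""
    active = {k: v for k, v in per_order_s2.items() if v is not None}
    return active[max(active)]

# Order 6 active and decisive (s2 = 0) despite a confused order 1:
print(s2_at_highest_order({1: 3052, 2: 340, 6: 0}))  # 0
# Order 6 context unseen -> fall back to the deepest active order:
print(s2_at_highest_order({1: 3052, 2: 340, 6: None}))  # 340
```

Taking the max across orders instead would let the always-confused order-1 counts swamp the signal, which is what the near-zero r = 0.047 correlation with p(2) reflects.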
Word-frequency analysis (1M bytes): Boundaries 1.93× harder than interiors (3.98 vs 2.06 bpc). Position-in-word surprise: pos 0=3.98, pos 3=1.73 (minimum), then slowly climbs. Lowercase words easiest (1.75 bpc interior), digits hardest (3.44). Mean word length 5.7 bytes. Top word onsets: space (9.8%), 't' (8.5%), 'a' (7.4%). Validates multi-frequency architecture: word-onset clock captures structure invisible to byte-level KN-6.
Surprise analysis (1M bytes): in_tag is the most powerful single context event: 0.38 bpc inside tags vs 2.46 bpc outside (86.8% of mean). Word boundaries (word_len=0) are hardest at 3.98 bpc. 40.7% of positions under 0.25 bpc. Heavy tail: 8% above 7.75 bpc. Top surprise positions at tag boundaries, rare markup (51–54 bits).
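Surprise throughout these analyses is per-byte code length, -log2 p (a trivial sketch; the probabilities are made up):

```python
import math

def surprise_bpc(p):
    """Surprise of an observed byte under the model: -log2 p,
    in bits per character."""
    return -math.log2(p)

# A byte the model expects is cheap; rare markup lands in the heavy tail.
print(surprise_bpc(0.8))       # ~0.32 bpc
print(surprise_bpc(2 ** -52))  # 52 bits, the rare-markup regime above
```

The histograms above are just this quantity accumulated per position and then conditioned on context events such as in_tag and word_len.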
Experiments
umr surprise enwik9 1000000 — KN-6 surprise histogram + context event analysis. Surprise by in_tag (0.38 vs 2.46 bpc), by word_len (3.98 at boundaries, 1.73 at len 3), top 20 most surprising positions. First concrete experiment from the surprise-mechanism paper.
umr oversupport enwik9 1000000 — Ring pattern oversupport detection. Output-ES second-highest probability, from-the-left order disagreement, context-conditional analysis, surprise×oversupport joint distribution. First empirical validation of the ring pattern construction.
umr export enwik9 1000000 — Sparse LPP export: per-order HT analysis. Order 1: 195 contexts, 29.7 outputs each. Order 6: 297K contexts, 1.4 outputs each. Total: 756K sparse triples (170× smaller than HT capacity). Validates sparse-matrix storage proposal.
umr word-freq enwik9 1000000 — Word-frequency analysis. Boundaries 1.93× harder (3.98 vs 2.06 bpc). Position-in-word curve: pos 0=3.98, pos 3=1.73, climbs slowly. Lowercase words easiest (1.75 bpc), digits hardest (3.44). Mean word length 5.7. First empirical support for multi-frequency architecture.
umr word-lpp enwik9 1000000 — Word-onset LPP: +0.118 bpc gain. First multi-frequency LPP in umr. (word_onset, pos_in_word) → byte_output, learned mixing weight α=−2.3 (w=0.091). KN-6 alone: 2.398, with word-onset: 2.279. Word-onset alone: 6.73 bpc (complementary, not standalone). 30K HT entries (2.9% of 1M HT).
umr tag-lpp enwik9 {1M,10M} — Three-frequency model: +0.184 bpc (1M), +0.133 bpc (10M). KN-6 + word-onset + tag-onset (in_tag, prev_byte) → byte_output. 3-way softmax mixing with online gradient. 1M weights: kn=85.3%, tag=11.0%, word=3.6%. 10M weights: kn=94.1%, tag=4.4%, word=1.5%. Gain shrinks at scale but remains substantial.
umr cond-lpp enwik9 1000000 — OS-conditional LPP: +0.005 over 3-way (marginal). Context=(os_bucket, in_tag, prev_byte) subsumes tag LPP. 4-way weights: os=10.4%, tag drops to 1.4%. Negative result: oversupport bucketing adds negligible information beyond tag+byte context. Surprise is useful for attribution, not directly for prediction improvement.
umr line-lpp enwik9 1000000 — Five-frequency model: +0.004 over 3-way (marginal). Line-pos gets 9.4% weight but steals from tag (drops to 1.5%). After-period: 0.7% weight. Negative result: line position and tag status are highly correlated in enwik9—not independent information sources.
umr skip-lpp enwik9 1000000 — Skip-bigrams at d=8,16,32. Solo: d=8 gives +0.129 (comparable to word-onset +0.118). But combined with word+tag: only +0.002 marginal. Key insight: skip-bigrams capture the same non-contiguous structure as word/tag LPPs—structural features are the efficient representation of what skip-patterns provide.
umr tagname-lpp enwik9 1000000 — Tag-name LPP: +0.008 over 3-way (first positive marginal). Context=(tag_name_hash, in_tag, prev_byte). Tag-name (10.9%) subsumes basic tag (drops to 0.6%). Total: +0.192 bpc. Tag name is genuinely orthogonal to in_tag alone—content inside <text> vs <title> vs <comment> has different distributions. HT: 8.9K entries (0.8%).
umr attr-lpp enwik9 1000000 — Attribute-value LPP: −0.001 over 4-way. Negative result: only 0.2% of 1M positions are inside attribute values (1,775 positions)—too few to learn from. Attr surprise very low (0.64 bpc vs 2.40 outside) but negligible coverage. Gets 0.6% weight. Data sparsity, not information content, is the bottleneck.
umr raw-os enwik9 1000000 — Raw s(2) from HT counts. First true ring-pattern measurement. Per-order mean s(2): ord1=3052, ord2=340, ord3=69, ord4=27, ord5=14, ord6=6. Raw s(2) and p(2) nearly uncorrelated (r=0.047)—validates MJC’s point that normalization destroys information. Order-6: 50.8% have s(2)=0 (decisive), order-1: 94.5% have s(2)≥64 (always confused). In-tag: 1660 vs 3096 (1.87× ratio). The right metric is per-order s(2) at the highest active order.
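The sparse-triple export in umr export, combined with the creation threshold from sparse-lpp.pdf (log-support ≥ 4, roughly 16 occurrences at base 2), amounts to the following (illustrative Python; the toy corpus and order are made up):

```python
import math
from collections import Counter

def sparse_triples(data, order, min_log_supp=4):
    """Collect (context, next_byte, log2-support) triples, keeping only
    joint events whose log2-support clears the creation threshold
    (log-support >= 4, i.e. ~16 occurrences)."""
    counts = Counter()
    for i in range(order, len(data)):
        counts[(data[i - order:i], data[i])] += 1
    return [(ctx, nxt, math.log2(n))
            for (ctx, nxt), n in counts.items()
            if math.log2(n) >= min_log_supp]

data = b"the cat sat on the mat " * 16   # 'the' appears 32 times
triples = sparse_triples(data, order=2)
# only joint events seen ~16+ times survive, e.g. (b'th', ord('e'), 5.0)
```

Rare contexts never become triples at all, which is where the 170× reduction over hash-table capacity comes from: storage scales with the observed joint events that clear the threshold, not with pre-allocated slots.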