
Context Events

How missing priors become missing events in the Universal Model
Claude and MJC — 18 February 2026 — Hutter Prize Compressor
Section I

The Data

The first 256 bytes of enwik9 — a MediaWiki XML export. Every character has a structural role that a 6-byte window cannot see.

Legend: tag bracket, tag name, attribute name, attribute value, text content, whitespace.
Section II

What KN-6 Sees

KN-6 predicts each byte using only the preceding 6 bytes as context. Drag the slider to move through the data. The amber window is the context; the red byte is the prediction target.

[Interactive widget: a slider over the data showing, at each position, the 6-byte context, the target byte, its structural role, and whether KN-6 can see that role.]
KN-6 achieves 1.784 bpc on enwik9. It cannot distinguish “inside a tag” from “in text content” unless the 6-byte window happens to contain a < or >. Most of the compression gap is structural information outside the window.
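The blindness is easy to demonstrate. The sketch below (a hypothetical sample, not actual enwik9 bytes) shows that a position inside a tag and a position in text content can present identical structural evidence to a 6-byte window: neither context contains a bracket.

```python
def window_sees_bracket(data: bytes, i: int, k: int = 6) -> bool:
    """True if the k-byte context before position i contains '<' or '>'."""
    ctx = data[max(0, i - k):i]
    return b"<" in ctx or b">" in ctx

# Hypothetical sample; in real enwik9 the effect is pervasive.
data = b'<mediawiki version="0.3">some text content here'
# Byte 10 is inside the tag, byte 35 is in text content, yet both
# 6-byte contexts ("iawiki" and " text ") are bracket-free:
in_tag_visible = window_sees_bracket(data, 10)    # False
in_text_visible = window_sees_bracket(data, 35)   # False
```

Whenever both values are False, KN-6 is relying purely on byte statistics with no structural signal.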
Section III

Context Events

Three context events computed over the first 256 bytes. Each track shows one event’s value at each position. These carry structural information that the sliding window drops.

The in_tag event alone partitions the byte distribution into two very different worlds: tag syntax vs. text content. word_len tracks position within the current token. after_eq predicts the start of attribute values. None of these requires more than a few bits of state, yet KN-6 cannot represent any of them.
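The three tracks can be computed in a single pass. This is a sketch; the exact event definitions (whether the `<` byte itself counts as in_tag, how word_len resets) are assumptions, not the UM's canonical ones.

```python
def context_events(data: bytes):
    """Compute the three event tracks over a byte string.
    Each event reflects the state BEFORE the current byte, so the
    events can serve as context for predicting that byte."""
    in_tag, word_len, after_eq = [], [], []
    inside, wl, prev = 0, 0, ""
    for b in data:
        c = chr(b)
        after_eq.append(1 if prev == "=" else 0)   # "we just saw an ="
        in_tag.append(inside)                      # state before this byte
        if c == "<":
            inside = 1
        elif c == ">":
            inside = 0
        word_len.append(wl)                        # alnum run length so far
        wl = wl + 1 if c.isalnum() else 0
        prev = c
    return in_tag, word_len, after_eq
```

On `b'<a href="x">hi'`, for example, after_eq fires exactly once, at the quote byte following `=`.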
Section IV

The Boolean Interpretation

The UM forward pass (max-min) reduces to disjunctive normal form when supports are binary. Each row is a conjunction (AND); the model takes the strongest match (OR/max).

in_tag   word_len   after_eq   Prediction
  1         0          0       tag-name starter (a–z)
  1        >0          0       tag-name / attr-name continuation
  1         —          1       quote char (")
  1        >0          0       URL / attr-value chars
  0         0          0       word starter or <
  0        >3          0       word continuation (a–z)
This is the probabilistic syllogism: “When in_tag AND word_len=0 are both supported, then tag-name starters are supported.” Each conjunction adds one context event to the left of the pattern chain. The max-min forward pass finds the strongest applicable syllogism.
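The max-min pass itself is small. Below is a minimal sketch: patterns are (conditions, output) pairs, a pattern's strength is the min over its condition supports (the AND), and each output takes the max over pattern strengths (the OR). The event names, including encoding word_len=0 as its own event, are illustrative assumptions.

```python
def max_min_forward(patterns, supports):
    """Max-min forward pass over binary supports: reduces to DNF.
    patterns: list of (condition_events, output_event);
    supports: dict mapping event name -> support (0 or 1 here)."""
    out = {}
    for conditions, output in patterns:
        strength = min(supports.get(c, 0) for c in conditions)  # AND
        out[output] = max(out.get(output, 0), strength)          # OR
    return out

# Three rows of the table above, in this illustrative encoding:
patterns = [
    (("in_tag", "word_len=0"), "tag-name starter"),
    (("in_tag", "after_eq"),   "quote char"),
    (("in_text", "word_len=0"), "word starter or <"),
]
supports = {"in_tag": 1, "word_len=0": 1, "after_eq": 0, "in_text": 0}
result = max_min_forward(patterns, supports)
```

With these supports, only the first syllogism applies: "tag-name starter" gets strength 1 and the other predictions get 0.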
Section V

The Methodology

A six-step procedure for turning compression residuals into P-programs.

Identify the Residual

Find positions where KN-6 is surprised: bits-per-byte > 4. In enwik9, these cluster at tag boundaries (< after text), attribute starts (space after tag name), and content transitions (text after >).
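A minimal sketch of the residual scan, using an adaptive order-0 model as a stand-in for KN-6 (the real scoring would use the KN-6 predictor; the threshold of 4 bits comes from the text above):

```python
import math
from collections import Counter

def surprise_positions(data: bytes, threshold: float = 4.0):
    """Per-position surprise in bits under a Laplace-smoothed adaptive
    order-0 model (a stand-in for KN-6). Returns (position, bits)
    pairs where surprise exceeds the threshold."""
    counts, total, flagged = Counter(), 0, []
    for i, b in enumerate(data):
        p = (counts[b] + 1) / (total + 256)   # probability before seeing b
        bits = -math.log2(p)
        if bits > threshold:
            flagged.append((i, bits))
        counts[b] += 1
        total += 1
    return flagged
```

With the real KN-6 scores, the flagged positions are exactly where the clustering in the next steps begins.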

Boolean Interpretation

Express the pattern as Boolean logic: “the model predicts x but the data shows y whenever we cross a tag boundary.” The condition is a single bit of state.

Natural Language

State it plainly: “We are inside an XML tag.” This names the context event. Other examples: “The current word is longer than 5 characters.” “We just saw an equals sign.”

Reify via E/N/Q

Create an event space: ES_structure = {in_tag, in_text, in_attr}. Each event is an element of the integer quotient (um-arithmetic-v4). The projection from byte positions to this ES is a ring surjection.

Query the Dataset

Verify empirically: P('<' | in_text) = 0.003 vs P('<' | in_tag) = 0.0001. The context event separates very different distributions. Count matching positions to confirm the feature is real.
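The empirical check is a one-pass count. In this sketch, in_tag means "an opened tag has not yet been closed", so the `<` byte itself counts as in_text; that convention is an assumption, and the probabilities quoted above come from enwik9, not from this toy data.

```python
def conditional_lt_probability(data: bytes):
    """Empirical P('<' | in_tag) and P('<' | in_text) over a byte string."""
    counts = {True: [0, 0], False: [0, 0]}   # in_tag? -> [total, '<' hits]
    inside = False
    for b in data:
        counts[inside][0] += 1
        if b == ord("<"):
            counts[inside][1] += 1
            inside = True
        elif b == ord(">"):
            inside = False
    p = {k: v[1] / v[0] if v[0] else 0.0 for k, v in counts.items()}
    return p[True], p[False]   # (P('<' | in_tag), P('<' | in_text))
```

A large gap between the two numbers confirms the feature is real; near-equal numbers mean the event carries no information about this byte.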

P-Program

Implement the detector as a UM P-program: a tiny nested model that reads bytes and emits in_tag / in_text events. Connect its output ES to the main model’s context input. The max-min forward pass does the rest.

Section VI

P-Programming Issues

Open questions that need resolution before context events can be implemented in the UM runner. These are the frontier.

Conditional LPPs in SN Notation (design)

An LPP between ES_a and ES_b currently records all joint events unconditionally. A conditional LPP should only record joint events when a context event in ES_c is active.

In the max-min framework, this happens naturally: the context event’s support propagates through patterns to ES_a, restricting which events have support. But the LPP’s hash table still records everything — it cannot distinguish “this count was accumulated while in_tag was active” from the rest.

Possible approaches: (1) Separate LPPs per context event (space explosion). (2) Include the context event in the hash key (extends the “context” without extending the byte window). (3) Weighted counting where the context event modulates the count increment.
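Approach (2) is the easiest to sketch: fold the active context event into the hash key, so one table holds per-context joint counts without a separate LPP per event. The class and method names here are illustrative, not the UM runner's API.

```python
from collections import defaultdict

class ConditionalLPP:
    """Joint-count table keyed by (context_event, a, b): the context
    event extends the key, not the byte window."""
    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, context_event, a, b):
        """Count a joint (a, b) event under the active context event."""
        self.counts[(context_event, a, b)] += 1

    def query(self, context_event, a):
        """Conditional distribution over b given (context_event, a)."""
        joint = {b: n for (ctx, ea, b), n in self.counts.items()
                 if ctx == context_event and ea == a}
        total = sum(joint.values())
        return {b: n / total for b, n in joint.items()} if total else {}
```

The trade-off is visible in the key: every distinct context event multiplies the key space, which is approach (1)'s space explosion in a softer form.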

This is the core P-programming question. How the UM runner handles conditional counting determines whether context events actually improve compression or just add overhead.

Automatic Discovery from Residuals (research)

The methodology assumes a human identifies the context event (“we are inside a tag”). For P-programming at scale, we need automated discovery.

Approach: after scoring with KN-6, record per-position surprise (bits). Cluster high-surprise positions. For each cluster, find the minimal Boolean feature that separates it from low-surprise positions. This feature IS the context event.
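The last step of that loop, selecting the Boolean feature that separates the high-surprise cluster, can be sketched as a one-feature greedy search. The representation of candidate features as precomputed per-position boolean tracks is an assumption.

```python
def best_separating_feature(positions_high, positions_low, features):
    """Pick the candidate Boolean feature that best separates
    high-surprise positions from low-surprise ones.
    features: name -> per-position 0/1 track (illustrative format)."""
    def score(track):
        hits = sum(track[i] for i in positions_high)
        false_alarms = sum(track[i] for i in positions_low)
        return (hits / max(len(positions_high), 1)
                - false_alarms / max(len(positions_low), 1))
    return max(features, key=lambda name: score(features[name]))
```

A real version would search compositions of features, not just single ones; the winning feature is the proposed context event.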

The backward trie (20260208 archive) already does something like this for skip-patterns. The question is whether the same machinery can discover structural context events (tag boundaries, word lengths) rather than just sequential ones (n-gram suffixes).

Neural Reuse: Same LPP, Different Contexts (open)

The sat-rnn reuses its 128 neurons across all contexts. The factor map showed that every neuron is a 2-offset conjunction detector, and context (word_len, in_tag) modulates which conjunctions are active.

In the UM, this means the same LPP must produce different output distributions depending on which context event is active. The max-min pass naturally handles this — but only if the context events connect to the right subset of ES_a events.

The reuse problem: how many context events does a single LPP need? The trained RNN implicitly uses word_len (continuous, 0–20+) and in_tag (binary). That’s ~42 effective contexts. With 256 output bytes, this means the LPP must maintain ~42 × 256 = 10,752 conditional distributions. Is this tractable in a hash table?

Surprise: Undersupport and Oversupport (open)

Undersupport: no context event fires. The model has no structural information and falls back to the unconditional LPP. This is exactly the current KN-6 situation — fine as a default, but a signal that a context event is missing.

Oversupport: the wrong context event fires. The induced prior is wrong, and the model is more surprised than it would be without the context event. This is actively harmful.

Detection: compare per-position surprise with and without the context event. If surprise increases when the context event is active, the event is wrong for that position.
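As a sketch, the detector is a comparison of two surprise tracks, one scored with the context event and one without (both tracks and the activity mask are assumed inputs; the UM runner does not yet produce them):

```python
def oversupport_positions(surprise_without, surprise_with, active):
    """Positions where the context event was active yet INCREASED
    surprise: the oversupport signal described above."""
    return [i for i, on in enumerate(active)
            if on and surprise_with[i] > surprise_without[i]]
```

An empty result means the event never hurts; a dense result means the event's connections need the refinement described below.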

Response: oversupport is a learning signal. The context event’s connections need refinement — either it should have lower weight for the mismatched positions, or a new context event should split the current one into sub-cases.

This connects to ω: the learning function should detect surprise and update the model (add events, adjust weights). Currently the UM runner has no surprise mechanism at all.

Tiny Memories and State Across Bytes (design)

Context events like in_tag require state: the model must remember whether it has seen a < without a matching >. In the sat-rnn, this state lives in the hidden vector. In the UM, it must live in an event’s support value.

The carrier signal approach: an event maintains nonzero support across multiple time steps, decaying or switching based on input. The P-program for in_tag: set support to 255 on <, set to 0 on >, carry forward otherwise.

But this requires the UM runner to (a) maintain persistent support across forward passes, and (b) allow P-programs to write back to their own ESs. Currently, supports are reset each byte. The “tiny memory” is a P-program with internal state — a nested UM instance. How to implement this efficiently is an open design question.
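The carrier-signal P-program for in_tag is tiny when written as a stateful object. This is a sketch under the assumptions stated above: supports are 0..255 integers, and the runner is allowed to keep the object (and thus its support) alive across forward passes.

```python
class InTagCarrier:
    """Tiny memory for in_tag as a carrier signal: support jumps to
    255 on '<', drops to 0 on '>', and carries forward otherwise."""
    def __init__(self):
        self.support = 0   # persists across forward passes

    def step(self, byte: int) -> int:
        """Consume one byte; return the event's current support."""
        if byte == ord("<"):
            self.support = 255
        elif byte == ord(">"):
            self.support = 0
        return self.support
```

In UM terms this object is the nested instance: its step function is the forward pass, and its single field is the persistent support the current runner lacks.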