The first 256 bytes of enwik9 — a MediaWiki XML export. Every character has a structural role that a 6-byte window cannot see.
KN-6 predicts each byte using only the preceding 6 bytes as context. Drag the slider to move through the data. The amber window is the context; the red byte is the prediction target.
Three context events computed over the first 256 bytes. Each track shows one event’s value at each position. These carry structural information that the sliding window drops.
The in_tag event alone partitions the byte distribution into two very different worlds: tag syntax vs. text content. word_len tracks position within the current token. after_eq predicts the start of attribute values. None of these requires more than a few bits of state, yet KN-6 cannot represent any of them.

The UM forward pass (max-min) reduces to disjunctive normal form when supports are binary: each row of the following table is a conjunction (AND), and the model takes the strongest match (OR/max).
| in_tag | word_len | after_eq | Prediction | Positions |
|---|---|---|---|---|
| 1 | 0 | 0 | tag-name starter (a–z) | |
| 1 | >0 | 0 | tag-name / attr-name continuation | |
| 1 | — | 1 | quote char (") | |
| 1 | >0 | 1 | URL / attr-value chars | |
| 0 | 0 | 0 | word starter or < | |
| 0 | >3 | 0 | word continuation (a–z) | |
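The max-min/DNF reduction can be sketched in a few lines. The rule table follows the one above; the encoding as Python predicates is an illustration, not the UM runner's actual format.

```python
# Max-min over binary supports reduces to DNF: each rule is an AND (min)
# of its conditions; the prediction takes the strongest matching rule
# (max). With 0/1 supports all matching rules tie, so first match wins.
RULES = [
    # (condition on (in_tag, word_len, after_eq), prediction)
    (lambda t, w, e: t == 1 and w == 0 and e == 0, "tag-name starter (a-z)"),
    (lambda t, w, e: t == 1 and w > 0 and e == 0, "tag/attr-name continuation"),
    (lambda t, w, e: t == 1 and e == 1,            'quote char (")'),
    (lambda t, w, e: t == 0 and w == 0 and e == 0, "word starter or <"),
    (lambda t, w, e: t == 0 and w > 3 and e == 0,  "word continuation (a-z)"),
]

def forward(in_tag, word_len, after_eq):
    # min over a rule's conditions == Boolean AND; max over rules == OR.
    matches = [pred for cond, pred in RULES if cond(in_tag, word_len, after_eq)]
    return matches[0] if matches else None

print(forward(1, 0, 0))
print(forward(0, 5, 0))
```

With binary supports the "strongest match" degenerates to any match; graded supports would break the ties.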
A six-step procedure for turning compression residuals into P-programs.
Find positions where KN-6 is surprised: bits-per-byte > 4. In enwik9, these cluster at tag boundaries (< after text), attribute starts (space after tag name), and content transitions (text after >).
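The surprise scan can be sketched as follows. The predictor interface and the toy model are assumptions; in practice `prob` would be the KN-6 estimate over the preceding 6 bytes.

```python
import math

def surprise_positions(data, prob, threshold=4.0):
    """Return (position, bits) pairs where per-byte surprise exceeds
    `threshold` bits. `prob(context, byte)` is any per-byte predictor;
    a KN-6 estimate over the preceding 6 bytes is the intended one."""
    hits = []
    for i, b in enumerate(data):
        p = prob(data[max(0, i - 6):i], b)
        bits = -math.log2(max(p, 1e-12))   # bits-per-byte at this position
        if bits > threshold:
            hits.append((i, bits))
    return hits

# Toy stand-in predictor: high probability if the byte already
# appeared in the 6-byte window, near-uniform leftover mass otherwise.
def toy_prob(ctx, b):
    return 0.9 if b in ctx else 0.1 / 255

print(surprise_positions(b"<page><title>A</title>", toy_prob)[:3])
```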
Express the pattern as Boolean logic: “the model predicts x but the data shows y whenever we cross a tag boundary.” The condition is a single bit of state.
State it plainly: “We are inside an XML tag.” This names the context event. Other examples: “The current word is longer than 5 characters.” “We just saw an equals sign.”
Create an event space: ES_structure = {in_tag, in_text, in_attr}. Each event is an element of the integer quotient (um-arithmetic-v4). The projection from byte positions to this ES is a ring surjection.
Verify empirically: P('<' | in_text) = 0.003 vs P('<' | in_tag) = 0.0001. The context event separates very different distributions. Count matching positions to confirm the feature is real.
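The empirical check amounts to counting bytes under each context-event state. A minimal sketch, assuming a hand-rolled in_tag tracker (set on `<`, cleared on `>`) as the context event:

```python
from collections import Counter

def conditional_counts(data):
    """Count byte occurrences split by an in_tag tracker: set on '<',
    cleared on '>'. The tracker is an assumed stand-in for the
    context-event detector."""
    in_tag = False
    counts = {True: Counter(), False: Counter()}
    for b in data:
        counts[in_tag][b] += 1          # count under the *current* state
        if b == ord('<'):
            in_tag = True
        elif b == ord('>'):
            in_tag = False
    return counts

data = b"<mediawiki><page><title>AccessibleComputing</title></page></mediawiki>"
c = conditional_counts(data)
p_lt_text = c[False][ord('<')] / max(1, sum(c[False].values()))
p_lt_tag = c[True][ord('<')] / max(1, sum(c[True].values()))
print(f"P('<' | in_text)={p_lt_text:.3f}  P('<' | in_tag)={p_lt_tag:.4f}")
```

On real enwik9 the ratio is the point, not the absolute numbers; on this toy snippet `<` only ever occurs in text state.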
Implement the detector as a UM P-program: a tiny nested model that reads bytes and emits in_tag / in_text events. Connect its output ES to the main model’s context input. The max-min forward pass does the rest.
Open questions that need resolution before context events can be implemented in the UM runner. These are the frontier.
An LPP between ES_a and ES_b currently records all joint events unconditionally. A conditional LPP should only record joint events when a context event in ES_c is active.
In the max-min framework, this happens naturally: the context event’s support propagates through patterns to ES_a, restricting which events have support. But the LPP’s hash table still records everything — it cannot distinguish “this count was accumulated while in_tag was active” from the rest.
Possible approaches: (1) Separate LPPs per context event (space explosion). (2) Include the context event in the hash key (extends the “context” without extending the byte window). (3) Weighted counting where the context event modulates the count increment.
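Approach (2) can be sketched by folding the context event into the hash key. The class and method names below are hypothetical; they illustrate the idea, not the UM runner's API.

```python
from collections import defaultdict

class ConditionalLPP:
    """Sketch of approach (2): joint-event counts are keyed on
    (context_event, event_a, event_b), so one hash table holds a
    separate conditional distribution per active context event
    without a separate LPP per event."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.totals = defaultdict(int)

    def record(self, ctx_event, ev_a, ev_b):
        self.counts[(ctx_event, ev_a, ev_b)] += 1
        self.totals[(ctx_event, ev_a)] += 1

    def prob(self, ctx_event, ev_a, ev_b):
        tot = self.totals[(ctx_event, ev_a)]
        return self.counts[(ctx_event, ev_a, ev_b)] / tot if tot else 0.0

lpp = ConditionalLPP()
lpp.record("in_tag", "prev=t", "next=i")
lpp.record("in_tag", "prev=t", "next=l")
lpp.record("in_text", "prev=t", "next=h")
print(lpp.prob("in_tag", "prev=t", "next=i"))
```

The cost is one extra key component per record, not a new table per context event; the space explosion of approach (1) shows up only for (context, event) pairs that actually occur.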
This is the core P-programming question. How the UM runner handles conditional counting determines whether context events actually improve compression or just add overhead.
The methodology assumes a human identifies the context event (“we are inside a tag”). For P-programming at scale, we need automated discovery.
Approach: after scoring with KN-6, record per-position surprise (bits). Cluster high-surprise positions. For each cluster, find the minimal Boolean feature that separates it from low-surprise positions. This feature IS the context event.
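The feature-selection step can be sketched as follows. The candidate feature set and the scoring rule are assumptions; a real search would enumerate conjunctions of features, not just single ones.

```python
def best_separating_feature(positions, features, high_surprise):
    """Pick the candidate Boolean feature (name -> per-position 0/1 bits)
    that best separates high-surprise positions from the rest.
    Score = activation rate among surprises minus rate elsewhere."""
    def score(name):
        bits = features[name]
        hi = sum(bits[p] for p in high_surprise)
        lo = sum(bits[p] for p in positions if p not in high_surprise)
        n_hi, n_lo = len(high_surprise), len(positions) - len(high_surprise)
        return hi / max(1, n_hi) - lo / max(1, n_lo)
    return max(features, key=score)

positions = range(8)
high = {1, 3, 5}                          # positions where KN-6 was surprised
features = {
    "in_tag":   [0, 1, 0, 1, 0, 1, 0, 0],  # fires exactly on the surprises
    "after_eq": [1, 0, 0, 0, 1, 0, 1, 0],  # uncorrelated with them
}
print(best_separating_feature(positions, features, high))  # in_tag
```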
The backward trie (20260208 archive) already does something like this for skip-patterns. The question is whether the same machinery can discover structural context events (tag boundaries, word lengths) rather than just sequential ones (n-gram suffixes).
The sat-rnn reuses its 128 neurons across all contexts. The factor map showed that every neuron is a 2-offset conjunction detector, and context (word_len, in_tag) modulates which conjunctions are active.
In the UM, this means the same LPP must produce different output distributions depending on which context event is active. The max-min pass naturally handles this — but only if the context events connect to the right subset of ES_a events.
The reuse problem: how many context events does a single LPP need? The trained RNN implicitly uses word_len (continuous, 0–20+) and in_tag (binary). That’s ~42 effective contexts. With 256 output bytes, this means the LPP must maintain ~42 × 256 = 10,752 conditional distributions. Is this tractable in a hash table?
Undersupport: no context event fires. The model has no structural information and falls back to the unconditional LPP. This is exactly the current KN-6 situation — fine as a default, but a signal that a context event is missing.
Oversupport: the wrong context event fires. The induced prior is wrong, and the model is more surprised than it would be without the context event. This is actively harmful.
Detection: compare per-position surprise with and without the context event. If surprise increases when the context event is active, the event is wrong for that position.
Response: oversupport is a learning signal. The context event’s connections need refinement — either it should have lower weight for the mismatched positions, or a new context event should split the current one into sub-cases.
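The detection rule above can be sketched as a per-position comparison. The two predictor interfaces and the `active` mask are assumed; they stand in for scoring runs with and without the context event.

```python
import math

def oversupport_positions(data, p_base, p_ctx, active):
    """Flag positions where the context event made the model *more*
    surprised. `p_base`/`p_ctx` give per-byte probability without and
    with the context event; `active[i]` says whether the event fired
    at position i. All three are assumed interfaces."""
    bad = []
    for i, b in enumerate(data):
        if not active[i]:
            continue                      # event inactive: nothing to blame
        s_base = -math.log2(max(p_base(i, b), 1e-12))
        s_ctx = -math.log2(max(p_ctx(i, b), 1e-12))
        if s_ctx > s_base:                # context event hurt at this position
            bad.append(i)
    return bad

data = b"ab"
active = [1, 1]
p_base = lambda i, b: 0.5
p_ctx = lambda i, b: 0.9 if i == 0 else 0.1   # helps at 0, hurts at 1
print(oversupport_positions(data, p_base, p_ctx, active))  # [1]
```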
This connects to ω: the learning function should detect surprise and update the model (add events, adjust weights). Currently the UM runner has no surprise mechanism at all.
Context events like in_tag require state: the model must remember whether it has seen a < without a matching >. In the sat-rnn, this state lives in the hidden vector. In the UM, it must live in an event’s support value.
The carrier signal approach: an event maintains nonzero support across multiple time steps, decaying or switching based on input. The P-program for in_tag: set support to 255 on <, set to 0 on >, carry forward otherwise.
But this requires the UM runner to (a) maintain persistent support across forward passes, and (b) allow P-programs to write back to their own ESs. Currently, supports are reset each byte. The “tiny memory” is a P-program with internal state — a nested UM instance. How to implement this efficiently is an open design question.
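The carrier-signal P-program for in_tag can be sketched as a tiny stateful object. Persistent support across steps is exactly the capability the runner currently lacks; this class stands in for the hypothetical nested UM instance.

```python
class InTagCarrier:
    """Carrier-signal sketch: support is set to 255 on '<', cleared to
    0 on '>', and carried forward unchanged otherwise. The support
    value persists across forward passes instead of being reset."""
    def __init__(self):
        self.support = 0                  # persists between steps

    def step(self, byte):
        if byte == ord('<'):
            self.support = 255            # entering a tag
        elif byte == ord('>'):
            self.support = 0              # leaving a tag
        # any other byte: carry the previous support forward
        return self.support

det = InTagCarrier()
trace = [det.step(b) for b in b"<a>x<b>"]
print(trace)  # [255, 255, 0, 0, 255, 255, 0]
```

Note that `<` and `>` themselves land on the tag side of the boundary here; whether the boundary bytes belong to in_tag or in_text is a convention the main model's LPPs would have to agree on.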