Word-level patterns: ~0.10 bpc. Word absorption, word bigrams. Requires a tokenization bridge to extract at the byte level.
Context mixing + neural: ~0.62 bpc. What cmix does: adaptive mixing of PPM, LSTM, a match model, and indirect models with neural mixing weights.
4 SN Format
An SN model is a list of events and patterns, each with an integer strength.
The format is human-readable and fully explicit.
Events (unigram strengths):
"The output is ' '."22.  /* space is the most common byte */
"The output is 'e'."19.  /* 'e' is second */
"The output is 'a'."19.
Patterns (conditional strengths):
"The input is ' '.""The output is 't'."17.
"The input is 't'.""The output is 'h'."16.
"The input is 'h'.""The output is 'e'."16.
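Since the format is fully explicit, it can be parsed mechanically. The sketch below is a minimal parser assuming the grammar implied by the examples above (quoted event/pattern clauses followed by an integer strength); the exact SN grammar may differ.

```python
import re

# Hypothetical regexes inferred from the example lines; not the official grammar.
PATTERN_RE = re.compile(r'^"The input is \'(.+?)\'\.""The output is \'(.+?)\'\."(\d+)\.')
EVENT_RE = re.compile(r'^"The output is \'(.+?)\'\."(\d+)\.')

def parse_sn_line(line):
    """Return ('pattern', input, output, strength) or ('event', output, strength)."""
    m = PATTERN_RE.match(line)
    if m:
        return ('pattern', m.group(1), m.group(2), int(m.group(3)))
    m = EVENT_RE.match(line)
    if m:
        return ('event', m.group(1), int(m.group(2)))
    raise ValueError(f"unrecognized SN line: {line!r}")
```

Comments after `/*` would be stripped before parsing in this sketch.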
Strength = ⌊log₂(count)⌋. A strength of 17 means the pair appeared between 2¹⁷ and 2¹⁸ times (~131K–262K).
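For integer counts, ⌊log₂(count)⌋ can be computed exactly without floating point:

```python
def strength(count):
    # floor(log2(count)) for positive integers; a strength of 17 covers
    # counts from 2**17 = 131072 up to 2**18 - 1 = 262143.
    if count < 1:
        raise ValueError("count must be a positive integer")
    return count.bit_length() - 1
```

Using `bit_length` avoids the rounding errors a `math.log2` version can hit near powers of two.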
Each pattern costs about 80 bytes of description. This is the fundamental trade-off: more patterns improve prediction but cost description length.
5 The Description Length Trade-off
Adding patterns improves prediction (lowers bpc) but each pattern costs ~80 bytes.
The optimal model minimizes total description = compressed data + model description.
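This trade-off can be stated as a simple break-even test. The sketch below assumes the ~80-byte per-pattern cost from the text and a known bpc saving per pattern; both the function name and the per-pattern-saving framing are illustrative, not from the source.

```python
PATTERN_COST_BITS = 80 * 8  # ~80 bytes of description per pattern (from the text)

def worth_including(bpc_saving, corpus_bytes):
    # A pattern pays for itself only if the prediction bits it saves across
    # the whole corpus exceed its description cost.
    return bpc_saving * corpus_bytes > PATTERN_COST_BITS
```

For example, a pattern saving 0.001 bpc on a 1 MB corpus saves ~1000 bits, beating the 640-bit cost, while the same pattern on a 100 KB corpus would not pay for itself.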