The Combination Gap: 6-Policy Ablation

Order-6 n-gram UM on enwik9 (100K + 1M + 10M). Same events, same LPPs, six scoring rules. One pass.

Scoring Curves

Each line shows cumulative bits-per-character as the model processes data. All six policies observe the same events and learn the same LPP entries — they differ only in how they combine per-order distributions into a prediction.
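Cumulative bits-per-character is just running total surprisal divided by characters seen. A minimal sketch of the metric (the function name and the toy probability stream are hypothetical; the real harness streams enwik9 bytes):

```python
import math

def cumulative_bpc(probs):
    """Yield running bits-per-character: total surprisal so far / chars so far.

    `probs` is the sequence of probabilities a policy assigned to each
    actual next byte.
    """
    total_bits = 0.0
    for i, p in enumerate(probs, start=1):
        total_bits += -math.log2(p)
        yield total_bits / i

# Sanity check: a uniform byte model sits at exactly 8 bits per character.
curve = list(cumulative_bpc([1 / 256] * 4))
```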

max-min (tropical forward pass)
sharpest-LPP (highest s1-s2 gap)
KN-interp (KN-discount, D=1.0)
ent-blend (entropy-weighted)
gap-blend (tropical, 2^gap)
2^(g/H) (normalized conviction)
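The weighting forms below are a minimal sketch inferred from the policy labels (2^-H for ent-blend, 2^gap for gap-blend, 2^(gap/H) for normalized conviction); the actual UM kernels may differ, and `blend`, `top_gap`, and the toy distributions are hypothetical helpers, not the source code:

```python
import math

def entropy(d):
    """Shannon entropy (bits) of a distribution given as {symbol: prob}."""
    return -sum(p * math.log2(p) for p in d.values() if p > 0)

def top_gap(d):
    """Conviction score s1 - s2: gap between the two largest probabilities."""
    s = sorted(d.values(), reverse=True)
    return s[0] - (s[1] if len(s) > 1 else 0.0)

def blend(dists, weight_fn):
    """Weighted mixture of per-order distributions, weights normalized to 1."""
    ws = [weight_fn(d) for d in dists]
    z = sum(ws)
    out = {}
    for w, d in zip(ws, dists):
        for sym, p in d.items():
            out[sym] = out.get(sym, 0.0) + (w / z) * p
    return out

def ent_blend(dists):
    # Assumed form: lower-entropy (sharper) orders get exponentially more weight.
    return blend(dists, lambda d: 2.0 ** -entropy(d))

def gap_blend(dists):
    # 2^gap per order, matching the "tropical, 2^gap" label.
    return blend(dists, lambda d: 2.0 ** top_gap(d))

def norm_conviction(dists):
    # 2^(gap/H): conviction normalized by entropy before exponentiation.
    return blend(dists, lambda d: 2.0 ** (top_gap(d) / max(entropy(d), 1e-9)))

def sharpest_lpp(dists):
    # Pure selection: predict from the single order with the largest s1-s2 gap.
    return max(dists, key=top_gap)

# Toy per-order distributions (lowest order first).
flat = {"a": 0.5, "b": 0.5}
sharp = {"a": 0.9, "b": 0.1}
mix = ent_blend([flat, sharp])
```

Selection keeps one order's distribution verbatim; every blend stays a proper distribution because the per-order weights are normalized before mixing.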

Results (10M bytes)

Policy          bpc     Gap vs KN   Track
max-min         5.332   +3.125      U (baseline)
sharpest-LPP    4.037   +1.830      U (selection)
KN-interp       2.207    0.000      C
gap-blend       2.149   -0.058      tropical
2^(gap/H)       2.158   -0.049      normalized conviction
ent-blend       2.135   -0.072      compromise

The 2^(gap/H) policy normalizes gap by entropy. In the current source-backed hybrid paper, it reaches 2.239 at order 6 and 2.189 at order 8. That means only a 0.005 bpc gain over gap-blend at order 6, but a 0.105 bpc gain at order 8 (2.189 vs 2.294). See hybrid-blend.pdf.

Order sweep: order 4 still favors ent-blend (2.424 vs 2.479 for H3, the 2^(gap/H) policy), while H3 leads gap-blend at orders 6 (2.234 vs 2.241) and 8 (2.183 vs 2.291). The archive entry still holds "H3 is the leading higher-order hybrid so far" because the order-4 gap remains.

Resolution Criteria

Track U: "Selection closes the gap"

Threshold: sharpest-LPP closes >=80% of max-min-to-KN gap

Result: closes 41.4% (needs 80%)

FAILS. Selection alone does not close the gap.
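The 41.4% figure follows directly from the 10M table:

```python
# Gap arithmetic from the 10M table (bpc values).
baseline, selection, kn = 5.332, 4.037, 2.207
gap = baseline - kn                      # 3.125: the full max-min-to-KN gap
closed = (baseline - selection) / gap    # fraction of the gap selection closes
print(f"{closed:.1%}")                   # 41.4%, short of the 80% threshold
```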

Track C: "Interpolation is needed"

Threshold: sharpest-LPP leaves >=1.0 bpc residual gap to KN

Result: residual = 1.830 bpc

PASSES. The gap requires changes beyond selection.

Compromise: "Entropy-blend matches KN"

Threshold: ent-blend within 0.1 bpc of KN-interp

Result: ent-blend beats KN by 0.072 bpc

PASSES. The compromise policy is the best policy.
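The three criteria reduce to threshold checks on the 10M bpc table. A sketch (the `resolve` function and its labels are illustrative, not part of the harness):

```python
def resolve(bpc, kn="KN-interp"):
    """Apply the three resolution criteria to a {policy: bpc} table."""
    gap = bpc["max-min"] - bpc[kn]
    closed = (bpc["max-min"] - bpc["sharpest-LPP"]) / gap
    residual = bpc["sharpest-LPP"] - bpc[kn]
    return {
        "U (selection closes >=80%)": closed >= 0.80,
        "C (residual >=1.0 bpc)": residual >= 1.0,
        "compromise (within 0.1 of KN)": abs(bpc["ent-blend"] - bpc[kn]) <= 0.1,
    }

verdict = resolve({"max-min": 5.332, "sharpest-LPP": 4.037,
                   "KN-interp": 2.207, "ent-blend": 2.135})
```

On the 10M numbers this yields fail/pass/pass, matching the track results above.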

What This Means

At 10M, ent-blend is the clear winner (-0.072 vs KN). Gap-blend and H3 are close (-0.058, -0.049) but the ordering reversed from 1M: at 1M, H3 > gap-blend > ent-blend; at 10M, ent-blend > gap-blend > H3. The conviction advantage fades at scale.

Sharpest-LPP improved dramatically (closing 41.4% of the gap, vs 8.4% at 1M) but still fails the 80% threshold. The tropical tax — max-min's gap to KN — is 3.125 bpc at 10M (vs 2.928 at 1M), growing even as sharpest-LPP benefits from more neurons.

Reproduce: ./umr ngram-ablation enwik9 6 10000000 (via the ./run-experiment wrapper)