The Scaling Story: Data Is All You Need
With joint conditioning and more data, test performance improves steadily. Our n-gram model reaches 2.29 bpc at 10M bytes of Wikipedia and is projected to fall below 2.0 bpc at 100M bytes.
Test Performance vs Data Size (n-gram KN, order 5)
5.04 bpc: marginal entropy (no model, just byte frequencies)
2.29 bpc: best test BPC at 10M bytes, and still improving
<2.0 bpc: projected at 100M bytes with a larger hash table
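
For readers who want to reproduce the baseline, the marginal-entropy figure is the order-0 entropy of the byte distribution, and bpc is the average negative log2 probability the model assigns to each true next byte. The sketch below shows one way to compute both; the function names `marginal_entropy_bits` and `bits_per_char` are illustrative assumptions, not the code behind the numbers above.

```python
import math
from collections import Counter


def marginal_entropy_bits(data: bytes) -> float:
    """Order-0 entropy in bits per byte: H = -sum_b p(b) * log2 p(b).

    Uses only byte frequencies, i.e. no conditioning on context.
    """
    if not data:
        return 0.0
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bits_per_char(probs) -> float:
    """Average code length in bits, given the probability a model
    assigned to each true next byte (hypothetical input format)."""
    probs = list(probs)
    if not probs:
        raise ValueError("need at least one probability")
    return -sum(math.log2(p) for p in probs) / len(probs)


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog"
    print(f"marginal entropy: {marginal_entropy_bits(sample):.3f} bits/byte")
    # A model that gives each true byte probability 0.25 costs 2 bits per byte.
    print(f"bpc: {bits_per_char([0.25] * 10):.3f}")
```

Any model that conditions on context should score below the marginal entropy of its test set, which is why that figure serves as the "no model" reference point.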