ES-Conditional Predictions

fig-20260131_2-07 | Model predictions reveal learned ES transition structure

Key Finding: ES-Pairs Have Structure

The augmented model (trained on 1M chars) has learned strong ES transition patterns that match our tock-2 predictions. The panels below break down its next-character distribution by ES for several contexts.
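For reference, a minimal sketch of how per-character probabilities can be bucketed into the ES classes shown in the panels. The es_class mapping is an assumption about the class definitions (Whitespace, Digit, Vowel, Punct, Other); the model's actual ES definitions may differ.

    from collections import defaultdict

    def es_class(ch: str) -> str:
        # Assumed ES (equivalence-set) definition; the model's actual classes may differ.
        if ch.isspace():
            return "Whitespace"
        if ch.isdigit():
            return "Digit"
        if ch.lower() in "aeiou":
            return "Vowel"
        if ch.isalpha():
            return "Other"      # consonants (and any other letters)
        return "Punct"

    def aggregate_by_es(char_probs: dict[str, float]) -> dict[str, float]:
        """Sum a next-character distribution into ES-level buckets."""
        es_probs = defaultdict(float)
        for ch, p in char_probs.items():
            es_probs[es_class(ch)] += p
        return dict(es_probs)

    # Toy example: a partial next-character distribution after the context "the "
    print(aggregate_by_es({"w": 0.104, "f": 0.089, "a": 0.05, " ": 0.01, "3": 0.002}))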

After "the "

Whitespace → expecting word start

Other: 79.0%
Vowel: 19.2%
Digit: 1.6%
Top chars: 'w' 10.4% · 'f' 8.9% · 't' 6.3% · 's' 6.0%

After "123"

Digit → expecting number end

Punct: 53.0%
Whitespace: 20.4%
Other: 16.8%
Digit: 9.2%
Top chars: ',' 39.6% · ' ' 18.2% · '.' 11.7%

After "."

Period → almost certain whitespace

Whitespace: 99.0%
Other: 0.6%
Top chars: ' ' 98.5% · '\n' 0.5%

After "a"

Vowel → consonants (in-word)

Other: 73.5%
Whitespace: 20.7%
Vowel: 4.9%
Top chars: 't' 32.0% · ' ' 20.7% · 's' 10.6% · 'l' 10.5%

After "e"

Vowel → mixed (word-final 'e' common)

Other: 53.6%
Whitespace: 40.9%
Vowel: 4.1%
Top chars: ' ' 40.9% · 't' 14.1% · 'n' 10.7% · 's' 7.7%

Comparison: 'a' vs 'e'

Within-ES distinctions matter

ES            'a'      'e'      Δ (pts)
Other         73.5%    53.6%    -19.9
Whitespace    20.7%    40.9%    +20.2

Because 'e' is often word-final ("the", "be"), the whitespace probability after it is higher; 'a' rarely ends words, so more of its probability mass goes to consonant continuations.
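The comparison table can be recomputed directly from the two panels above; a small sketch:

    # ES-level distributions after 'a' and 'e', copied from the panels above (in %).
    after_a = {"Other": 73.5, "Whitespace": 20.7, "Vowel": 4.9}
    after_e = {"Other": 53.6, "Whitespace": 40.9, "Vowel": 4.1}

    # Delta in percentage points, 'e' relative to 'a'.
    for es in ("Other", "Whitespace", "Vowel"):
        delta = after_e[es] - after_a[es]
        print(f"{es:<12} {after_a[es]:5.1f}%  {after_e[es]:5.1f}%  {delta:+5.1f} pts")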

Implication for Tock-2

These results confirm that ES-pairs (prev_ES, next_ES) carry more predictive structure than individual ESs alone, and tock-2 should take advantage of that structure.

The 53% improvement comes partly from making these ES-level transitions explicit, freeing capacity for within-ES distinctions.
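As a sketch of what "making ES-level transitions explicit" could look like (an illustration, not the actual tock-2 design), the next-character probability can be factored into an ES-transition term and a within-ES term. The values below are backed out of the 'After "the "' panel, not learned parameters.

    # Illustrative factorization: p(char | ctx) = p(next_ES | prev_ES) * p(char | next_ES, ctx).
    es_transition = {                       # p(next_ES | prev_ES = Whitespace)
        "Whitespace": {"Other": 0.790, "Vowel": 0.192, "Digit": 0.016},
    }
    within_es = {                           # p(char | next_ES), renormalized within the ES
        "Other": {"w": 0.132, "f": 0.113, "t": 0.080, "s": 0.076},
    }

    def p_char(prev_es: str, next_es: str, ch: str) -> float:
        """Probability of ch under the factored model."""
        return es_transition[prev_es].get(next_es, 0.0) * within_es[next_es].get(ch, 0.0)

    print(p_char("Whitespace", "Other", "w"))   # ~0.104, matching 'w' 10.4% after "the "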

Reproduction

Command: ./hutter predict-aug "context" models/aug_epoch1.bin

Model: Augmented RNN, 128 hidden units, trained on 1M chars
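To regenerate the panels, a small driver can loop the documented command over the contexts above. This assumes ./hutter and models/aug_epoch1.bin are present in the working directory and that predict-aug writes its distribution to stdout.

    import subprocess

    # Contexts shown in the panels above.
    contexts = ["the ", "123", ".", "a", "e"]

    for ctx in contexts:
        # Same invocation as documented: ./hutter predict-aug "context" models/aug_epoch1.bin
        result = subprocess.run(
            ["./hutter", "predict-aug", ctx, "models/aug_epoch1.bin"],
            capture_output=True, text=True, check=True,
        )
        print(f"=== After {ctx!r} ===")
        print(result.stdout)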