fig-20260131-04: Hypothesis Test Results

Testing character-class hypotheses against a trained RNN (6.02 bits per character, bpc)

Hypothesis        Characters       Within-class Similarity   Cross-class Similarity   Result
H1: Vowels        a, e, i, o, u    0.988                     0.986                    ✓ SUPPORTED
H2: Consonants    t, n, s, r, h    0.984                     0.986                    ✗ NOT SUPPORTED
H4: Punctuation   . , ! ?          0.999                     0.986                    ✓ SUPPORTED
H5: Digits        0, 1, 2, 5, 9    0.9996                    0.987                    ✓ SUPPORTED
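
The table does not say how the similarities are computed. The sketch below is one plausible reading, under stated assumptions: each character is represented by a single RNN hidden-state vector, similarity is cosine similarity, "within-class" is the mean over all pairs of class members, and "cross-class" is the mean over member/non-member pairs. The verdict rule (SUPPORTED when within > cross) is inferred from the table, where it matches all four rows. All function names are hypothetical, not taken from test_hypothesis.py, and the toy vectors are invented.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def within_class_similarity(states, members):
    """Mean cosine similarity over all unordered pairs of class members."""
    pairs = list(combinations(members, 2))
    return sum(cosine(states[a], states[b]) for a, b in pairs) / len(pairs)


def cross_class_similarity(states, members):
    """Mean cosine similarity between class members and all other characters."""
    others = [c for c in states if c not in members]
    pairs = [(a, b) for a in members for b in others]
    return sum(cosine(states[a], states[b]) for a, b in pairs) / len(pairs)


def evaluate_class_hypothesis(states, members):
    """Return (within, cross, supported); supported means within > cross."""
    within = within_class_similarity(states, members)
    cross = cross_class_similarity(states, members)
    return within, cross, within > cross


# Toy 3-d vectors standing in for hidden states (invented, not model output).
toy_states = {
    "a": [1.0, 0.1, 0.0],
    "e": [0.9, 0.2, 0.0],
    "t": [0.0, 1.0, 0.3],
    "n": [0.1, 0.9, 0.4],
}
within, cross, supported = evaluate_class_hypothesis(toy_states, ["a", "e"])
print(f"within={within:.3f} cross={cross:.3f} supported={supported}")
```

Averaging over all pairs weights every pair equally; if class sizes differ a lot, a per-character average would be a reasonable alternative.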

Key Insights

1. Digits form the tightest cluster (0.9996 within-class similarity): the RNN treats all digits nearly identically.
2. Punctuation is highly uniform (0.999): it is learned as a single event space (ES).
3. Vowels cluster together (0.988 within-class vs 0.986 cross-class): the effect is weak (a margin of 0.002) but present.
4. Consonants are diverse (0.984 within-class, below the 0.986 cross-class level): they do not form a single ES and may need subgroups (stops, fricatives, etc.).
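
The margin between within-class and cross-class similarity puts the four classes on a common scale. This short script only recomputes that difference from the table values; it assumes nothing about how the similarities were measured.

```python
# Within- and cross-class similarities copied from the table above.
results = {
    "H1: Vowels": (0.988, 0.986),
    "H2: Consonants": (0.984, 0.986),
    "H4: Punctuation": (0.999, 0.986),
    "H5: Digits": (0.9996, 0.987),
}

# Margin = within-class minus cross-class; positive means the class is tighter
# than its surroundings. Rounding removes floating-point noise.
margins = {name: round(w - c, 4) for name, (w, c) in results.items()}

ranked = sorted(margins.items(), key=lambda kv: kv[1], reverse=True)
for name, margin in ranked:
    print(f"{name:16s} {margin:+.4f}")
```

Ranked by margin, punctuation (+0.0130) narrowly edges out digits (+0.0126), even though digits have the higher within-class similarity; vowels sit at +0.0020 and consonants at -0.0020.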

Interpretation

Even an undertrained model (6.02 bpc) has learned basic character categories of the kind discoverable from bigram statistics. This supports the hypothesis that short Markov chains give rise to early event spaces.
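
To make "discoverable from bigram statistics" concrete, here is a minimal sketch, assuming each character is represented by counts of the character that immediately follows it (a next-character profile) and profiles are compared by cosine similarity. It runs on a toy string and is an illustration only, not the procedure behind the figure; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def next_char_profiles(text):
    """Map each character to a Counter of the characters that immediately follow it."""
    profiles = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        profiles[cur][nxt] += 1
    return profiles


def profile_cosine(p, q):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * q[ch] for ch, count in p.items())  # missing keys count as 0
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q)


# Toy corpus: both digits are always followed by ","; letters are followed by letters or spaces.
profiles = next_char_profiles("page 1, page 2, room 1, room 2, ")
print(profile_cosine(profiles["1"], profiles["2"]))  # → 1.0 (digits share a context)
print(profile_cosine(profiles["1"], profiles["a"]))  # → 0.0 (digit vs. letter)
```

Characters that appear in the same contexts end up with similar profiles, which is the kind of signal an RNN can pick up early. Using only the following character is a simplification; preceding-character counts could be added in the same way.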

The failure of H2 (consonants) suggests that consonants do not form a single ES; they may instead split into subgroups based on phonetic properties or positional statistics.
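
One hedged way to look for consonant subgroups without fixing them in advance: link two characters when their pairwise similarity reaches a threshold, then read off the connected components (single-linkage grouping). The function name, threshold, and similarity values below are hypothetical; the toy values are invented to exercise the code and are not measurements from the RNN. In practice the similarity would come from the same hidden states used for the table.

```python
from itertools import combinations


def threshold_groups(chars, sim, threshold):
    """Single-linkage grouping: link two characters when sim(a, b) >= threshold
    and return the connected components as sorted lists."""
    parent = {c: c for c in chars}

    def find(c):
        # Walk up to the component root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(chars, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for c in chars:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Invented pairwise similarities, chosen only to exercise the function;
# they are NOT measurements from the RNN.
toy_pairs = {("s", "h"): 0.995, ("n", "r"): 0.993}


def toy_sim(a, b):
    return toy_pairs.get((a, b), toy_pairs.get((b, a), 0.980))


print(threshold_groups(["t", "n", "s", "r", "h"], toy_sim, threshold=0.99))
# → [['h', 's'], ['n', 'r'], ['t']]
```

A fixed threshold is the simplest choice; hierarchical clustering (for example `scipy.cluster.hierarchy.linkage`) would instead show how groups merge as the threshold is lowered.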

Reproducibility

Repository: https://github.com/TBD/hutter (not yet public)

Model: character-level RNN, model.bin (6.02 bpc)

Data: enwik9

Training:

To reproduce:

wget http://mattmahoney.net/dc/enwik9.zip && unzip enwik9.zip   # Download and extract enwik9
make                                                           # Build the project
./hutter predict "a" model.bin                                 # Get hidden state for each char
python3 test_hypothesis.py                                     # Run hypothesis tests