Archive: 20260131
Status: FROZEN (do not modify - see #agglutinative for protocol)

== SUMMARY ==

First batch of interpretability experiments.

Key result: 5 ESs (Digits, Punct, Vowels, Whitespace, Other) explain 59% of
model compression (1.37 of 2.31 bits/char).

== REPRODUCIBILITY ==

Repository:  /home/admin/hutter
Model:       model.bin (Elman RNN 256->128->256)
Performance: 5.69 bpc on 100K sample
Data:        enwik9 (http://mattmahoney.net/dc/enwik9.zip)

To reproduce:
  wget http://mattmahoney.net/dc/enwik9.zip && unzip enwik9.zip
  make
  ./hutter eval enwik9 model.bin
  ./hutter es enwik9 model.bin

== CONTENTS ==

Papers (source + PDF):
  rnn-um-mapping.tex/pdf     - Doubled-E: 0% bpc difference
  activation-probing.tex/pdf - Query language for hidden state

To rebuild:
  pdflatex rnn-um-mapping.tex && pdflatex rnn-um-mapping.tex

Figures:
  fig-20260131-01: theories.html (pattern validation)
  fig-20260131-02: viz.html (activation heatmaps)
  fig-20260131-03: dashboard.html (main dashboard)
  fig-20260131-04: hypothesis-results.html (ES hypothesis tests)
  fig-20260131-05: coactivation-test-20260131.txt (neuron persistence)
  fig-20260131-06: sn-view.html (SN representation)
  fig-20260131-07: sn-layered.html (layered ES graph)

Data:
  model.bin (RNN checkpoint)
  model.sn  (Sparse Network representation)

== KEY FINDINGS ==

1. ES Evaluation (fig-07, ./hutter es):
   - RNN: 5.69 bpc, ES-only: 6.63 bpc
   - 5 character classes explain 59% of compression
   - Remaining 0.94 bits/char from byte-specific patterns

2. Character Class ESs (fig-04):
   - Digits, Punctuation, Vowels confirmed as ESs
   - Consonants too diverse for single ES

3. Persistent Neurons (fig-05):
   - h[34], h[48], h[62], h[71], h[113] active across sequences

== NEXT ==

New experiments go in archive/20260131_2/
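
== NOTE: COMPRESSION ARITHMETIC ==

The 59% figure quoted in the summary and key findings can be re-derived from
the two bpc numbers, as a quick sanity check. The sketch below assumes the
baseline is a uniform distribution over 256 byte values (8.00 bpc). That
baseline is not stated in this archive; it is inferred because it is the value
that makes 1.37, 2.31 and 0.94 bits/char consistent with 5.69 and 6.63 bpc.
The function name compression_breakdown is hypothetical and is not part of
the hutter tool.

```python
import math

# Figures quoted in this archive.
RNN_BPC = 5.69      # full RNN, bits per character
ES_ONLY_BPC = 6.63  # ES-only predictor, bits per character

# ASSUMPTION (not stated in the archive): the baseline is a uniform
# distribution over 256 byte values, i.e. log2(256) = 8 bits per character.
BASELINE_BPC = math.log2(256)


def compression_breakdown(baseline, es_only, full):
    """Split the full model's gain over the baseline into ES and non-ES parts."""
    total = baseline - full            # bits/char saved by the full model
    from_es = baseline - es_only       # bits/char saved by the ESs alone
    remaining = total - from_es        # bits/char not explained by the ESs
    return {
        "total": total,
        "from_es": from_es,
        "remaining": remaining,
        "es_share": from_es / total,
    }


result = compression_breakdown(BASELINE_BPC, ES_ONLY_BPC, RNN_BPC)
print(f"total compression: {result['total']:.2f} bits/char")
print(f"explained by ESs:  {result['from_es']:.2f} bits/char")
print(f"remaining:         {result['remaining']:.2f} bits/char")
print(f"ES share:          {result['es_share']:.0%}")
```

Under that assumed baseline, the remaining term equals the gap between the
ES-only and RNN figures (6.63 - 5.69), which is how the key findings report it.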