What do the singular values of the bigram matrix mean?
| Component | Singular value (S) | Variance explained |
|---|---|---|
| 0 | 2892.8 | 97.5% |
| 1 | 306.5 | 1.1% |
| 2 | 151.9 | 0.3% |
| 3 | 127.8 | 0.2% |
| 4 | 108.0 | 0.1% |
| 5 | 91.8 | 0.1% |

The top 10 components capture 99.4% of the variance; the top 64 capture 99.9%.
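
The variance figures are consistent with each component's share being proportional to its squared singular value. The page does not state that definition, so the check below assumes it. Because the full spectrum is not listed, the shares are derived relative to component 0's reported 97.5% rather than from a total. A minimal sketch using only the six listed values:

```python
# Singular values for components 0-5, copied from the table.
singular_values = [2892.8, 306.5, 151.9, 127.8, 108.0, 91.8]
base_share = 97.5  # percent of variance reported for component 0


def share_relative_to_first(s, s0=singular_values[0], share0=base_share):
    """Variance share implied by s, ASSUMING share is proportional to s**2."""
    return share0 * (s / s0) ** 2


# Implied shares for components 1-5, rounded to one decimal like the table.
implied = [round(share_relative_to_first(s), 1) for s in singular_values[1:]]
print(implied)  # [1.1, 0.3, 0.2, 0.1, 0.1]
```

The implied shares match the listed percentages for components 1 through 5.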
Component 0
PREV: i o t e a ] vs x01 xFE xFD
NEXT: SP ] a e i r vs xFD xFE xFF

Component 1
PREV: SP e n r s a vs xC3 xD0 x83 xB8
NEXT: e a i o t s vs xD0 xE0 xE3 xD1

Component 2
PREV: u i o r a l vs > = : 9 NL 1
NEXT: o l i n u y vs 6 1 T 5 8 9

Component 3
PREV: 1 5 9 4 8 6 vs [ { ( SP |
NEXT: ] , ; SP < ) vs I S L M R P

Component 4
PREV: xC3 xD0 xCE xD7 vs SP : ( x84
NEXT: e . u o : y vs xD0 xE0 xE3 xCE

Component 5
PREV: [ a i o / e vs L C W V H N
NEXT: p c m g f d vs o e a ] i .
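
Listings like the PREV/NEXT lines above can be produced by sorting the entries of a component's left and right singular vectors and printing the bytes at each extreme. The sketch below is one way to do that, not necessarily how the listings here were generated. It assumes rows index the previous byte and columns the next byte, and it uses a random matrix as a hypothetical stand-in for the real bigram matrix. The helper names `byte_label` and `extremes` are illustrative only.

```python
import numpy as np


def byte_label(b):
    """Label a byte: SP for space, NL for newline, xHH if not printable ASCII."""
    if b == 0x20:
        return "SP"
    if b == 0x0A:
        return "NL"
    if 0x21 <= b <= 0x7E:
        return chr(b)
    return f"x{b:02X}"


def extremes(vector, k=6):
    """Return labels of the k most positive and the k most negative entries."""
    order = np.argsort(vector)  # indices in ascending order of loading
    top = [byte_label(int(i)) for i in order[::-1][:k]]
    bottom = [byte_label(int(i)) for i in order[:k]]
    return top, bottom


# Hypothetical stand-in data: rows = previous byte, columns = next byte.
rng = np.random.default_rng(0)
M = rng.random((256, 256))
U, S, Vt = np.linalg.svd(M)

component = 1
prev_top, prev_bottom = extremes(U[:, component])
next_top, next_bottom = extremes(Vt[component, :])
print("PREV:", *prev_top, "vs", *prev_bottom)
print("NEXT:", *next_top, "vs", *next_bottom)
```

The overall sign of a singular vector pair is arbitrary, so which group appears on which side of "vs" can flip between SVD implementations. The grouping itself is what carries the interpretation.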
Rank 1: MSE = 3.31
Rank 2: MSE = 1.88
Rank 4: MSE = 1.28
Rank 8: MSE = 0.83
Rank 16: MSE = 0.60
Rank 32: MSE = 0.38
Rank 64: MSE = 0.16
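
A rank-versus-MSE curve of this kind can be computed by truncating the SVD at each rank and averaging the squared entry-wise error. The sketch below assumes that is how MSE is defined here, and again uses a random matrix as a hypothetical stand-in for the bigram matrix. It also prints the same quantity computed from the spectrum alone, since the squared error of a truncated SVD equals the sum of the discarded squared singular values.

```python
import numpy as np


def truncation_mse(M, ranks):
    """MSE between M and its rank-k truncated SVD, for each k in ranks."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    errors = {}
    for k in ranks:
        # Keep the first k singular triplets and rebuild the matrix.
        approx = (U[:, :k] * S[:k]) @ Vt[:k, :]
        errors[k] = float(np.mean((M - approx) ** 2))
    return errors, S


# Hypothetical stand-in data, not the real bigram matrix.
rng = np.random.default_rng(0)
M = rng.random((256, 256))

mse, S = truncation_mse(M, [1, 2, 4, 8, 16, 32, 64])
for k, err in mse.items():
    # Discarded energy divided by the number of matrix entries.
    from_spectrum = float(np.sum(S[k:] ** 2) / M.size)
    print(f"Rank {k}: MSE = {err:.4f} (from spectrum: {from_spectrum:.4f})")
```

The two columns agree up to floating-point error, so the MSE at any rank can be read off the singular values without reconstructing the matrix.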
Component 0 (frequency baseline) captures 97.5% of variance. The interpretable structure — encoding type, text vs markup, bracket matching, phonotactics — lives in the remaining 2.5%.
A rank-64 approximation captures 99.9% of the variance (MSE = 0.16), so when we inject it we keep this structure and discard mostly noise.