
Vowel Embeddings via Product Space

Each vowel is embedded in 7D as its distribution P(consonant-group | vowel).

Construction: take the product space Vowel × CGroup (5 × 7 = 35 pairs), count vowel→consonant transitions in enwik9 with each consonant mapped to its group, and normalize each vowel's row of counts to obtain its embedding.
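A minimal sketch of the counting and row normalization, assuming transitions are taken between adjacent lowercase letters. The page names nasals, liquids and glides but does not list all seven groups, so the `CGROUPS` mapping below is hypothetical, as are the helper names `count_transitions` and `embed`. Reading enwik9 itself (for example in chunks, summing the Counters) is left out.

```python
from collections import Counter

VOWELS = "aeiou"

# Hypothetical 7-way consonant grouping; the exact groups used on this page are not given.
CGROUPS = {
    "nasal": "mn",
    "liquid": "lr",
    "glide": "wy",
    "stop": "pbtdkgcq",
    "fricative": "fvszh",
    "affricate": "j",
    "other": "x",
}
GROUP_NAMES = list(CGROUPS)
LETTER_TO_GROUP = {ch: g for g, letters in CGROUPS.items() for ch in letters}


def count_transitions(text):
    """Count (vowel, consonant-group) pairs for each vowel directly followed by a consonant."""
    counts = Counter()
    text = text.lower()
    for prev, cur in zip(text, text[1:]):
        if prev in VOWELS and cur in LETTER_TO_GROUP:
            counts[(prev, LETTER_TO_GROUP[cur])] += 1
    return counts


def embed(counts):
    """Row-normalize the 5x7 count table into P(cgroup | vowel); unseen vowels get a zero row."""
    table = {}
    for v in VOWELS:
        row = [counts[(v, g)] for g in GROUP_NAMES]
        total = sum(row)
        table[v] = [c / total if total else 0.0 for c in row]
    return table


emb = embed(count_transitions("an orange ran along the inner rim"))
print(emb["a"])
```

Each row of the result is one vowel's 7D embedding and sums to 1 whenever that vowel was seen before a consonant.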

Embedding Vectors (7D probability distributions)

Distance Matrix (L2)

Clusters

Closest vowel pairs (L2 distance):
• (a, o) = 0.102
• (a, u) = 0.113
• (e, u) = 0.131
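The pairs above are ranked by the Euclidean (L2) distance between embedding vectors. A minimal sketch of that ranking follows; `l2`, `closest_pairs`, and the three-component toy vectors are made up for illustration (the real vectors are 7D and are not reproduced here), so the printed distances will not match the values listed above.

```python
import itertools
import math


def l2(p, q):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_pairs(embeddings, k=3):
    """Return the k vowel pairs with the smallest L2 distance, nearest first."""
    pairs = [
        ((u, v), l2(embeddings[u], embeddings[v]))
        for u, v in itertools.combinations(sorted(embeddings), 2)
    ]
    return sorted(pairs, key=lambda item: item[1])[:k]


# Toy 3-component distributions, for illustration only.
toy = {
    "a": [0.4, 0.3, 0.3],
    "o": [0.4, 0.4, 0.2],
    "i": [0.9, 0.1, 0.0],
}
for pair, d in closest_pairs(toy, k=2):
    print(pair, round(d, 3))
```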

Interpretation:
• (a, o): both favor nasals and liquids ("an", "on", "al", "ol")
• (e, u): both have a high share of liquids ("er", "ur", "el", "ul")
• i stands apart: a very high nasal share ("in", "im") and almost no glides

Information

H(cgroup) = 2.55 bits
H(cgroup | vowel) = 2.48 bits
I(cgroup; vowel) = 0.07 bits
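Writing C for the consonant group and V for the vowel, the three figures are related by the standard identities below (assuming p(v) and p(c) are the marginals of the same 35-cell count table):

```latex
H(C) = -\sum_{c} p(c)\,\log_2 p(c)
H(C \mid V) = \sum_{v} p(v)\,H(C \mid V = v) = -\sum_{v} p(v) \sum_{c} p(c \mid v)\,\log_2 p(c \mid v)
I(C; V) = H(C) - H(C \mid V) = 2.55 - 2.48 = 0.07 \text{ bits}
```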

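The embedding captures 2.8% of consonant-group entropy: knowing the vowel reduces uncertainty about the following consonant group by only I(cgroup; vowel) / H(cgroup).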
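A sketch of how these quantities can be computed from a vowel-by-group count table (rows = vowels, columns = consonant groups), assuming the marginals come from that same table. `entropy` and `information_summary` are hypothetical helpers, and the 2 × 2 tables in the example are toy data chosen to hit the two extremes: independent rows give I = 0, and a one-to-one mapping gives I = H.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_summary(counts):
    """Return (H(C), H(C|V), I(C;V), I/H) for a count table counts[v][c]."""
    total = sum(sum(row) for row in counts)
    # Marginal over consonant groups: column sums divided by the grand total.
    p_c = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    h_c = entropy(p_c)
    # Conditional entropy: each vowel's row entropy weighted by P(vowel).
    h_c_given_v = 0.0
    for row in counts:
        n = sum(row)
        if n:
            h_c_given_v += (n / total) * entropy([x / n for x in row])
    mi = h_c - h_c_given_v
    fraction = mi / h_c if h_c else 0.0
    return h_c, h_c_given_v, mi, fraction


print(information_summary([[1, 1], [1, 1]]))  # independent: I = 0
print(information_summary([[2, 0], [0, 2]]))  # deterministic: I = H
```

Applied to the real 5 × 7 enwik9 table, the last element of the tuple is the fraction quoted above.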