Bag-of-letters embeddings as the first change of base. What "the" is, looking down toward orthography. The word event as support addition on letter events. LATD: Look At The Data. The Tick-Tock process from bytes to words.
The Path to the Lexicon
Bag-of-letters embeddings and the first change of base. What "the" is: support on letter unigrams, bigrams, trigrams, and length. Orthographic (looking down), not semantic (looking up). The reset signal (space), residual for ambiguous spellings, change of base from 10^9 byte positions to ~1.4×10^8 word positions. First Tock of the Tick-Tock process.
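
As a rough illustration of the summary above, the sketch below builds a bag-of-letters support for a word from its letter unigrams, bigrams, trigrams, and length, and uses whitespace as the reset signal that ends one word event and starts the next. The function names (`split_on_reset`, `bag_of_letters`) and the feature keys are hypothetical choices made for this example, not the chapter's own definitions, and treating any whitespace run as the reset is an assumption.

```python
from collections import Counter


def split_on_reset(text: str) -> list[str]:
    """Cut a character stream into word events at the reset signal.

    Assumption: any run of whitespace counts as the reset (str.split()).
    """
    return text.split()


def bag_of_letters(word: str) -> Counter:
    """Return an unordered support over letter n-grams (n = 1, 2, 3) plus length.

    Keys are (n, gram) for n-grams and ("len", k) for the length feature,
    so n-gram keys can never collide with the length key.
    """
    feats = Counter()
    for n in (1, 2, 3):
        for i in range(len(word) - n + 1):
            feats[(n, word[i:i + n])] += 1
    feats[("len", len(word))] += 1
    return feats


words = split_on_reset("the cat saw the hat")
the = bag_of_letters("the")
```

Here `the` holds seven features: three unigrams, two bigrams, one trigram, and the length. Because the bag ignores order at each n-gram level, different spellings can share part of their support, which is one place a residual for ambiguous spellings would be needed.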