2026-02-24: The Path to the Lexicon

Bag-of-letters embeddings as the first change of base. What "the" is, looking down toward orthography. The word event as support addition on letter events. LATD: Look At The Data. The Tick-Tock process from bytes to words.
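As a rough illustration of the two ideas above, the sketch below splits a byte stream into word positions and builds an orthographic feature bag for each word. It is a minimal sketch under stated assumptions, not the papers' implementation: the function names `bag_of_letters` and `word_positions`, the feature-key format, the use of raw counts, and the treatment of a single ASCII space as the only reset signal are all hypothetical choices made for this example.

```python
from collections import Counter


def bag_of_letters(word: str) -> Counter:
    """Orthographic features of one word: letter unigrams, bigrams,
    trigrams, and length. Keys like "2:th" are an illustrative
    convention, not taken from the papers."""
    feats = Counter()
    for n in (1, 2, 3):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            feats[f"{n}:{word[i:i + n]}"] += 1
    feats[f"len:{len(word)}"] += 1
    return feats


def word_positions(data: bytes) -> list[str]:
    """Change of base from byte positions to word positions.
    Assumption: an ASCII space is the only reset signal; empty
    tokens produced by repeated spaces are dropped."""
    return [w.decode("ascii", errors="replace")
            for w in data.split(b" ") if w]


if __name__ == "__main__":
    words = word_positions(b"the cat  the")
    print(words)
    print(sorted(bag_of_letters("the").items()))
```

Here "the" is described purely by its spelling (looking down toward orthography): three unigrams, two bigrams, one trigram, and a length feature. Nothing in the bag refers to meaning or context.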

Papers

The Path to the Lexicon
Bag-of-letters embeddings and the first change of base. What "the" is: support on letter unigrams, bigrams, trigrams, and length. Orthographic (looking down), not semantic (looking up). The reset signal (space), residual for ambiguous spellings, change of base from 10⁹ byte positions to ~1.4×10⁸ word positions. First Tock of the Tick-Tock process.
Eight Experiments: The English Context Neuron (v2 iterated)
Revised after two rounds of MJC commentary. UM-native formulations, E→N→Q connections. Exp 2: observation-only LPP. Exp 4: LS subtraction. Exp 5.2: 100-word SN model. Exp 7: < turns English ON.
Previous: v1
English Context Neuron: Experiment Results
Seven experiments + oracle mixing + first lexicon entry + 100-word lexicon (corrected Exp 5.2). Oracle word selection fixes broadcast activation: each word learns its own orthographic projection, not the English marginal. 17% byte coverage, 17,745 LPP entries at 10M. Case variants captured.

See Also

The Memory Trace (20260222)
Origin paper for LATD. §5 defines “Look At The Data”: not aggregate statistics, but the actual cases.
LATD Explainer (20260312)
Interactive three-regime (L→A→T) decomposition. The full realization of the LATD principle introduced here.