
The Tock Step

Domain-native architecture from evidence. Learning what the event spaces ARE, not just the patterns between them.

The key equation: tock step = next Ek+1 + maps to/from existing {Ei}

- Architecture = event spaces + maps between them
- Evidence-driven discovery: no gradient descent needed
- Backtracking (π-1) through the factorization tower
- Strawberry theorem: tokens can't count chars

1. What Is the Tock Step?

In the CMP tick-tock cycle, the tick step uses the current model to process data (forward pass, counting, prediction). The tock step steps back and asks: are we using the right event spaces?

The tock step is architecture discovery from evidence. It finds the next event space Ek+1 that would most reduce prediction error, then builds the maps connecting it to existing event spaces. No training. No hyperparameter search. Just counting.

Neural Network                        | Universal Model
Architecture (fixed before training)  | Event spaces {Ei} (fixed)
Parameters (learned by SGD)           | Patterns {pij} (learned by counting)
Hyperparameter/architecture search    | Tock step (discover next Ei from evidence)
The architecture IS the event spaces. What events can you distinguish? What questions can you ask? The architecture determines what's representable. Everything else is parameters.

2. The Four Steps of a Tock

1. Identify the gap: where does prediction fail? Which positions, contexts, and outputs are most surprising? (Signal: bpc, cross-entropy.)
2. Search for Ek+1: find a new event space that maximally reduces residual error; search over factorizations. (Signal: MI with output.)
3. Learn the maps: compute patterns pi,k+1 connecting the new ES to existing ones. Just counting. (Tool: ω0 count tables.)
4. Update architecture: add Ek+1 and its maps; the UM grows by one event space. (Result: Ak+1.)
Every step is evidence-based. The gap is computed from data. The search uses MI from co-occurrence counts. The maps are log contingency tables. No beliefs (axioms), no abductions (pattern commitments). Pure evidence.
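The four steps can be sketched as a single function, assuming candidate event spaces are supplied as feature functions; `tock_step`, `mutual_information`, and every other name below are illustrative, not the paper's API:

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """MI (bits) between two discrete variables, from co-occurring pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def tock_step(data, candidates, outputs):
    """One tock. `candidates` maps names to feature functions (candidate ESes).
    Steps 1-2: identify the gap and search -- score each candidate by MI
    with the prediction target."""
    scored = [(mutual_information(list(zip(map(f, data), outputs))), name)
              for name, f in candidates.items()]
    mi, name = max(scored)
    f = candidates[name]
    # Step 3: learn the map as a count table (the log contingency table).
    counts = Counter(zip(map(f, data), outputs))
    # Step 4: hand back the new ES and its map; the caller adds them to {Ei}.
    return name, f, counts, mi
```

Everything here is counting: no gradients, no iterative fitting, one pass per candidate.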

3. Domain-Native Tock Sequence

Starting from raw bytes and iteratively discovering the most informative event space:

BPC Improvement with Each Tock

Each tock discovers a new ES that reduces prediction error. A0 (bytes only) = unigram at 5.0 bpc. A4 (offset conjunctions) approaches 1.0 bpc. Compare: the sat-rnn at 2.81 bpc sits between A2 and A3.
Step | Architecture           | New Ei                | Discovery method              | BPC
A0   | Bytes only             | E0 = {0..255}         | Given                         | ~5.0
A1   | + bigram ES            | E0 × E0               | Adjacent byte MI              | ~3.5
A2   | + word boundary        | {0, 1}                | Space char has highest MI     | ~3.0
A3   | + tag state            | {0, 1}                | <, > create high-MI partition | ~2.5
A4   | + offset conjunctions  | 2-offset product ESes | Factor map / backward trie    | ~1.0
The experimental program IS the tock sequence. Backward trie discovers MI-ranked offsets (Feb 7). Factor map discovers 2-offset conjunctions (Feb 9). ES discovery finds SVD event clusters (Feb 12). Weight construction builds maps from count tables (Feb 11). Each is a tock step, performed manually. The goal is to automate it.
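The A1 criterion, adjacent byte MI, is itself just counting. A minimal sketch, with the function name assumed:

```python
from collections import Counter
from math import log2

def adjacent_byte_mi(text: bytes) -> float:
    """MI (bits) between a byte and its successor: the evidence behind
    tock A1, adding the bigram event space E0 x E0. Illustrative sketch."""
    pairs = list(zip(text, text[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())
```

On English text this value is large (letters strongly constrain their successors), which is why the bigram ES is the first tock worth taking.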

4. The Factorization Tower

Event spaces form a tower from fine to coarse, connected by surjections (quotient maps). Each level discards within-class structure.

E0: Bits (|E| = 2)
  π1: 8 bits → 1 byte
E1: Bytes / characters (|E| = 256)
  π2: byte → class
E2: Character classes (vowel, consonant, digit, ...) (|E| ≈ 10)
  π3: char seq → token
E3: Subword tokens (BPE) (|E| ≈ 50k)
  π4: tokens → word
E4: Words (|E| ≈ 100k)

Product factorization: E = Ea × Eb

Both components are independently accessible: I(E) = I(Ea) + I(Eb).

Example: h = (h1, ..., h128) decomposes into 128 binary ESes. Each bit is independently readable.

Preserves access. No backtracking needed.

Sum (sequential/quotient) factorization: Efine ↠ Ecoarse

The coarse event is a function of the fine one; the fine structure is hidden inside the coarse event.

Example: "straw" is a token. The characters s, t, r, a, w are inside it, inaccessible at the token level.

Hides structure. Backtracking is required to recover it.
The tock step should prefer product factorizations when possible, because they preserve access to all components. Sum factorizations (like tokenization) hide structure and require costly backtracking.
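The contrast can be made concrete with two toy byte-level maps (`product_event` and `char_class` are illustrative names, not part of any described system):

```python
from math import log2

# Product factorization: E = Ea x Eb. The two nibble components of a byte
# are independently accessible via projection -- no backtracking needed.
def product_event(byte):
    return (byte >> 4, byte & 0x0F)

# Quotient factorization: Efine ->> Ecoarse. pi2 sends a byte to its
# character class; *which* vowel it was is no longer an event afterwards.
def char_class(byte):
    c = chr(byte)
    return "vowel" if c in "aeiou" else "digit" if c.isdigit() else "other"

assert product_event(0xAB) == (0x0A, 0x0B)            # both factors readable
assert char_class(ord("e")) == char_class(ord("o"))   # fine identity collapsed
# Recovering the fine event from "vowel" costs up to the within-class entropy:
worst_case_cost = log2(5)                             # ~2.32 bits (5-vowel fiber)
```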

5. The Strawberry Theorem

Interactive Demo: Counting Characters in Tokens

[Interactive widget omitted: pick a word and a character to count; compare what the token level (E3) sees with the count recovered by a π3-1 backtrack.]
The Strawberry Theorem:
P(model answers correctly) ≤ P(answer is retrievable from the model's event spaces)
This is not a failure of scale or training. No amount of data or compute can give a token-level model access to character-level structure. The fix is not a "better tokenizer" but a tower: maintain multiple levels simultaneously, with the ability to move between them as the task demands.

The strawberry problem generalizes to any task requiring finer resolution than the model's operating level:

Task                    | Requires             | Token level has | Fix
Count chars in a word   | E1 (characters)      | E3 (tokens)     | Backtrack π3
Count syllables         | Phoneme-level ES     | E3 (tokens)     | Backtrack to phonemes
Syntactic role of "the" | Word-role ES         | Positional only | Backtrack to parse
Does A∧B imply C?       | Proposition-level ES | Token sequence  | Backtrack to logic
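The first row of the table can be run end to end with a hypothetical two-token vocabulary (not a real BPE model): the token ids alone expose no characters, and the count is recovered only by descending through the fibers of π3:

```python
VOCAB = {"straw": 0, "berry": 1}                 # hypothetical toy vocabulary
FIBER = {tid: s for s, tid in VOCAB.items()}     # pi3^{-1}: token id -> chars

def tokens(word):
    """Greedy longest-match toy tokenizer (illustrative only)."""
    out, i = [], 0
    while i < len(word):
        piece = next(p for p in sorted(VOCAB, key=len, reverse=True)
                     if word.startswith(p, i))
        out.append(VOCAB[piece])
        i += len(piece)
    return out

ids = tokens("strawberry")                       # token-level view: [0, 1]
# At E3 the ids carry no per-character events; counting 'r' requires
# pi3^{-1}, i.e. backtracking into each token's fiber:
count = sum(FIBER[t].count("r") for t in ids)
assert count == 3                                # "strawberry" has three r's
```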

6. Backtracking Through the Tower

Cost of Backtracking vs Benefit of Resolution

Backtracking from tokens to bytes increases sequence length ~4x but gives access to character-level structure. The domain-native UM pays this cost only when needed.
Backtracking recovers lost information: I(after backtrack) = I(Ek) + H(fiber | context). If context determines the fine event exactly, backtracking is free. If context gives no information, it costs the full within-class entropy.

Three strategies compared:

Strategy                               | Cost                               | Can solve fine tasks?
Always finest level (char-by-char)     | High (long sequences)              | Yes
Always coarsest level (token-by-token) | Low                                | No
Tower with backtracking                | Low (usually) + high (when needed) | Yes
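The cost term H(fiber | context) can be estimated directly from counts over (context, coarse event, fine event) samples; a sketch with an assumed sample format and an illustrative function name:

```python
from collections import Counter, defaultdict
from math import log2

def fiber_entropy(samples):
    """Estimate H(fine | coarse, context) -- the backtracking cost -- from
    (context, coarse, fine) samples. If (context, coarse) determines the
    fine event exactly, backtracking is free (cost 0)."""
    groups = defaultdict(Counter)
    for ctx, coarse, fine in samples:
        groups[(ctx, coarse)][fine] += 1
    n = len(samples)
    h = 0.0
    for counter in groups.values():
        total = sum(counter.values())
        for c in counter.values():
            h -= (total / n) * (c / total) * log2(c / total)
    return h
```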

7. Architecture from Evidence vs. Belief

Neural Architecture Search
- Search space: hyperparameters
- Objective: validation loss
- Method: grid / random / Bayesian
- Cost: train many models
- Result: architecture (opaque)
- Backtracking: not applicable
- Source: belief (prior commitment)

vs

Domain-Native Tock
- Search space: event spaces
- Objective: MI with output
- Method: counting + quotients
- Cost: count once
- Result: architecture (interpretable)
- Backtracking: built in (tower)
- Source: evidence (data counts)
The three sources of support for architecture choices:

- Evidence (the tock step): count co-occurrences, compute MI, discover Ei. All from data.
- Belief (NAS): "use 128 hidden neurons" or "BPE with 50k vocab." Prior commitment, not domain-specific.
- Abduction: seeing the backward trie pattern and recognizing "offset 7 captures word-initial context." Short-circuits induction by understanding why.

8. The Tick-Tock Closed Loop

INIT (A0: bytes only) → TOCK (discover Ek+1 from MI) → TICK (predict: forward pass, update counts) → EVAL (gap? if error is high, tock again)
No gradient descent anywhere. Tock uses MI analysis. Tick uses counting (ω0). Eval uses the forward pass (f). All operations are O(N) in data size. The Feb 11 weight construction demonstrated this: all 82k parameters from data statistics, 1.89 bpc with zero SGD (vs 4.97 for trained).
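The closed loop can be sketched as follows, assuming an external `evaluate` function that scores a set of event spaces in bpc-like units; all names are illustrative:

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """MI (bits) between two aligned discrete sequences, by counting."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def tick_tock(data, targets, candidates, evaluate, tol=1.0, max_tocks=8):
    """Closed-loop sketch: evaluate (TICK + EVAL), and while the gap is
    large, adopt the best remaining candidate ES (TOCK)."""
    spaces = {"E0": lambda b: b}                      # INIT: bytes only
    for _ in range(max_tocks):
        if evaluate(spaces, data, targets) <= tol:    # EVAL: gap closed?
            break
        unused = {k: f for k, f in candidates.items() if k not in spaces}
        if not unused:
            break
        # TOCK: adopt the unused candidate with highest MI vs the target.
        best = max(unused, key=lambda k: mi([unused[k](d) for d in data], targets))
        spaces[best] = unused[best]
    return spaces
```

Every operation inside the loop is a counting pass, consistent with the O(N) claim above.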

9. Tock Steps in the Experimental Program

Every Discovery Tool Is a Tock Step

Method               | What it discovers        | New Ei              | Archive
Backward trie        | MI-ranked offsets        | Offset conjunctions | Feb 7
Skip-k-gram analysis | Informative offset pairs | 2-offset ESes       | Feb 8
Factor map           | Neuron → conjunction     | Domain features     | Feb 9
ES discovery (SVD)   | Skip-bigram SVD          | Event clusters      | Feb 12
Weight construction  | Shift-register groups    | Hash-based ESes     | Feb 11
Each method discovers a new event space from data statistics, then builds maps connecting it to existing architecture. This is the tock step, performed manually. The goal is automation.

The MI Criterion: Greedy Offset Selection

The backward trie ranks offsets by MI with the output. Greedy selection: d=1 first, then d=8, then d=2, then d=7... This matches the empirically discovered skip-pattern order exactly.
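The ranking criterion can be sketched as a single counting pass (the conjunction step, conditioning on offsets already selected, is omitted for brevity; `rank_offsets` is an assumed name):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """MI (bits) between two aligned discrete sequences, by counting."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def rank_offsets(text: bytes, max_offset: int = 8):
    """Rank backward offsets d by MI between byte[t-d] and byte[t],
    most informative first."""
    scored = [(mi(list(text[:-d]), list(text[d:])), d)
              for d in range(1, max_offset + 1)]
    return [d for _, d in sorted(scored, reverse=True)]
```

On text with strong periodic structure, the offsets matching the period dominate the ranking, mirroring the empirically discovered skip-pattern order.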

10. Why Current LLMs Can't Backtrack

Modern LLMs make an irrevocable architectural choice at tokenization. The factorization π: chars → tokens is fixed before training and cannot be undone at inference time. The model operates at the token level, period. Every task requiring character-level resolution must be handled by memorization or heuristics.
Token-level models lose character structure irrecoverably. The tower maintains all levels, paying the cost of backtracking only when needed.
The fix is not a better tokenizer. It's a tower: maintain multiple levels of the factorization simultaneously, with the ability to move between them as the task demands. Normally operate at whatever level is efficient (E3 or E4). When a task requires finer resolution, backtrack to the appropriate level.
The strongest reading of CMP's equivalence thesis: interpretability and efficiency are the same problem, because both are solved by the evidence-driven factorization. The domain-native tock makes this constructive.


Claude and MJC · February 12, 2026 · The Tock Step: Domain-Native Architecture from Evidence