The Tock Step
Domain-native architecture from evidence. Learning what the event spaces ARE, not just the patterns between them.
The key equation: tock step = next Ek+1 + maps to/from existing {Ei}

- Ei: architecture = event spaces + maps between them
- MI: evidence-driven discovery, no gradient descent needed
- π⁻¹: backtracking through the factorization tower
- Strawberry theorem: tokens can't count chars
1. What Is the Tock Step?
In the CMP tick-tock cycle, the tick step uses the current model to process data (forward pass, counting, prediction). The tock step steps back and asks: are we using the right event spaces?
The tock step is architecture discovery from evidence. It finds the next event space Ek+1 that would most reduce prediction error, then builds the maps connecting it to existing event spaces. No training. No hyperparameter search. Just counting.
| Neural Network | Universal Model |
| --- | --- |
| Architecture (fixed before training) | Event spaces {Ei} (fixed) |
| Parameters (learned by SGD) | Patterns {pij} (learned by counting) |
| Hyperparameter/architecture search | Tock step (discover next Ei from evidence) |
The architecture IS the event spaces. What events can you distinguish? What questions can you ask? The architecture determines what's representable. Everything else is parameters.
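The split can be made concrete with a toy sketch: the architecture is a list of event spaces, and the parameters are nothing but co-occurrence counts. The class and method names (`UniversalModel`, `add_event_space`, `observe`) are illustrative, not the archive's actual API.

```python
from collections import defaultdict

class UniversalModel:
    """Toy sketch: architecture = event spaces, parameters = count tables."""

    def __init__(self):
        self.event_spaces = []          # the architecture {Ei}
        self.counts = defaultdict(int)  # the patterns {pij}, learned by counting

    def add_event_space(self, name, events):
        # A tock step grows the model by exactly one of these.
        self.event_spaces.append((name, set(events)))

    def observe(self, *joint_event):
        # A tick step: record one co-occurrence across the event spaces.
        self.counts[joint_event] += 1

um = UniversalModel()
um.add_event_space("bytes", range(256))  # E0
um.add_event_space("is_space", (0, 1))   # a word-boundary event space
um.observe(104, 0)   # 'h', not a boundary
um.observe(32, 1)    # ' ', a boundary
```

Everything the model "knows" lives in `counts`; there is nothing gradient descent could act on.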
2. The Four Steps of a Tock
1. Identify the gap. Where does prediction fail? Which positions, contexts, outputs are most surprising? (Signal: bpc, cross-entropy.)
2. Search for Ek+1. Find a new event space that maximally reduces residual error. Search over factorizations. (Signal: MI with output.)
3. Learn the maps. Compute patterns pi,k+1 connecting the new event space to existing ones. Just counting. (Operation: ω0 count tables.)
4. Update the architecture. Add Ek+1 and its maps. The UM grows by one event space. (Result: Ak+1.)
Every step is evidence-based. The gap is computed from data. The search uses MI from co-occurrence counts. The maps are log contingency tables. No beliefs (axioms), no abductions (pattern commitments). Pure evidence.
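Steps 2 and 3 can be sketched in a few lines: score candidate event spaces by MI with the output, pick the best, and learn its map as a count table. The toy data, feature functions, and the name `tock_step` are illustrative assumptions, not the archive's code.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from raw (x, y) co-occurrence pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def tock_step(data, candidates, outputs):
    """One tock: pick the candidate event space with highest MI with the
    output (step 2), then learn its map as a count table (step 3)."""
    scored = {name: mutual_information([(f(d), y) for d, y in zip(data, outputs)])
              for name, f in candidates.items()}
    best = max(scored, key=scored.get)
    count_table = Counter((candidates[best](d), y)
                          for d, y in zip(data, outputs))
    return best, count_table

data = ["cat", "dog", "cats", "dogs"]
outputs = [0, 0, 1, 1]  # is the word plural?
candidates = {"last_char": lambda w: w[-1], "first_char": lambda w: w[0]}
best, table = tock_step(data, candidates, outputs)
# last_char has 1 bit of MI with the output; first_char has 0 bits
```

No optimization loop appears anywhere: both the search criterion and the learned map come from the same count tables.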
3. Domain-Native Tock Sequence
Starting from raw bytes and iteratively discovering the most informative event space:
Figure: BPC improvement with each tock. Each tock discovers a new event space that reduces prediction error. A0 (bytes only) = unigram at 5.0 bpc; A4 (offset conjunctions) approaches 1.0 bpc. For comparison, the sat-rnn at 2.81 bpc sits between A2 and A3.
| Step | Architecture | New Ei | Discovery method | BPC |
| --- | --- | --- | --- | --- |
| A0 | Bytes only | E0 = {0..255} | Given | ~5.0 |
| A1 | + bigram ES | E0 × E0 | Adjacent byte MI | ~3.5 |
| A2 | + word boundary | {0, 1} | Space char has highest MI | ~3.0 |
| A3 | + tag state | {0, 1} | < > create high-MI partition | ~2.5 |
| A4 | + offset conjunctions | 2-offset product ESes | Factor map / backward trie | ~1.0 |
The experimental program IS the tock sequence. Backward trie discovers MI-ranked offsets (Feb 7). Factor map discovers 2-offset conjunctions (Feb 9). ES discovery finds SVD event clusters (Feb 12). Weight construction builds maps from count tables (Feb 11). Each is a tock step, performed manually. The goal is to automate it.
4. The Factorization Tower
Event spaces form a tower from fine to coarse, connected by surjections (quotient maps). Each level discards within-class structure.
E0: bits (|E| = 2)
 ↑ π1: 8 bits → 1 byte
E1: bytes (characters) (|E| = 256)
 ↑ π2: byte → class
E2: character classes (vowel, consonant, digit, ...) (|E| ≈ 10)
 ↑ π3: char sequence → token
E3: subword tokens (BPE) (|E| ≈ 50k)
 ↑ π4: tokens → word
E4: words (|E| ≈ 100k)
Product factorization (E = Ea × Eb). Both components are independently accessible: I(E) = I(Ea) + I(Eb). Example: h = (h1, ..., h128) decomposes into 128 binary event spaces, each bit independently readable. Preserves access; no backtracking needed.

Sum (sequential/quotient) factorization (Efine ↠ Ecoarse). The coarse event is a function of the fine one; the fine structure is hidden inside the coarse event. Example: "straw" is a token; the characters s, t, r, a, w are inside it, inaccessible at the token level. Hides structure; backtracking is required to recover it.
The tock step should prefer product factorizations when possible, because they preserve access to all components. Sum factorizations (like tokenization) hide structure and require costly backtracking.
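The contrast fits in a few lines of Python. A byte viewed as a bit tuple is a product factorization (every factor stays addressable); a byte mapped to its character class is a quotient (many-to-one, so the fine event is unrecoverable from the coarse one). The example values here are illustrative.

```python
# Product factorization: the joint event is a tuple, each factor readable
# directly, so I(E) = I(Ea) + I(Eb) and no backtracking is ever needed.
byte = 0b01101000                           # one E1 event ('h' = 104)
bits = [(byte >> i) & 1 for i in range(8)]  # E1 as a product of 8 binary spaces
assert bits[3] == 1                         # any bit accessible on its own

# Quotient factorization hides structure: the map below is many-to-one,
# so the coarse event alone cannot recover the fine one.
char_class = "vowel" if chr(byte) in "aeiou" else "other"  # π2: byte → class
# From "other" alone there is no way back to 'h'.
```

This is why the tock step's preference matters: a product costs nothing later, while a quotient commits future queries to paying for backtracking.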
5. The Strawberry Theorem
The Strawberry Theorem: P(model answers correctly) ≤ P(answer is retrievable from EV).
This is not a failure of scale or training. No amount of data or compute can give a token-level model access to character-level structure. The fix is architectural, not statistical: maintain multiple levels of the factorization simultaneously, with the ability to move between them as the task demands.
The strawberry problem generalizes to any task requiring finer resolution than the model's operating level:
| Task | Requires | Token level has | Fix |
| --- | --- | --- | --- |
| Count chars in a word | E1 (characters) | E3 (tokens) | Backtrack π3 |
| Count syllables | Phoneme-level ES | E3 (tokens) | Backtrack to phonemes |
| Syntactic role of "the" | Word-role ES | Positional only | Backtrack to parse |
| Does A∧B imply C? | Proposition-level ES | Token sequence | Backtrack to logic |
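The first row of the table can be acted out with a two-entry toy vocabulary (illustrative, not a real BPE model): at the token level the question "how many r's?" has no answer, because the sequence of ids contains no characters; only after inverting π3 does the count become computable.

```python
# A token-level model sees only ids; counting chars requires backtracking π3.
vocab = {"straw": 0, "berry": 1}          # toy BPE-style vocabulary (illustrative)
detok = {v: k for k, v in vocab.items()}  # the backtracking map, π3⁻¹

tokens = [vocab["straw"], vocab["berry"]]  # "strawberry" at level E3
# At E3 the question is unanswerable: [0, 1] contains no characters.
chars = "".join(detok[t] for t in tokens)  # backtrack to E1
assert chars.count("r") == 3
```

The inequality above is exactly this situation: whatever the model does with `[0, 1]`, the correct count is not retrievable from that representation alone.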
6. Backtracking Through the Tower
Figure: cost of backtracking vs benefit of resolution. Backtracking from tokens to bytes increases sequence length ~4x but gives access to character-level structure. The domain-native UM pays this cost only when needed.
Backtracking recovers lost information: I(after backtrack) = I(Ek) + H(fiber | context). If context determines the fine event exactly, backtracking is free. If context gives no information, it costs the full within-class entropy.
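The cost term H(fiber | context) is an ordinary conditional entropy, so it can be estimated directly from (context, fine event) counts. The function below is a sketch under that assumption; `fiber_entropy` is not a name from the archive.

```python
import math
from collections import Counter, defaultdict

def fiber_entropy(pairs):
    """H(fine | context) in bits from (context, fine_event) observations.
    0 means context determines the fine event (backtracking is free);
    the maximum is the full within-class entropy (context is uninformative)."""
    by_ctx = defaultdict(list)
    for ctx, fine in pairs:
        by_ctx[ctx].append(fine)
    n = len(pairs)
    h = 0.0
    for fines in by_ctx.values():
        c = Counter(fines)
        m = len(fines)
        h += m / n * -sum(k / m * math.log2(k / m) for k in c.values())
    return h

# Context fully determines the fine event → backtracking is free.
free = fiber_entropy([("a", 0), ("a", 0), ("b", 1), ("b", 1)])
# Context gives nothing → pay the full within-class entropy (1 bit here).
full = fiber_entropy([("a", 0), ("a", 1), ("b", 0), ("b", 1)])
```

In tower terms: a quotient whose fibers are predictable from context is nearly as good as a product, because the hidden structure can be reconstructed cheaply.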
Three strategies compared:
| Strategy | Cost | Can solve fine tasks? |
| --- | --- | --- |
| Always finest level (char-by-char) | High (long sequences) | Yes |
| Always coarsest level (token-by-token) | Low | No |
| Tower with backtracking | Low (usually) + high (when needed) | Yes |
7. Architecture from Evidence vs. Belief
| | Neural Architecture Search | Domain-Native Tock |
| --- | --- | --- |
| Search space | Hyperparameters | Factorizations of the data |
| Method | Grid / random / Bayesian | Counting + quotients |
| Result | Architecture (opaque) | Architecture (interpretable) |
| Backtracking | Not applicable | Built in (tower) |
| Source | Belief (prior commitment) | Evidence (data counts) |
The three sources of support for architecture choices:
- Evidence (the tock step): count co-occurrences, compute MI, discover Ei. All from data.
- Belief (NAS): "use 128 hidden neurons" or "BPE with 50k vocab." Prior commitment, not domain-specific.
- Abduction: seeing the backward trie pattern and recognizing "offset 7 captures word-initial context." Short-circuits induction by understanding why.
8. The Tick-Tock Closed Loop
TOCK (discover the next event space Ek+1 from MI) → TICK (predict: forward pass, update counts) → EVAL (measure the gap; if error is high, tock again) → back to TOCK.
No gradient descent anywhere. Tock uses MI analysis. Tick uses counting (ω0). Eval uses the forward pass (f). All operations are O(N) in data size. The Feb 11 weight construction demonstrated this: all 82k parameters from data statistics, 1.89 bpc with zero SGD (vs 4.97 bpc for the SGD-trained baseline).
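The tick and eval halves of the loop can be sketched for a bigram model: tick updates count tables, eval computes bpc from them, and a high bpc is the trigger for the next tock. The add-one smoothing and function names here are illustrative choices, not the archive's implementation.

```python
import math
from collections import Counter, defaultdict

def tick(data, counts):
    """Tick: update count tables from data (ω0) — the only 'training'."""
    for a, b in zip(data, data[1:]):
        counts[a][b] += 1

def evaluate_bpc(data, counts):
    """Eval: bits per character of the count-based bigram predictor,
    with add-one smoothing over the observed alphabet (an assumption)."""
    alphabet = set(data)
    total_bits = 0.0
    for a, b in zip(data, data[1:]):
        row = counts[a]
        p = (row[b] + 1) / (sum(row.values()) + len(alphabet))
        total_bits -= math.log2(p)
    return total_bits / (len(data) - 1)

counts = defaultdict(Counter)
text = "abababab"
tick(text, counts)
bpc = evaluate_bpc(text, counts)
# A repetitive source compresses well; a high bpc here would trigger a tock.
```

Both functions are single passes over the data, which is where the O(N) claim comes from.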
9. Tock Steps in the Experimental Program
Every Discovery Tool Is a Tock Step
| Method | What it discovers | New Ei | Archive |
| --- | --- | --- | --- |
| Backward trie | MI-ranked offsets | Offset conjunctions | Feb 7 |
| Skip-k-gram analysis | Informative offset pairs | 2-offset ESes | Feb 8 |
| Factor map | Neuron → conjunction | Domain features | Feb 9 |
| ES discovery (SVD) | Skip-bigram SVD | Event clusters | Feb 12 |
| Weight construction | Shift-register groups | Hash-based ESes | Feb 11 |
Each method discovers a new event space from data statistics, then builds maps connecting it to existing architecture. This is the tock step, performed manually. The goal is automation.
The MI Criterion: Greedy Offset Selection
The backward trie ranks offsets by MI with the output. Greedy selection: d=1 first, then d=8, then d=2, then d=7... This matches the empirically discovered skip-pattern order exactly.
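A minimal version of the MI criterion, assuming plain co-occurrence counts stand in for the backward trie (the function names are illustrative): score each backward offset d by I(byte at t−d ; byte at t) and sort.

```python
import math
from collections import Counter

def mi(xs, ys):
    """I(X;Y) in bits from two aligned sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def rank_offsets(text, max_offset):
    """Rank each backward offset d by MI with the current byte.
    A greedy tock would add the top-scoring offset's event space first."""
    scores = {}
    for d in range(1, max_offset + 1):
        xs = [text[i - d] for i in range(d, len(text))]
        ys = [text[i] for i in range(d, len(text))]
        scores[d] = mi(xs, ys)
    return sorted(scores, key=scores.get, reverse=True)

text = "the cat sat on the mat " * 20
order = rank_offsets(text, 8)  # offsets, most informative first
```

On real data the resulting order is a property of the corpus, which is the point: the selection sequence d=1, d=8, d=2, d=7, ... is read off the counts rather than chosen in advance.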
10. Why Current LLMs Can't Backtrack
Modern LLMs make an irrevocable architectural choice at tokenization. The factorization π: chars → tokens is fixed before training and cannot be undone at inference time. The model operates at the token level, period. Every task requiring character-level resolution must be handled by memorization or heuristics.
Token-level models lose character structure irrecoverably. The tower maintains all levels, paying the cost of backtracking only when needed.
The fix is not a better tokenizer. It's a tower: maintain multiple levels of the factorization simultaneously, with the ability to move between them as the task demands. Normally operate at whatever level is efficient (E3 or E4). When a task requires finer resolution, backtrack to the appropriate level.
The strongest reading of CMP's equivalence thesis
Interpretability and efficiency are the same problem because both are solved by the evidence-driven factorization. The domain-native tock makes this constructive.
Claude and MJC · February 12, 2026 · The Tock Step: Domain-Native Architecture from Evidence