
Checkpoint Trajectory

99 checkpoints (10M–990M bytes): the Boolean automaton strengthens, the factor map collapses

Three Phases of Training

Phase I: Learning, 10–110M, R² 0.83–0.86
Phase II: Stable, 110–400M, R² 0.80–0.84
Phase III: Collapse, 450–990M, R² 0.40–0.76

Key numbers: mean margin 2.78 → 61.3 (monotonic growth); 450M, R² cliff (coincides with forgetting); 640M, by goes all-negative (phase transition); Wh std growth 0.031 per 10M bytes (linear).

Margin Trajectory: Always Boolean

The mean margin grows monotonically from 2.78 (10M) to 61.3 (990M). The model is Boolean from the very first checkpoint.

Mean Margin vs Training Progress

The anomalous 100M point (margin 59.7) is from Run 1 (SGD), not Run 2 (Adam). Excluding it, margin growth is smooth and monotonic. The grey band marks Phase II (stable). Data: 99 checkpoints from trajectory.csv.
There is no analog regime at any point during training. The tanh RNN is Boolean from the very first checkpoint (75.7% of margins > 1 at 10M). Adam with gradient clipping drives margins higher: weights grow unbounded, producing larger pre-activations.

R² Trajectory: The 450M Cliff

The 2-offset conjunction R² holds steady at 0.80–0.86 for the first 400M bytes, then collapses sharply.

Factor Map R² vs Training Progress

The R² cliff at 450–460M coincides exactly with the onset of catastrophic forgetting. During Phase I–II (10–400M), R² oscillates in [0.80, 0.86]. After 450M, it drops to 0.40–0.76. Data: trajectory.csv.

Neurons with R² ≥ 0.80

Before 450M: 73–126 neurons above the threshold. After 460M: drops to 0–8 (with transient spikes). The conjunction structure fragments during catastrophic forgetting.
Catastrophic forgetting destroys the readout, not the dynamics. Margins grow through the collapse; the Boolean automaton strengthens. What fails is the Wy/by readout: the model can no longer decode its own internal representation.

Wh Drift: Perfectly Linear

Wh Standard Deviation vs Training Progress

std(Wh) = 0.22 + 0.031 × (checkpoint − 10M)/10M. R² > 0.99 (excluding 100M anomaly). The growth is perfectly linear and never stabilizes. This drives margin growth and eventual readout failure.

Output Bias Collapse: The 640M Phase Transition

by Range (min, max) vs Training Progress

max(by) crosses zero between 630M (+0.66) and 640M (−0.37). After 640M, all 256 output biases are negative; at 990M the range is [−46.3, −29.9]. The reversal starts at exactly 450M: from 10–450M, max(by) grows (0.79 → 8.18); from 450–640M it collapses (8.18 → −0.37). Three transitions align at 450M: the R² cliff, the by reversal, and the training bpc cliff.

BPC and Hebbian Correlation

BPC on Eval Window

BPC on the first 1024 bytes. It stays at 5.3–7.5 throughout (the model has moved on from this data).

Hebbian r(cov, Wh)

Hebbian correlation drifts down from ~0.42 to ~0.30, then rebounds slightly in Phase III.

The Anomalous 100M Checkpoint

epoch1_100M.bin is from Run 1 (SGD), not Run 2 (Adam).

The 100M checkpoint has margin 59.7, Wh std 3.30, R² = 0.49, by range [−47.4, −30.9]. These values are roughly 7× those at the surrounding checkpoints (90M: margin 7.4, std 0.50; 110M: margin 8.2, std 0.55). The Run 1 model at 100M had already undergone the same weight explosion that Run 2 doesn't reach until 990M.

Combined Dashboard: All Metrics

Margin, R², and Wh std (normalized)

All three metrics normalized to [0,1] for comparison. Margin and Wh std grow together (positive feedback). R² holds steady then collapses at 450M. The 100M anomaly is excluded for clarity.

Related

Source: trajectory.tex · Data: trajectory.csv · Program: trajectory.c