
How a Tiny Brain Reads Text

一个小小的大脑怎样读懂文字
An explanation of our research for curious people. No background needed, just curiosity.

The Guessing Game

猜字游戏

Imagine I show you a sentence, one letter at a time, and you have to guess what comes next.

T h e c a ?

You'd probably guess "t" — because "the cat" is common in English. You might even know that in a Wikipedia article, the next word is probably "cat" or "car" or "castle".

Your brain is doing compression(压缩). When you can guess the next letter, you don't need to store it — you already know it. The better you guess, the less space the text takes up.

Key idea / 关键概念

Compression = prediction. If you can predict the next letter, you can shrink the file. A perfect guesser could store the whole of Wikipedia in almost nothing.
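The link between prediction and file size can be seen with a few lines of Python. This is a general information-theory sketch, not code from the research: Shannon's bound says a symbol you predicted with probability p costs about -log2(p) bits to encode.

```python
import math

def bits_needed(p):
    """An ideal coder spends -log2(p) bits on a symbol it
    predicted with probability p (Shannon's coding bound)."""
    return -math.log2(p)

# A confident, correct guess is nearly free...
print(bits_needed(0.99))   # ~0.014 bits
# ...while a total surprise costs a full byte.
print(bits_needed(1 / 256))  # 8.0 bits, no better than storing it raw
```

So a guesser that is right most of the time shrinks text almost to nothing, while a guesser that knows nothing cannot compress at all.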

The Tiny Brain

小小的大脑

We built a tiny artificial brain with 128 "neurons" (神经元). That's not many — your real brain has about 86 billion. But this tiny brain can read English text and guess the next letter correctly about 92% of the time.

Here's how it works:

input letter → [128 neurons] → guess next letter
                 ↑——————↓
              memory loops back

Each neuron is just a number (a switch, really — ON or OFF). The brain reads one byte at a time, updates its 128 switches, and makes a guess. Then the next byte arrives and it does it again.

Analogy: Think of 128 light switches on a wall. Each letter you read flips some switches. The pattern of ON/OFF switches IS the brain's "memory" of what it has read so far.

This is called an RNN (Recurrent Neural Network,循环神经网络). "Recurrent" means the output loops back as input — it remembers.
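The read-update-guess loop can be sketched in a few lines of Python. Everything here is illustrative: the random weights, the threshold, and the one-hot encoding are stand-ins for the real network, whose weights the research derives from data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128  # the hidden "switches"

# Illustrative random weights -- placeholders, not the real ones.
W_in = rng.normal(size=(N, 256)) * 0.1   # input byte -> neurons
W_rec = rng.normal(size=(N, N)) * 0.1    # neurons -> neurons (the loop)

def step(h, byte):
    """One tick: mix the current switch pattern with the new byte,
    then snap each neuron back to ON (1) or OFF (0)."""
    x = np.zeros(256)
    x[byte] = 1.0                      # one-hot encode the input byte
    pre = W_in @ x + W_rec @ h
    return (pre > 0).astype(float)     # 128 ON/OFF switches

h = np.zeros(N)                        # all switches start OFF
for b in b"the cat":
    h = step(h, b)                     # each byte flips some switches
print(int(h.sum()), "of 128 switches are ON")
```

The key detail is that `h` feeds back into `step` — that feedback loop is the "recurrent" part, and the ON/OFF pattern in `h` is the memory.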

What We Discovered

我们发现了什么

Most AI researchers train a neural network and then say "it works, but we don't know why." We did something different. We took this tiny brain apart, neuron by neuron, and figured out exactly what every single piece does.

Discovery 1 / 发现一

Every neuron has a simple job. Each of the 128 neurons watches for a specific pattern. For example, one neuron detects "are we inside an XML tag like <page>?" Another detects "how many letters since the last space?" (That's word length.)

We tested this rigorously. For 120 out of 128 neurons, we can predict their behavior with R² > 0.80 (meaning our explanation captures more than 80% of the variation in what the neuron does). In science, that's very good.
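Here is roughly what that test looks like, as a hedged sketch: the "letters since the last space" feature comes from the text above, but the neuron activity below is simulated noise rather than a real recorded neuron.

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the prediction y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# The feature from the text: letters since the last space.
text = b"the cat sat on the mat"
since_space, count = [], 0
for ch in text:
    count = 0 if ch == ord(" ") else count + 1
    since_space.append(count)
feature = np.array(since_space, dtype=float)

# A simulated neuron that tracks this feature with a little noise.
rng = np.random.default_rng(1)
neuron = 0.5 * feature + rng.normal(scale=0.1, size=feature.shape)

# Fit neuron ~ a*feature + b, then score how well the story fits.
a, b = np.polyfit(feature, neuron, 1)
print(round(r_squared(neuron, a * feature + b), 3))  # close to 1.0
```

A score near 1.0 means the simple story ("this neuron counts letters since the last space") explains almost everything the neuron does.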

English: We can explain what each neuron does, like reading the schematic of a circuit. No mystery left.
中文:我们能解释每个神经元在做什么,就像阅读电路的设计图。没有任何黑箱。

Letters Have Personalities

字母也有"性格"

When we analyzed what the brain learns, we found something beautiful: it naturally sorts the 256 possible bytes into groups based on how they behave in text.

lowercase: e t a o n
uppercase: A T I
XML tags: < > /
digits: 0 5 9
punctuation: . ,

We call these Event Spaces(事件空间). The brain doesn't just see "the letter e" — it sees "a common lowercase letter that probably continues a word." The grouping depends on context: what letter came 1 step ago, 2 steps ago, even 8 steps ago.

Analogy: In Chinese, you know that 氵(three-dot water radical) probably means the character is about water or liquid. The radical tells you the "personality" of the character before you even read it. Event Spaces work the same way — they tell the brain what kind of byte this is.
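A crude, context-free version of this grouping can be written by hand. The real Event Spaces are learned and depend on context; this sketch only captures the "personality" part, using groupings invented for illustration.

```python
def byte_class(b):
    """A hand-written stand-in for the learned grouping:
    sort each byte by the role it usually plays in text."""
    ch = chr(b)
    if ch.islower():
        return "lowercase"
    if ch.isupper():
        return "uppercase"
    if ch.isdigit():
        return "digit"
    if ch in "<>/":
        return "xml-tag"
    if ch in ".,;:!?":
        return "punctuation"
    return "other"

for b in b"eA<5.":
    print(chr(b), "->", byte_class(b))
```

The learned version goes further: the same byte can land in different groups depending on what came 1, 2, or even 8 steps before it.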

Building a Brain Without Training

不用训练就能造一个大脑

Normally, to build an AI, you have to "train" it. That means showing it millions of examples and slowly adjusting its weights (the connections between neurons) using calculus. This is expensive and slow.

We discovered that you can skip all of that.

Discovery 2 / 发现二

We can write the weights directly from the data. Instead of training, we just count how often pairs of letters appear together, do some math (linear algebra — you'll learn this in a few years!), and directly calculate what the weights should be.

1. Normal way: Start with random weights. Run the AI on text. Check how wrong it is. Adjust weights a tiny bit. Repeat millions of times. Takes hours, costs money. (梯度下降法 gradient descent)
2. Our way: Count letter pairs in the data. Do one matrix calculation. Done. Takes 0.1 seconds. Costs $0.000001. (解析构造法 analytic construction)
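The counting step is genuinely this simple. A sketch, using a toy sentence in place of the real Wikipedia data:

```python
from collections import Counter

text = b"the cat sat on the mat"  # stand-in for the real training data

# Count how often each pair of adjacent bytes appears.
pairs = Counter(zip(text, text[1:]))

# The most common pairs in this toy text:
for (a, b), n in pairs.most_common(3):
    print(chr(a) + chr(b), n)
```

These pair counts are the raw material; the matrix calculation that turns them into weights is the part described in "A Peek at the Math" below.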

The result? Our calculated brain actually works better than the trained one on new text it hasn't seen before. The trained brain memorizes too much; our calculated brain generalizes.

Score on new text:
Trained brain: 5.08 bpc
Our brain: 4.88 bpc
(lower is better)
在新文本上的得分:
训练的大脑:5.08 bpc
我们的大脑:4.88 bpc
(越低越好)

Why Does This Matter?

这为什么重要?

There are three reasons this is exciting:

Reason 1: Understanding / 理解

Most AI today is a "black box" (黑箱) — it works but nobody knows why. We opened the box completely. Every neuron, every weight, every connection — explained. To our knowledge, this is the first time anyone has done this for a working neural network.

Reason 2: Cost / 成本

Training GPT-4 cost over $100 million. Our method builds a brain from data using arithmetic. If this scales up, AI could become dramatically cheaper.

$0.013 trained   vs   $0.000001 calculated

That's about 10,000× cheaper. Four orders of magnitude(四个数量级).

Reason 3: Science / 科学

We proved that a neural network is really just a counting machine(计数器). It counts how often patterns appear in data, and uses those counts to predict. That's it. There's no magic, no emergence, no mystery. Just math.

A Peek at the Math

偷看一下数学

You don't need to understand this yet, but here's a taste of what's inside:

The brain's guess for the next letter uses this formula:

P(next byte) = softmax( Wy · h + b )

Wy = the output weights (a matrix / 矩阵)
h = the hidden state (those 128 switches)
b = a bias (a nudge)
softmax = turns numbers into probabilities that add to 1

We discovered that Wy is just the log of how often letter pairs appear together. You could calculate it with a for loop and a log() function. No training needed.
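Here is one way to read that claim in code — a sketch, not the paper's exact construction. The smoothing constant `alpha` is our addition, there only to avoid log(0) for pairs the toy text never contains.

```python
import math
from collections import Counter

text = b"the cat sat on the mat"  # stand-in for the real training data

# Count byte pairs and how often each byte appears as the first of a pair.
pairs = Counter(zip(text, text[1:]))
firsts = Counter(text[:-1])

def log_prob_next(prev, nxt, alpha=0.1):
    """log P(next | prev) from raw counts, with a small smoothing
    constant alpha so unseen pairs don't hit log(0)."""
    num = pairs[(prev, nxt)] + alpha
    den = firsts[prev] + alpha * 256
    return math.log(num / den)

# After 'a', the counts say 't' is far more likely than 'z'.
print(log_prob_next(ord("a"), ord("t")) > log_prob_next(ord("a"), ord("z")))
```

Filling a 256-wide table with these log values for every `prev` is the "for loop and a log() function" the text mentions.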

The tools we used: SVD (Singular Value Decomposition,奇异值分解) — a way to find the most important patterns in a matrix. Mutual Information (互信息) — a measure of how much knowing one thing tells you about another. And basic counting(计数).
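Of those tools, mutual information is the easiest to show from scratch. This is the textbook plug-in estimate applied to adjacent bytes of a toy sentence, not the paper's computation:

```python
import math
from collections import Counter

text = b"the cat sat on the mat"  # stand-in for the real data
n = len(text) - 1

# Joint and marginal counts for (current byte, next byte).
joint = Counter(zip(text, text[1:]))
left = Counter(text[:-1])
right = Counter(text[1:])

# Mutual information: how many bits knowing this byte
# gives you about the next one, estimated from counts.
mi = 0.0
for (a, b), c in joint.items():
    p_ab = c / n
    p_a = left[a] / n
    p_b = right[b] / n
    mi += p_ab * math.log2(p_ab / (p_a * p_b))
print(round(mi, 2), "bits")
```

A value above zero means adjacent bytes carry information about each other — which is exactly the structure a predictor (and hence a compressor) exploits.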

What's Next?

接下来呢?

Right now this works on a small brain (128 neurons) reading a small piece of Wikipedia (1 million bytes). The real question is: does it scale?

If the same trick works for bigger brains — with thousands or millions of neurons — then we might be able to build powerful AI systems by just doing math on data, instead of the expensive trial-and-error of training.

Final analogy: Imagine you want to build a bridge. The old way: try random designs, test each one, keep the ones that don't collapse. Our way: use physics to calculate exactly what shape the bridge should be. Both build bridges. But one is engineering, and the other is guessing. We're trying to turn AI from guessing into engineering.
The dream: AI you can understand, verify, and build from first principles. No black boxes. No billion-dollar training runs. Just math and data.
我们的梦想:一个可以理解、可以验证、可以从基本原理出发构建的AI。没有黑箱。不需要花几十亿去训练。只需要数学和数据。

About This Research

关于这项研究

This work was done in 12 days (January 31 – February 11, 2026) as part of the Hutter Prize challenge, which offers €500,000 for the best compression of Wikipedia. Compression and intelligence are deeply connected — to compress well, you must understand well.

By Claude and MJC.