E → N: Events as Natural Numbers

fig-20260131_2-03 | The fundamental bijection underlying information measurement

Core principle: Every finite event space E admits a bijection to an initial segment {0, ..., |E| − 1} of N. This is what makes information countable and measurable.

Simple Event Space

The five vowels {a, e, i, o, u} biject to {0,1,2,3,4}. Information content: log₂(5) ≈ 2.32 bits per vowel.
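A minimal Python sketch of this mapping follows. The alphabetical ordering and the names `phi` and `phi_inv` are assumptions made for illustration; any ordering of the five vowels gives an equally valid bijection.

```python
import math

# Assumed ordering: alphabetical. Any permutation would also be a bijection.
VOWELS = ["a", "e", "i", "o", "u"]

# phi maps each vowel to its index in {0, 1, 2, 3, 4}.
phi = {v: n for n, v in enumerate(VOWELS)}

# phi_inv recovers the vowel from its index, which is what makes phi a bijection.
phi_inv = {n: v for v, n in phi.items()}

# Information content of the whole space: log2 of its cardinality.
bits_per_vowel = math.log2(len(VOWELS))

print(phi)
print(f"{bits_per_vowel:.2f} bits per vowel")
```

Round-tripping through `phi` and `phi_inv` returns the original vowel, which is a quick way to check that no two events share an index.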

Factored Space: E₁ × E₂ → N

The factored space maps to N via mixed-radix encoding. Each ES occupies a contiguous segment.
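Mixed-radix encoding can be sketched as below. This is an illustrative version, not necessarily the one used to produce the figure: the function names and the example radices (5 vowels, 10 digits) are assumptions. With the first coordinate fixed, the codes form a contiguous block of integers, which is one way to read the "contiguous segment" remark.

```python
def mixed_radix_encode(digits, radices):
    """Map (d_1, ..., d_k) with 0 <= d_i < radices[i] to a single integer."""
    if len(digits) != len(radices):
        raise ValueError("digits and radices must have the same length")
    n = 0
    for d, r in zip(digits, radices):
        if not 0 <= d < r:
            raise ValueError(f"digit {d} out of range for radix {r}")
        n = n * r + d  # shift by this position's radix, then add the digit
    return n


def mixed_radix_decode(n, radices):
    """Inverse of mixed_radix_encode for the same radices."""
    digits = []
    for r in reversed(radices):
        n, d = divmod(n, r)  # peel off the least significant digit
        digits.append(d)
    return tuple(reversed(digits))


# Hypothetical factored space: |E1| = 5 (vowels), |E2| = 10 (digits).
RADICES = (5, 10)
code = mixed_radix_encode((2, 7), RADICES)
print(code, mixed_radix_decode(code, RADICES))
```

The product space has 5 × 10 = 50 events, and the encoder hits every integer in {0, ..., 49} exactly once.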

Why this matters: The bijection φ: E → N lets us:

Count events: |E| = max(φ(E)) + 1
Measure information: I(e) = log₂(φ(e) + 1) bits
Compare factorizations: Different φ give same |E| but different structure
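The first two uses in the list above can be sketched directly from the formulas, assuming φ is stored as a Python dict from events to integers. The helper names and the two vowel orderings are illustrative assumptions.

```python
import math


def count_events(phi):
    """|E| = max(phi(E)) + 1, valid when phi is onto {0, ..., |E| - 1}."""
    return max(phi.values()) + 1


def position_cost_bits(phi, e):
    """I(e) = log2(phi(e) + 1), the formula stated in the list above."""
    return math.log2(phi[e] + 1)


# Two different bijections on the same event space (assumed orderings).
phi_alpha = {v: n for n, v in enumerate("aeiou")}
phi_reversed = {v: n for n, v in enumerate("uoiea")}

# Same cardinality under both orderings...
print(count_events(phi_alpha), count_events(phi_reversed))

# ...but the per-event cost depends on where the ordering places the event.
print(position_cost_bits(phi_alpha, "a"), position_cost_bits(phi_reversed, "a"))
```

Under the alphabetical ordering "a" sits at position 0 and costs 0 bits; under the reversed ordering it sits at position 4 and costs log₂(5) bits, while |E| is 5 in both cases.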

Joint Space: E₁ × E₂ → N (row-major pairing)

The joint ES×ES space (5 × 5 = 25 cells) bijects to {0,...,24} via row-major encoding: cell (i, j) maps to 5i + j. Each cell is one joint event.
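A short sketch of the row-major encoding on the 5 × 5 grid follows. The order of the ES names is an assumption for illustration; the function names are likewise hypothetical.

```python
# Assumed ordering of the five ESs; row index i and column index j both use it.
ES_NAMES = ["Digits", "Vowels", "Whitespace", "Punctuation", "Other"]
N_COLS = len(ES_NAMES)


def row_major(i, j, n_cols=N_COLS):
    """Map grid cell (i, j) to a single index, filling one row at a time."""
    return i * n_cols + j


def row_major_inv(n, n_cols=N_COLS):
    """Recover (i, j) from the index."""
    return divmod(n, n_cols)


i, j = ES_NAMES.index("Vowels"), ES_NAMES.index("Whitespace")
n = row_major(i, j)
print(n, row_major_inv(n))
```

Enumerating all 25 cells produces each integer in {0, ..., 24} exactly once, so the map is a bijection on this grid.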

Complete Mapping: Byte → N

ES	Members	|ES|	N range	Bits (log₂|ES|)
Digits	0-9	10	[0, 9]	3.32
Vowels	a, e, i, o, u	5	[10, 14]	2.32
Whitespace	space, \n, \t, \r	4	[15, 18]	2.00
Punctuation	. , ! ? : ; etc.	32	[19, 50]	5.00
Other	consonants, etc.	205	[51, 255]	7.68
Total	all bytes	256	[0, 255]	8.00
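The table can be reproduced with a short script under some assumptions that the source does not spell out: Punctuation is taken to be Python's `string.punctuation` (which happens to contain exactly 32 ASCII characters), Vowels are lowercase only, and bytes within "Other" are ordered by byte value. The actual membership used for the figure may differ.

```python
import math
import string

# Segment order follows the table; member order within a segment is an assumption.
SEGMENTS = [
    ("Digits", [ord(c) for c in string.digits]),
    ("Vowels", [ord(c) for c in "aeiou"]),
    ("Whitespace", [ord(c) for c in " \n\t\r"]),
    ("Punctuation", [ord(c) for c in string.punctuation]),  # assumed membership
]
assigned = {b for _, members in SEGMENTS for b in members}
SEGMENTS.append(("Other", [b for b in range(256) if b not in assigned]))

phi = {}     # byte value -> position in N
ranges = {}  # ES name -> (first, last) position, inclusive
for name, members in SEGMENTS:
    start = len(phi)
    for b in members:
        phi[b] = len(phi)  # next free position; keeps each segment contiguous
    ranges[name] = (start, len(phi) - 1)

for name, members in SEGMENTS:
    lo, hi = ranges[name]
    print(f"{name:12s} {len(members):4d} [{lo}, {hi}] {math.log2(len(members)):.2f}")
```

Because every byte is assigned exactly once, `phi` is a permutation of {0, ..., 255}, and the printed ranges match the N range column of the table.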

The bijection φ: E → N is not unique. Different orderings give different φ but preserve:

|E| (cardinality)
I(E) = log₂|E| (information content)
Product structure (if any)

The choice of φ is a modeling decision. Our ES-based ordering groups related bytes together, making patterns more local in N-space. A frequency-based ordering would put common bytes first (lower N), enabling shorter codes.

I(e) = log₂(φ(e) + 1) ≤ log₂|E| bits, with equality when φ(e) = |E| − 1
The position in N determines the "cost" of specifying the event
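To illustrate how the choice of ordering changes the position cost, the sketch below builds a frequency-based φ from a sample and compares the mean of log₂(φ(e) + 1) against the identity ordering. The sample text, the tie-breaking rule, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def frequency_phi(data: bytes):
    """Order all 256 bytes by descending count; ties broken by byte value."""
    counts = Counter(data)
    order = sorted(range(256), key=lambda b: (-counts[b], b))
    return {b: n for n, b in enumerate(order)}


def mean_position_cost(phi, data: bytes):
    """Average of log2(phi(e) + 1) over the bytes in data."""
    return sum(math.log2(phi[b] + 1) for b in data) / len(data)


# Hypothetical sample text, used only to estimate frequencies.
sample = b"the quick brown fox jumps over the lazy dog " * 4

identity_phi = {b: b for b in range(256)}  # phi(byte) = byte value
freq_phi = frequency_phi(sample)

print(f"identity ordering:  {mean_position_cost(identity_phi, sample):.2f} bits")
print(f"frequency ordering: {mean_position_cost(freq_phi, sample):.2f} bits")
```

Both orderings are bijections on the same 256 bytes, so |E| and log₂|E| are unchanged; only the per-event positions, and therefore the average cost on this sample, differ.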

Reproduction

Theory: CMP paper §2 (Event spaces and the E→N bijection)

ESs: From ./hutter es (Digits, Vowels, Whitespace, Punct, Other)

Encoding: Mixed-radix for factored spaces, row-major for products