fig-20260131_2-03 | The fundamental bijection underlying information measurement
Core principle: Every finite event space E has a bijection to an initial segment of N, namely {0, ..., |E| − 1}.
This is what makes information countable and measurable.
Simple Event Space
Vowels biject to {0,1,2,3,4}. Information content: log₂(5) = 2.32 bits per vowel.
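A minimal sketch of the vowel bijection in Python. The alphabetical ordering and the names `phi` and `phi_inv` are assumptions made for illustration; any fixed ordering of the five vowels gives an equally valid bijection.

```python
import math

VOWELS = "aeiou"  # assumed ordering, chosen only for illustration

# phi: E -> {0, ..., 4}, and its inverse
phi = {v: i for i, v in enumerate(VOWELS)}
phi_inv = {i: v for v, i in phi.items()}

size = max(phi.values()) + 1   # |E| = max(phi(E)) + 1
bits = math.log2(size)         # log2(5), about 2.32 bits per vowel

print(phi, size, round(bits, 2))
```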
Factored Space: E₁ × E₂ → N
The factored space maps to N via mixed-radix encoding. Each ES occupies a contiguous segment.
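Mixed-radix encoding can be sketched as follows. The function names are hypothetical, and the Digits × Vowels pairing (radices 10 and 5) is only an assumed example of a factored space, not necessarily the one drawn in the figure.

```python
def mixed_radix_encode(digits, radices):
    """Map a tuple of per-factor indices to a single natural number."""
    if len(digits) != len(radices):
        raise ValueError("digits and radices must have the same length")
    n = 0
    for d, r in zip(digits, radices):
        if not 0 <= d < r:
            raise ValueError(f"index {d} out of range for radix {r}")
        n = n * r + d  # the first factor is the most significant
    return n


def mixed_radix_decode(n, radices):
    """Invert mixed_radix_encode for the same radices."""
    digits = []
    for r in reversed(radices):
        n, d = divmod(n, r)
        digits.append(d)
    return tuple(reversed(digits))


RADICES = (10, 5)  # assumed example: |Digits| = 10, |Vowels| = 5
code = mixed_radix_encode((3, 2), RADICES)
print(code, mixed_radix_decode(code, RADICES))
```

Because every index pair lands on a distinct number in {0, ..., 49}, the encoding is a bijection onto an initial segment of N.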
Why this matters: The bijection φ: E → N lets us:
Count events: |E| = max(φ(E)) + 1
Measure information: I(e) = log₂(φ(e) + 1) bits
Compare factorizations: Different φ give the same |E| but different structure
Joint Space: E₁ × E₂ → N (row-major)
The joint ES×ES space bijects to {0,...,24} via row-major encoding. Each cell is one joint event.
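A short sketch of the row-major indexing, assuming the 25 cells are pairs of the five ES labels used elsewhere in this figure. The helper names `joint_index` and `joint_cell` are hypothetical; the same arithmetic applies to any 5 × 5 product.

```python
ES_NAMES = ["Digits", "Vowels", "Whitespace", "Punctuation", "Other"]
K = len(ES_NAMES)  # 5 rows and 5 columns, so 25 joint events


def joint_index(row, col):
    """Row-major index of the cell (row, col) in a K x K grid."""
    return row * K + col


def joint_cell(n):
    """Inverse mapping: recover (row, col) from the index n."""
    return divmod(n, K)


n = joint_index(ES_NAMES.index("Vowels"), ES_NAMES.index("Whitespace"))
print(n, joint_cell(n))
```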
Complete Mapping: Byte → N
ES            Members              |ES|   N range      Bits
Digits        0-9                  10     [0, 9]       3.32
Vowels        a, e, i, o, u        5      [10, 14]     2.32
Whitespace    space, \n, \t, \r    4      [15, 18]     2.00
Punctuation   . , ! ? : ; etc.     32     [19, 50]     5.00
Other         consonants, etc.     205    [51, 255]    7.68
Total         all bytes            256    [0, 255]     8.00
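The N ranges and bit counts in the table follow from the ES sizes alone: each ES starts where the previous one ends. A minimal sketch, with the sizes taken from the table and the helper name `layout` being hypothetical:

```python
import math

# (name, |ES|) in the order used by the table
ES_SIZES = [
    ("Digits", 10),
    ("Vowels", 5),
    ("Whitespace", 4),
    ("Punctuation", 32),
    ("Other", 205),
]


def layout(sizes):
    """Assign each ES a contiguous range [start, end] and log2(|ES|) bits."""
    rows = []
    start = 0
    for name, size in sizes:
        rows.append((name, start, start + size - 1, round(math.log2(size), 2)))
        start += size
    return rows, start  # start now equals the total number of events


rows, total = layout(ES_SIZES)
for row in rows:
    print(row)
print("total:", total, "bits:", math.log2(total))
```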
The bijection φ: E → N is not unique. Different orderings give different φ but preserve:
|E| (cardinality)
I(E) = log₂|E| (information content)
Product structure (if any)
The choice of φ is a modeling decision. Our ES-based ordering groups related bytes together, making patterns more local in N-space. A frequency-based ordering would put common bytes first (lower N), enabling shorter codes.
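A frequency-based ordering could be sketched like this. The function name `frequency_phi`, the sample text, and the tie-breaking rule (byte value) are all assumptions made for illustration, not the ordering used by the source.

```python
from collections import Counter


def frequency_phi(data: bytes):
    """Build phi: byte -> N with the most frequent bytes at the lowest indices.

    Ties, including bytes that never occur in data, are broken by byte value
    so the ordering is deterministic.
    """
    counts = Counter(data)
    order = sorted(range(256), key=lambda b: (-counts[b], b))
    return {b: i for i, b in enumerate(order)}


sample = b"banana bandana"  # illustrative sample only
phi = frequency_phi(sample)
print(phi[ord("a")], phi[ord("n")], phi[ord("b")])
```

The result is still a bijection onto {0, ..., 255}, so |E| and log₂|E| are unchanged; only the positions of individual bytes move.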
I(e) = log₂(φ(e) + 1) ≤ log₂|E| bits, with equality at the largest index.
The position in N determines the "cost" of specifying the event.
Reproduction
Theory: CMP paper §2 (Event spaces and the E→N bijection)
ESs: From ./hutter es (Digits, Vowels, Whitespace, Punct, Other)
Encoding: Mixed-radix for factored spaces, row-major for products