RNN Memory Depth

How far back does the RNN use context? Memory depth is measured by the variance of the model's predictions across the different byte values that occur at distance k back from the current position.

Method: Group samples by the byte at position -k (k characters before the prediction point). Compute the mean prediction for each group, then measure the variance of those means across groups. A higher variance means the prediction depends more on that position.
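
The method can be sketched as follows. This is a minimal illustration, not the original measurement code: it assumes each sample has one scalar prediction and a byte-string history, and it uses population variance, which the text does not specify. The function and argument names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def dependency_at_distance(contexts, predictions, k):
    """Variance of per-group mean predictions, grouping by the byte k positions back.

    contexts:    list of bytes objects, the history preceding each prediction
    predictions: list of floats, one scalar prediction per sample (assumption)
    k:           distance in characters, k >= 1
    """
    groups = defaultdict(list)
    for ctx, pred in zip(contexts, predictions):
        if len(ctx) >= k:                 # skip histories shorter than k
            groups[ctx[-k]].append(pred)  # ctx[-k] is the byte at position -k
    if len(groups) < 2:
        return 0.0                        # one group: no variance across groups
    group_means = [mean(vals) for vals in groups.values()]
    # Population variance is an assumption; the source does not say which is used.
    return pvariance(group_means)
```

Calling this for k = 1, 2, 3, ... on the same samples gives one variance per distance, which is the quantity tabulated under Data.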

Measured memory depth: ~30 characters
(Distance at which the dependency drops to 1/e ≈ 0.37 of its maximum)
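
A minimal sketch of the 1/e criterion, assuming "drops to 1/e of maximum" means the first distance past the peak at which the variance is at or below max/e. The function name is hypothetical, and the dictionary holds only the fourteen values listed under Data.

```python
import math


def memory_depth(variance_by_distance):
    """Return the first distance past the peak where variance <= max/e, else None."""
    distances = sorted(variance_by_distance)
    peak_k = max(distances, key=lambda k: variance_by_distance[k])
    threshold = variance_by_distance[peak_k] / math.e
    for k in distances:
        if k > peak_k and variance_by_distance[k] <= threshold:
            return k
    return None  # threshold not reached within the given distances


# Variance values for k = 1..14, copied from the Data table.
table = {
    1: 0.000610, 2: 0.000767, 3: 0.000713, 4: 0.000799, 5: 0.000667,
    6: 0.000667, 7: 0.000537, 8: 0.000684, 9: 0.000546, 10: 0.000797,
    11: 0.000443, 12: 0.000561, 13: 0.000596, 14: 0.000452,
}
print(memory_depth(table))  # None
```

On the tabulated range the function returns None: the smallest normalized value there (0.555 at k = 11) is still above 1/e ≈ 0.368, so the reported ~30 characters presumably relies on distances beyond those listed.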

Predicted: d_max = 24 / H_avg ≈ 24 / 2 = 12 characters
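
The arithmetic behind the prediction, as a short sketch. The constant 24 and H_avg ≈ 2 are taken from the text as given; the function and parameter names are hypothetical, and the measured value is the approximate figure reported above.

```python
def predicted_depth(h_avg, budget=24.0):
    """Predicted memory depth d_max = budget / H_avg, in characters."""
    if h_avg <= 0:
        raise ValueError("h_avg must be positive")
    return budget / h_avg


predicted = predicted_depth(2.0)  # 24 / 2 = 12.0 characters
measured = 30.0                   # approximate measured depth from the text
ratio = measured / predicted      # 2.5: measured is about 2.5x the prediction
```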

Figure: Dependency vs Distance. X-axis: distance k (characters back). Y-axis: variance of the prediction across different values at distance k (higher = more dependency).

Data

Distance k	Variance	Normalized (variance / max)
1	0.000610	0.763
2	0.000767	0.960
3	0.000713	0.892
4	0.000799	1.000
5	0.000667	0.836
6	0.000667	0.835
7	0.000537	0.673
8	0.000684	0.856
9	0.000546	0.683
10	0.000797	0.998
11	0.000443	0.555
12	0.000561	0.703
13	0.000596	0.747
14	0.000452	0.565
