
RNN Memory Depth

How far back does the RNN use context? Memory depth is measured as the variance of the model's predictions across the different byte values seen at distance k.

Method: Group samples by byte at position -k. Compute mean prediction for each group. Measure variance of means across groups. Higher variance = prediction depends more on that position.
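The method above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the reported numbers: the function name dependency_at_distance is hypothetical, and it assumes each position has a single scalar prediction (preds[i]) and that the variance across groups is unweighted.

```python
import numpy as np

def dependency_at_distance(data: bytes, preds, k: int) -> float:
    """Variance of group-mean predictions, grouped by the byte k positions back.

    Assumption: preds[i] is one scalar prediction made at position i.
    """
    seq = np.frombuffer(data, dtype=np.uint8).astype(np.intp)
    preds = np.asarray(preds, dtype=float)
    if len(preds) != len(seq):
        raise ValueError("data and preds must have the same length")
    if not 1 <= k < len(seq):
        raise ValueError("k must satisfy 1 <= k < len(data)")

    context = seq[:-k]   # byte at position i - k, for i = k .. n-1
    target = preds[k:]   # prediction at position i

    # Sum and count of predictions per byte value (0..255).
    sums = np.bincount(context, weights=target, minlength=256)
    counts = np.bincount(context, minlength=256)

    seen = counts > 0                      # ignore byte values that never occur
    group_means = sums[seen] / counts[seen]
    return float(np.var(group_means))      # variance of means across groups
```

Calling this for k = 1, 2, 3, ... gives one dependency value per distance, which is the shape of the curve and table reported on this page.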
Measured memory depth: ~30 characters
(Distance where dependency drops to 1/e of maximum)
Predicted: d_max = 24 / H_avg ≈ 24 / 2 = 12 characters
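The two summary figures above can be expressed as small helpers. This is a sketch under stated assumptions: memory_depth and predicted_depth are hypothetical names, the 1/e rule is read as "first distance whose normalized dependency is at or below 1/e", and the page does not say what the constant 24 represents or what units H_avg is in, so both are passed through as plain numbers.

```python
import math

def memory_depth(normalized, threshold=1.0 / math.e):
    """First distance k (1-based) whose normalized dependency is <= threshold.

    Returns None if none of the supplied values drops that far.
    """
    for k, value in enumerate(normalized, start=1):
        if value <= threshold:
            return k
    return None

def predicted_depth(numerator=24.0, h_avg=2.0):
    """d_max = 24 / H_avg, with the page's values as defaults."""
    return numerator / h_avg
```

With the defaults, predicted_depth() returns 12.0, matching the prediction quoted above. memory_depth needs the full normalized curve as input.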

Dependency vs Distance

[Figure: variance of prediction across different values at distance k (y-axis; higher = more dependency) plotted against distance k in characters back (x-axis, 1 to 30).]

Data

Distance  Variance  Normalized
1         0.000610  0.763
2         0.000767  0.960
3         0.000713  0.892
4         0.000799  1.000
5         0.000667  0.836
6         0.000667  0.835
7         0.000537  0.673
8         0.000684  0.856
9         0.000546  0.683
10        0.000797  0.998
11        0.000443  0.555
12        0.000561  0.703
13        0.000596  0.747
14        0.000452  0.565
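The Normalized column appears to be each variance divided by the largest variance in the table (0.000799 at distance 4). That reading is an inference from the numbers, not something the page states. The sketch below recomputes it from the listed variances; the helper name normalize is hypothetical.

```python
import math

# Variance and Normalized columns as listed above, distances 1..14.
variances = [
    0.000610, 0.000767, 0.000713, 0.000799, 0.000667, 0.000667, 0.000537,
    0.000684, 0.000546, 0.000797, 0.000443, 0.000561, 0.000596, 0.000452,
]
table_normalized = [
    0.763, 0.960, 0.892, 1.000, 0.836, 0.835, 0.673,
    0.856, 0.683, 0.998, 0.555, 0.703, 0.747, 0.565,
]

def normalize(values):
    """Scale values so that the largest one becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]

recomputed = normalize(variances)

# Largest gap between recomputed and listed values. Small gaps are expected
# because the listed variances are already rounded to six decimals.
max_gap = max(abs(r - t) for r, t in zip(recomputed, table_normalized))

# Does any listed distance reach the 1/e threshold?
crosses_threshold = min(table_normalized) <= 1.0 / math.e
```

The recomputed values agree with the listed column to within about 0.002. None of the 14 listed distances reaches 1/e (about 0.368); the lowest listed value is 0.555.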