Instead of running the entire program in uArch simulation, run the entire program in functional simulation and run only selected samples in uArch simulation.
The full workload is represented by a selection of sampling units.
A sampling unit checkpoint contains only architectural state. The microarchitectural state of the uArch simulator starts at the reset state!
Functional warmup of the L1d cache brings IPC error from 7% to 2%.
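To make the idea of representing the full workload by sampling units concrete, here is a minimal sketch of how per-unit IPC values could be combined into a whole-workload estimate. It assumes that all sampling units have the same instruction count and that each unit carries a weight for the fraction of the workload it represents; the function name `estimate_ipc` and the weights are hypothetical, since the notes do not say how units are selected or weighted.

```python
def estimate_ipc(unit_ipcs, weights):
    """Estimate whole-workload IPC from per-sampling-unit IPC values.

    Assumption: every unit covers the same number of instructions, so
    cycles add up across units and the weighted average must be taken
    over CPI (cycles per instruction), not over IPC directly.
    """
    total_weight = sum(weights)
    weighted_cpi = sum(
        (w / total_weight) * (1.0 / ipc) for ipc, w in zip(unit_ipcs, weights)
    )
    return 1.0 / weighted_cpi


if __name__ == "__main__":
    # Two equally weighted units with IPC 1.0 and 2.0 (made-up numbers).
    print(estimate_ipc([1.0, 2.0], [1, 1]))
```

Averaging CPI rather than IPC matters here: a plain arithmetic mean of IPC would overstate performance whenever units differ in speed.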
For a given workload interval and an interval length $N$ (e.g. $N = 10000$), and without functional warmup, we can compute the table below. Each cell is the IPC error with respect to the full RTL simulation.
Rows: detailed warmup offset ($n_{\text{offset}}$). Columns: detailed warmup instructions ($n_{\text{warmup}}$).

| $n_{\text{offset}}$ \ $n_{\text{warmup}}$ | 0 | 100 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| 0 | Worst case | Offset error ↑ Warmup error ↓ | Offset error 2↑ Warmup error 2↓ | Offset error 3↑ Warmup error 3↓ | Offset error 4↑ Warmup error 4↓ | Maximum offset error |
| -100 | Invalid | No offset error | '' | '' | '' | '' |
| -500 | Invalid | Invalid | No offset error | '' | '' | '' |
| -1000 | Invalid | Invalid | Invalid | No offset error | '' | '' |
| -2000 | Invalid | Invalid | Invalid | Invalid | No offset error | '' |
| -5000 | Invalid | Invalid | Invalid | Invalid | Invalid | No offset error, best case |
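The structure of the table can be sketched in code. The snippet below is only one reading of the table, not a definitive definition: it assumes that measurement begins at `n_offset + n_warmup` instructions relative to the interval start, so a negative value is an invalid configuration, zero means no offset error, and a positive value means the measured region is shifted. The names `classify` and `ipc_error`, and the relative-error formula, are assumptions introduced for illustration.

```python
from itertools import product

# Grid values taken from the table.
OFFSETS = [0, -100, -500, -1000, -2000, -5000]  # n_offset
WARMUPS = [0, 100, 500, 1000, 2000, 5000]       # n_warmup


def classify(n_offset: int, n_warmup: int) -> str:
    """Classify one (n_offset, n_warmup) cell.

    Assumption: detailed simulation starts n_offset instructions relative
    to the interval start and measurement begins after n_warmup more.
    """
    measure_start = n_offset + n_warmup
    if measure_start < 0:
        return "invalid"    # warmup ends before the interval begins
    if measure_start == 0:
        return "aligned"    # no offset error
    return "shifted"        # offset error: measurement starts late


def ipc_error(ipc_sampled: float, ipc_reference: float) -> float:
    """Relative IPC error against a reference (assumed definition)."""
    return abs(ipc_sampled - ipc_reference) / ipc_reference


# Classification of every cell in the grid.
grid = {(o, w): classify(o, w) for o, w in product(OFFSETS, WARMUPS)}

if __name__ == "__main__":
    for o in OFFSETS:
        print(o, [grid[(o, w)] for w in WARMUPS])
```

Under this reading, the diagonal cells and the top-left cell are the aligned ones, and the lower triangle is invalid, which matches the "Invalid" and "No offset error" entries in the table.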
Given the data in the table for every interval and for different interval lengths $N$, fit the following model: