Platform | Examples | Throughput | Latency | Accuracy | Cost |
---|---|---|---|---|---|
JIT-based Simulators / VMs | qemu, KVM, VMware Fusion | 1-3 GIPS | <1 second | None | Minimal |
Architectural Simulators | spike, dromajo | 10-100+ MIPS | <1 second | None | Minimal |
General-purpose μArch Simulators | gem5, Sniper, ZSim, SST | 100 KIPS (gem5) - 100 MIPS (Sniper) | <1 minute | 10-50% IPC error | Minimal |
Bespoke μArch Simulators | Industry performance models | ≈ 0.1-1 MIPS | <1 minute | Close | $1M+ |
RTL Simulators | Verilator, VCS, Xcelium | 1-10 KIPS | 2-10 minutes | Cycle-exact | Minimal |
FPGA-Based Emulators | FireSim | ≈ 10 MIPS | 2-6 hours | Cycle-exact | $10k+ |
ASIC-Based Emulators | Palladium, Veloce | ≈ 0.5-10 MIPS | <1 hour | Cycle-exact | $10M+ |
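
To make the throughput column concrete, the sketch below converts a few of the rates in the table into wall-clock time for one workload. The workload size (1 billion instructions) and the single point picked from each throughput range are assumptions chosen for illustration, not measurements.

```python
# Representative throughputs in instructions per second.
# Each value is one point picked from the ranges in the table above.
THROUGHPUT_IPS = {
    "JIT-based simulator (1 GIPS)": 1e9,
    "Architectural simulator (10 MIPS)": 10e6,
    "gem5 (100 KIPS)": 100e3,
    "RTL simulator (1 KIPS)": 1e3,
    "FPGA-based emulator (10 MIPS)": 10e6,
}

# Hypothetical workload size, chosen only for illustration.
WORKLOAD_INSTRUCTIONS = 1e9


def runtime_seconds(instructions: float, ips: float) -> float:
    """Wall-clock seconds to execute `instructions` at `ips` instructions/second."""
    return instructions / ips


for name, ips in THROUGHPUT_IPS.items():
    secs = runtime_seconds(WORKLOAD_INSTRUCTIONS, ips)
    print(f"{name}: {secs:,.0f} s ({secs / 3600:.2f} h)")
```

Under these assumptions, RTL simulation at 1 KIPS needs about 10^6 seconds (more than 11 days) for a workload that a JIT-based simulator finishes in about a second. This ignores the startup latency column.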
Trends aren't enough[2]. Note the sensitivity differences - gradients are critical!
uArch simulators are not accurate enough for microarchitectural evaluation.
We want a tool to evaluate microarchitectural changes on real workloads at high fidelity
A critical tool that solves a long-standing problem to enable the "design-first" methodology
Instead of running the entire program in uArch simulation, run the entire program in functional simulation and only run samples in uArch simulation
The full workload is represented by a selection of sampling units.
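
A minimal sketch of how a whole-program IPC estimate could be assembled from the sampled runs. It assumes each sampling unit carries a weight equal to the number of intervals it represents, as in clustering-based sampling. The names `SampleResult`, `estimate_cycles`, and `estimate_ipc` are hypothetical, and the weighting scheme is an assumption rather than a method confirmed by the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleResult:
    ipc: float   # IPC measured for this sampling unit in detailed simulation
    weight: int  # number of intervals this sampling unit stands in for (assumed)


def estimate_cycles(samples: List[SampleResult], interval_length: int) -> float:
    """Extrapolated cycle count: every represented interval is assumed to take
    as many cycles as its sampling unit did."""
    return sum(s.weight * interval_length / s.ipc for s in samples)


def estimate_ipc(samples: List[SampleResult], interval_length: int) -> float:
    """Whole-program IPC = total instructions / extrapolated total cycles."""
    total_instructions = sum(s.weight for s in samples) * interval_length
    return total_instructions / estimate_cycles(samples, interval_length)


# Usage with made-up numbers: two sampling units covering 30 intervals in total.
samples = [SampleResult(ipc=1.0, weight=10), SampleResult(ipc=0.5, weight=20)]
print(estimate_ipc(samples, interval_length=10_000))
```

Cycles are summed and IPC is derived from the total, rather than averaging per-sample IPCs directly, so that slow intervals contribute in proportion to the time they take.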
The state from a sampling unit checkpoint is only architectural state. The microarchitectural state of the uArch simulator starts at the reset state!
Wikisort benchmark from embench, $N = 10000$, $C = 18$, $n_{\text{detailed}} = 2000$
L1d functional warmup brings IPC error from 7% to 2%
Huffbench benchmark from embench, $N = 10000$, $C = 18$, $n_{\text{detailed}} = 2000$
For a given workload interval of length $N$ (e.g. $N = 10000$), and without functional warmup, we can compute the following table. Each cell is the IPC error with respect to the full RTL simulation.
Detailed warmup offset ($n_{\text{offset}}$) ↓ / Detailed warmup instructions ($n_{\text{warmup}}$) → | 0 | 100 | 500 | 1000 | 2000 | 5000 |
---|---|---|---|---|---|---|
0 | Worst case | Offset error ↑ Warmup error ↓ | Offset error 2↑ Warmup error 2↓ | Offset error 3↑ Warmup error 3↓ | Offset error 4↑ Warmup error 4↓ | Maximum offset error |
-100 | Invalid | No offset error | '' | '' | '' | '' |
-500 | Invalid | Invalid | No offset error | '' | '' | '' |
-1000 | Invalid | Invalid | Invalid | No offset error | '' | '' |
-2000 | Invalid | Invalid | Invalid | Invalid | No offset error | '' |
-5000 | Invalid | Invalid | Invalid | Invalid | Invalid | No offset error, best case |
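
The structure of the table can be sketched in a few lines. This is one reading of the grid, not a definition from the source. It assumes a warmup that starts at $n_{\text{offset}}$ and runs for $n_{\text{warmup}}$ instructions overlaps the measured interval by $n_{\text{offset}} + n_{\text{warmup}}$ instructions, and that the IPC error is the relative error against the RTL reference. The function names are hypothetical.

```python
OFFSETS = [0, -100, -500, -1000, -2000, -5000]  # n_offset values from the table
WARMUPS = [0, 100, 500, 1000, 2000, 5000]       # n_warmup values from the table


def classify(n_offset: int, n_warmup: int) -> str:
    """Classify one (offset, warmup) cell under the assumed interpretation."""
    if n_offset == 0 and n_warmup == 0:
        return "no warmup"  # the table's "worst case" cell
    overlap = n_offset + n_warmup  # warmup instructions that fall inside the interval
    if overlap < 0:
        return "invalid"  # warmup would end before the interval starts
    if overlap == 0:
        return "no offset error"  # warmup ends exactly at the interval start
    return f"offset error ({overlap} instructions)"


def ipc_error(ipc_sampled: float, ipc_reference: float) -> float:
    """Relative IPC error of a sampled run against the reference (full RTL) IPC."""
    return abs(ipc_sampled - ipc_reference) / ipc_reference


# Print the grid row by row, mirroring the table layout.
for n_offset in OFFSETS:
    print(n_offset, [classify(n_offset, n_warmup) for n_warmup in WARMUPS])
```

Under this reading the diagonal is the set of cells with no offset error, the lower-left triangle is invalid, and the overlap grows toward the top-right corner.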
Given the data in the table for every interval and for different interval lengths $N$, fit the following model: