Vighnesh Iyer, Borivoje Nikolić
Safin Singh, Ansh Maroo, Connor Chang, Pramath Krishna, Vighnesh Iyer, Joonho Whangbo
Enables high throughput and accurate simulation of long workloads
Don't run the full workload in RTL simulation
Use an instruction set simulator (ISS) and pick samples to run in RTL simulation
The full workload is represented by a selection of sampling units.
A sampling unit is defined by an architectural checkpoint.
The microarchitectural state of the RTL simulation starts at the reset state!
This process is called functional warmup
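A minimal sketch of functional warmup, assuming it means replaying the memory accesses seen by the ISS through a simple functional model so that long-lived microarchitectural state (here, cache tags) does not start from reset. The `TagModel` type, the direct-mapped organization, and the `warm_up` helper are hypothetical names for illustration, not part of the flow described in these slides.

```rust
/// Hypothetical functional model of a direct-mapped cache tag array.
/// It tracks only which line lives in each set, not the data.
pub struct TagModel {
    line_bytes: u64,
    tags: Vec<Option<u64>>,
}

impl TagModel {
    pub fn new(sets: usize, line_bytes: u64) -> Self {
        assert!(sets > 0 && line_bytes > 0);
        Self { line_bytes, tags: vec![None; sets] }
    }

    /// Record one memory access from the ISS. Returns true on a hit.
    pub fn access(&mut self, addr: u64) -> bool {
        let sets = self.tags.len() as u64;
        let line = addr / self.line_bytes;
        let set = (line % sets) as usize;
        let tag = line / sets;
        let hit = self.tags[set] == Some(tag);
        // On a miss the new line replaces whatever was in the set.
        self.tags[set] = Some(tag);
        hit
    }

    /// Tag state that would be injected into the RTL simulation.
    pub fn tags(&self) -> &[Option<u64>] {
        &self.tags
    }
}

/// Replay the ISS address stream that precedes a sampling unit.
pub fn warm_up(model: &mut TagModel, addrs: impl IntoIterator<Item = u64>) {
    for addr in addrs {
        model.access(addr);
    }
}
```

After `warm_up`, the contents of `tags()` stand in for the state a real flow would inject into the RTL cache arrays before the detailed sample starts.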
wikisort benchmark from Embench, $N = 10000$, $C = 18$, $n_{\text{detailed}} = 2000$
Problem 1: Existing baremetal benchmarks (e.g. Embench, CoreMark) are not interesting.
Problem 2: No systematic methodology for complete checkpointing and injection.
build.rs for programmatic code generation
alloc and collections
no_std crates
Heavily used libraries in the wild + baremetal support
This is a purely experimental project
rv64imfd_Zicsr (no privileged ISA)
struct
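use serde::Serialize;

// Architectural state of one hart
#[derive(Serialize)]
pub struct Cpu {
    pub regs: [u64; 32],
    pub pc: u64,
    pub csrs: Csrs,
}

// Architectural state of the whole system
#[derive(Serialize)]
pub struct System {
    pub cpus: Vec<Cpu>,
    bus: Bus,
}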
Disentangling state and updates seems obvious, but is not easy with Spike
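One way to picture the separation of state and updates is sketched below. It is an illustration only: the `Inst`, `Update`, `execute`, and `commit` names are hypothetical, and the `Cpu` here is a cut-down version without CSRs. The idea is that executing an instruction only describes its effect as data, and a separate step applies that effect to the state, so a checkpoint is nothing more than a copy of the struct.

```rust
/// Cut-down architectural state (no CSRs), for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Cpu {
    pub regs: [u64; 32],
    pub pc: u64,
}

impl Cpu {
    pub fn new(pc: u64) -> Self {
        Self { regs: [0; 32], pc }
    }
}

/// Hypothetical decoded instruction; a real ISS would have many variants.
#[derive(Debug, Clone, Copy)]
pub enum Inst {
    Addi { rd: usize, rs1: usize, imm: i64 },
}

/// The effect of one instruction, described as plain data.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Update {
    pub rd: Option<(usize, u64)>,
    pub next_pc: u64,
}

/// Pure function: reads the state and returns the update without mutating it.
pub fn execute(cpu: &Cpu, inst: Inst) -> Update {
    match inst {
        Inst::Addi { rd, rs1, imm } => Update {
            rd: Some((rd, cpu.regs[rs1].wrapping_add(imm as u64))),
            next_pc: cpu.pc.wrapping_add(4),
        },
    }
}

/// The only place where architectural state changes.
pub fn commit(cpu: &mut Cpu, update: Update) {
    if let Some((rd, value)) = update.rd {
        // x0 is hardwired to zero, so writes to it are dropped.
        if rd != 0 {
            cpu.regs[rd] = value;
        }
    }
    cpu.pc = update.next_pc;
}
```

With this split, `cpu.clone()` taken between two `commit` calls is a complete architectural checkpoint of this toy state, and the `Update` values form a commit log that can be compared against another simulator.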
target = "riscv64gc-unknown-none-elf": that's all it takes to pull in a cross compiler!
no_std dependency with a custom allocator
cargo build like any other Rust project!
riscv-tests/benchmarks
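The slides do not say which allocator is used. As a rough sketch of what a custom allocator for a no_std target can look like, here is a hypothetical bump allocator (`BumpAlloc`, `HEAP_SIZE` are illustrative names) built only on `core`. It hands out memory from a fixed buffer and never frees it, which is often enough for a short baremetal benchmark.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr::null_mut;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative heap size; a real target would size this from its linker script.
pub const HEAP_SIZE: usize = 4096;

/// Hypothetical bump allocator: allocations only move a cursor forward.
pub struct BumpAlloc {
    heap: UnsafeCell<[u8; HEAP_SIZE]>,
    next: AtomicUsize, // offset of the first free byte in `heap`
}

// Safe to share: the cursor is only advanced with atomic compare-exchange.
unsafe impl Sync for BumpAlloc {}

impl BumpAlloc {
    pub const fn new() -> Self {
        Self {
            heap: UnsafeCell::new([0; HEAP_SIZE]),
            next: AtomicUsize::new(0),
        }
    }
}

unsafe impl GlobalAlloc for BumpAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.heap.get() as usize;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the candidate address up to the requested alignment.
            let start = (base + cur + layout.align() - 1) & !(layout.align() - 1);
            let end = start - base + layout.size();
            if end > HEAP_SIZE {
                return null_mut(); // out of memory
            }
            match self
                .next
                .compare_exchange(cur, end, Ordering::SeqCst, Ordering::Relaxed)
            {
                Ok(_) => return start as *mut u8,
                Err(observed) => cur = observed,
            }
        }
    }

    // A bump allocator never reclaims memory.
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}
```

On a baremetal target, a `static` instance annotated with `#[global_allocator]` would make `alloc` types such as `Vec` and `BTreeMap` usable from no_std code.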
no_std Crates
no_std crates on crates.io that are very popular
The missing piece: stimulus
A path towards representative, high quality baremetal benchmarks
All these components tie into a robust sampled simulation flow
ADLs don't need a new language! It's hardware after all.
Our proposal: Combine SimPoint-style representative sampling with SMARTS-style small intervals
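To make the SimPoint-style half of this proposal concrete, the sketch below shows how measurements from a few representative samples could be extrapolated to the whole workload. It assumes equal-length intervals that have already been clustered, with one detailed (RTL) measurement per cluster; the `Cluster` type and both functions are hypothetical names, and the clustering step itself is not shown.

```rust
/// One cluster of equal-length intervals and the measurement taken
/// on its representative sample (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
pub struct Cluster {
    /// Number of intervals assigned to this cluster.
    pub intervals: u64,
    /// Cycles per instruction measured on the representative sample.
    pub sample_cpi: f64,
}

/// Whole-workload CPI estimate: per-cluster CPI weighted by cluster size.
/// Returns None when there are no intervals at all.
pub fn estimate_cpi(clusters: &[Cluster]) -> Option<f64> {
    let total: u64 = clusters.iter().map(|c| c.intervals).sum();
    if total == 0 {
        return None;
    }
    let weighted: f64 = clusters
        .iter()
        .map(|c| c.intervals as f64 * c.sample_cpi)
        .sum();
    Some(weighted / total as f64)
}

/// Estimated total cycles for a workload split into intervals of
/// `interval_len` instructions each.
pub fn estimate_cycles(clusters: &[Cluster], interval_len: u64) -> Option<f64> {
    let total: u64 = clusters.iter().map(|c| c.intervals).sum();
    estimate_cpi(clusters).map(|cpi| cpi * (total * interval_len) as f64)
}
```

The weights are applied to CPI rather than IPC because, with equal instruction counts per interval, cycles add up linearly across intervals while IPC does not.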