Don't run the full workload in detailed simulation
Run the workload in ISA simulation and pick samples to run in uArch simulation
The full workload is represented by a selection of sampling units.
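As a rough illustration of how a selection of sampling units can stand in for the full workload, the sketch below extrapolates a whole-program CPI from per-unit detailed results. The `SamplingUnit` fields, the weights, and the cycle counts are hypothetical values chosen for the example, not measurements from these slides.

```python
from dataclasses import dataclass


@dataclass
class SamplingUnit:
    weight: float      # fraction of the full workload this unit represents (assumed known)
    instructions: int  # instructions executed in detailed (uArch) simulation
    cycles: int        # cycles reported by detailed simulation for this unit


def estimate_cpi(units):
    """Weighted-average CPI over the sampled units."""
    total_weight = sum(u.weight for u in units)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(u.weight * u.cycles / u.instructions for u in units)
    return weighted / total_weight


# Hypothetical per-unit results
units = [
    SamplingUnit(weight=0.50, instructions=1000, cycles=1500),
    SamplingUnit(weight=0.25, instructions=1000, cycles=2000),
    SamplingUnit(weight=0.25, instructions=1000, cycles=1000),
]
estimated_cpi = estimate_cpi(units)
```

With uniform weights this reduces to a plain mean over the units; non-uniform weights correspond to units that each represent a different share of the execution.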
A sampling unit checkpoint contains only architectural state. The microarchitectural state of the uArch simulator (caches, branch predictors, TLBs) starts at the reset state!
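A toy example of why starting from reset matters: the sketch below runs the same access stream through a small direct-mapped cache, once from an empty cache and once after a warmup pass. The cache geometry and the access stream are assumptions made up for illustration. Replaying accesses before the measured unit (functional warming) is one common mitigation, shown here only as a contrast, not as the method used in these slides.

```python
def count_misses(addresses, num_sets=64, line_bytes=64, tags=None):
    """Run addresses through a direct-mapped cache; return (misses, final tags)."""
    tags = dict(tags or {})  # copy so the caller's state is not mutated
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % num_sets, line // num_sets
        if tags.get(index) != tag:
            misses += 1
            tags[index] = tag
    return misses, tags


# Hypothetical access stream: a loop touching 32 distinct cache lines
working_set = [i * 64 for i in range(32)]
warmup = working_set * 2   # accesses that precede the sampling unit
sample = working_set * 4   # accesses inside the sampling unit

# Cold start: the cache is empty when the sampling unit begins
cold_misses, _ = count_misses(sample)

# Warm start: replay the preceding accesses first, then measure
_, warm_tags = count_misses(warmup)
warm_misses, _ = count_misses(sample, tags=warm_tags)
```

In this toy case every miss in the cold run is a compulsory miss that the real program would have taken long before the sampling unit began, so the cold run overstates the miss count for the unit.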
wikisort benchmark from embench, $N = 10000$, $C = 18$, $n_{\text{detailed}} = 2000$
huffbench benchmark from embench, $N = 10000$, $C = 18$, $n_{\text{detailed}} = 2000$
Accurate modeling of time is essential for datacenter workloads
Live sampling (interleaving arch and uArch sim) is required to accurately model time-dependent behaviors.
[1]: T. Grass et al., TaskPoint: Sampled simulation of task-based programs. ISPASS 2016
[2]: T. E. Carlson et al., BarrierPoint: Sampled simulation of multi-threaded applications. ISPASS 2014
[3]: A. Sabu et al., LoopPoint: Checkpoint-driven Sampled Simulation for Multi-threaded Applications. HPCA 2022
[4]: E. Argollo et al., COTSon: infrastructure for full system simulation. SIGOPS 2009
[5]: T. F. Wenisch et al., SimFlex: Statistical Sampling of Computer System Simulation. IEEE Micro 2006
$ bin64/drrun -t drmemtrace -tool view -indir drmemtrace.*.dir -sim_refs 20
Output format:
record instr tid record details
------------------------------------------------------------
1 0: 3256418 marker: version 6
2 0: 3256418 marker: filetype 0x240
3 0: 3256418 marker: cache line size 64
4 0: 3256418 marker: chunk instruction count 1024
5 0: 3256418 marker: page size 4096
6 0: 3256418 marker: timestamp 13312410768080478
7 0: 3256418 marker: tid 3256418 on core 7
8 1: 3256418 ifetch 3 byte(s) @ 0x00007fc205a61940 48 89 e7 mov %rsp, %rdi
9 2: 3256418 ifetch 5 byte(s) @ 0x00007fc205a61943 e8 b8 0c 00 00 call $0x00007fc205a62600
10 2: 3256418 write 8 byte(s) @ 0x00007fff9a9e3528 by PC 0x00007fc205a61943
11 3: 3256418 ifetch 1 byte(s) @ 0x00007fc205a62600 55 push %rbp
12 3: 3256418 write 8 byte(s) @ 0x00007fff9a9e3520 by PC 0x00007fc205a62600
13 4: 3256418 ifetch 3 byte(s) @ 0x00007fc205a62601 48 89 e5 mov %rsp, %rbp
14 5: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62604 41 57 push %r15
15 5: 3256418 write 8 byte(s) @ 0x00007fff9a9e3518 by PC 0x00007fc205a62604
16 6: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62606 41 56 push %r14
17 6: 3256418 write 8 byte(s) @ 0x00007fff9a9e3510 by PC 0x00007fc205a62606
18 7: 3256418 ifetch 2 byte(s) @ 0x00007fc205a62608 41 55 push %r13
19 7: 3256418 write 8 byte(s) @ 0x00007fff9a9e3508 by PC 0x00007fc205a62608
20 8: 3256418 ifetch 2 byte(s) @ 0x00007fc205a6260a 41 54 push %r12
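A listing like the one above can be summarized with a few lines of Python. This is a minimal sketch that assumes the whitespace-separated layout exactly as printed here (record number, instruction number followed by a colon, thread id, then the record details). The `summarize` helper and the regular expression are illustrative and not part of DynamoRIO.

```python
import re
from collections import Counter

# record#  instr#:  tid  kind ...   (layout assumed from the listing above)
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+):\s+(\d+)\s+<?(\w+)")


def summarize(lines):
    """Count trace records by kind (marker, ifetch, read, write, ...)."""
    kinds = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # header, separator, or blank line
        kinds[match.group(4)] += 1
    return kinds


excerpt = """\
record instr tid record details
7 0: 3256418 marker: tid 3256418 on core 7
8 1: 3256418 ifetch 3 byte(s) @ 0x00007fc205a61940 48 89 e7 mov %rsp, %rdi
9 2: 3256418 ifetch 5 byte(s) @ 0x00007fc205a61943 e8 b8 0c 00 00 call $0x00007fc205a62600
10 2: 3256418 write 8 byte(s) @ 0x00007fff9a9e3528 by PC 0x00007fc205a61943
""".splitlines()

counts = summarize(excerpt)
```

Applied to the 20 records shown above, the same counting gives 7 markers, 8 instruction fetches, and 5 writes.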
We wish to answer some questions
However, we can evaluate sampling techniques (assuming perfect detailed simulation)
Our proposal: Combine SimPoint-style representative sampling with SMARTS-style small intervals
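One way to picture the combination is the sketch below: cut a basic-block trace into small intervals, build a normalized basic-block vector (BBV) per interval, cluster the BBVs, and take the interval nearest each centroid as the representative, weighted by cluster size. This is a simplified illustration under stated assumptions, not the proposal's actual implementation. It uses plain k-means with a deterministic farthest-point start, and it omits SimPoint's random projection and automatic choice of the cluster count. The trace, `interval_len`, `num_blocks`, and `k` are hypothetical.

```python
from collections import Counter


def bbvs(block_trace, interval_len, num_blocks):
    """Split a basic-block ID trace into fixed-size intervals; return normalized BBVs."""
    vectors = []
    for start in range(0, len(block_trace) - interval_len + 1, interval_len):
        counts = Counter(block_trace[start:start + interval_len])
        vectors.append([counts[b] / interval_len for b in range(num_blocks)])
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(v, centroids[j]))
            for v in vectors]


def kmeans(vectors, k, iters=20):
    # Deterministic farthest-point initialization (an assumption of this sketch)
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centroids), centroids


def pick_samples(vectors, k):
    """Return (interval index, weight) for the interval nearest each centroid."""
    labels, centroids = kmeans(vectors, k)
    samples = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            rep = min(members, key=lambda i: dist2(vectors[i], centroids[j]))
            samples.append((rep, len(members) / len(vectors)))
    return samples


# Hypothetical trace with two phases: blocks {0, 1}, then blocks {2, 3}
trace = [0, 1] * 20 + [2, 3] * 10
samples = pick_samples(bbvs(trace, interval_len=10, num_blocks=4), k=2)
```

Each returned pair names one interval to run in detailed simulation and the fraction of intervals it represents, which can serve as the weight when extrapolating to the full workload.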