Berkeley CS 294
Multi-Level Simulation for Rapid Microarchitectural Iteration and Evaluation
Vighnesh Iyer, Raghav Gupta
CS 294 Project Proposal
Motivation
- I want to evaluate the impact of a microarchitectural feature / optimization / parameter (RTL-level)
- Simple parameter: Modifying the number of ROB/LSU entries
- Complex parameter: Changing the cache hierarchy and sizing
- Block-level uArch: Pipelining the FPU more aggressively
- Cross-cutting uArch: Changing the NoC topology or parameterization
- On a long-running workload
- Cloud applications
- Hundreds of millions to billions of cycles
- Each application places pressures on different uArch elements
Microarchitecture Evaluation Strategies
| Strategy | Throughput | Latency | Fidelity |
|---|---|---|---|
| ISA Simulation | 10-100+ MIPS | <1 second | None |
| uArch Perf Sim | 100 KIPS (gem5) | 5-10 seconds | 5-10% avg IPC error |
| RTL Simulation | 1-10 KIPS | 5-10 minutes | cycle-exact |
| FireSim (FPGA) | 1-50 MIPS | 2-6 hours | cycle-exact |
| Magic Box | 10 MIPS | <1 minute | <5% error, 10k intervals |
- How do we build the magic box?
- Leverage the strengths of ISA, uArch, and RTL simulators
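To make the throughput gap in the table concrete, the sketch below estimates the wall-clock time to run a fixed instruction count on each strategy. It assumes that "Latency" is a fixed startup cost paid once per run and, where the table gives a range, takes the upper bound for both throughput and latency. The names `SIMULATORS`, `wall_clock_seconds`, and `estimate_all` are illustrative and not part of any tool.

```python
# Throughput (instructions/second) and startup latency (seconds), read off the
# table above. ASSUMPTION: ranges are collapsed to their upper bounds, and
# latency is treated as a one-time startup cost.
SIMULATORS = {
    "ISA Simulation": (100e6, 1),
    "uArch Perf Sim": (100e3, 10),
    "RTL Simulation": (10e3, 10 * 60),
    "FireSim (FPGA)": (50e6, 6 * 3600),
    "Magic Box": (10e6, 60),
}


def wall_clock_seconds(instructions, throughput_ips, latency_s):
    """Startup latency plus time spent simulating at a constant throughput."""
    return latency_s + instructions / throughput_ips


def estimate_all(instructions):
    """Estimated wall-clock seconds per strategy for one workload run."""
    return {
        name: wall_clock_seconds(instructions, throughput, latency)
        for name, (throughput, latency) in SIMULATORS.items()
    }


if __name__ == "__main__":
    # A workload of one billion instructions, within the range quoted above.
    for name, seconds in estimate_all(1_000_000_000).items():
        print(f"{name:16s} {seconds / 3600:8.2f} hours")
```

Under these assumptions, RTL simulation of a billion instructions takes over a day even at its best-case throughput, which is why the flow only sends a few representative intervals to RTL.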
Phase Behavior of Programs
- Program execution traces aren’t random
- They execute the same code again and again
- Application execution traces can be split into phases that exhibit similar uArch behavior
- Prior work: SimPoint
- Identify the basic blocks executed in each interval (e.g., 1M-instruction intervals)
- Embed each interval as its 'basic block vector' (BBV)
- Cluster intervals using k-means
- Similar intervals → similar uArch behaviors
- Only execute unique intervals in low-level RTL simulation!
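The SimPoint-style steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the SimPoint tool itself: the trace format (a list of `(block_id, instruction_count)` pairs), the farthest-point centroid initialization, and all function names are assumptions made for the example, and k is fixed by hand here rather than chosen automatically.

```python
def basic_block_vectors(trace, interval_len, num_blocks):
    """Split a trace of (block_id, n_instr) pairs into intervals of at least
    interval_len instructions and return one normalized BBV per interval."""
    vectors, current, executed = [], [0.0] * num_blocks, 0
    for block_id, n_instr in trace:
        current[block_id] += n_instr
        executed += n_instr
        if executed >= interval_len:
            vectors.append([count / executed for count in current])
            current, executed = [0.0] * num_blocks, 0
    return vectors  # a trailing partial interval is dropped


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means. ASSUMPTION: deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each interval joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(vectors, labels, centroids):
    """For each cluster, pick the interval whose BBV is closest to the centroid."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            reps[c] = min(members, key=lambda i: sq_dist(vectors[i], centroid))
    return reps


if __name__ == "__main__":
    # Synthetic trace: phase A loops over blocks 0 and 1, phase B over 2 and 3.
    phase_a, phase_b = [(0, 10), (1, 10)] * 5, [(2, 10), (3, 10)] * 5
    trace = phase_a * 3 + phase_b * 2 + phase_a
    bbvs = basic_block_vectors(trace, interval_len=100, num_blocks=4)
    labels, centroids = kmeans(bbvs, k=2)
    print(labels, representatives(bbvs, labels, centroids))
```

Only the representative interval of each cluster would then need to run in RTL simulation; every other interval inherits the behavior measured for its cluster.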
Multi-Level Simulation Flow
- Execute the application in ISA-level simulation
- Use SimPoint-style interval clustering
- Capture architectural checkpoints for each unique interval
- Capture memory / branch / PC traces for each interval
- Inject traces into uArch component simulators
- Separate model per component (cache, branch predictor)
- Functional warmup
- Inject uArch state into RTL sim
- Detailed warmup
- Performance numbers
- Extrapolate up the stack
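The final step, extrapolating up the stack, can be sketched as a weighted average over clusters. The example assumes equal-length intervals (in instructions), a cluster label per interval from the clustering step, and one IPC measurement per cluster taken from the RTL simulation of its representative interval. The function name and inputs are hypothetical. Because cycles (not IPC) add across intervals, the sketch averages CPI and inverts at the end.

```python
from collections import Counter


def extrapolate_ipc(labels, rep_ipc):
    """Estimate whole-program IPC from per-cluster representative IPCs.

    labels:  cluster id for every interval of the full run
             (ASSUMPTION: all intervals contain the same instruction count).
    rep_ipc: dict mapping cluster id -> IPC measured in RTL simulation for
             that cluster's representative interval (hypothetical input).
    """
    counts = Counter(labels)
    total = len(labels)
    # Total cycles = sum over intervals of (instructions * CPI), so the
    # whole-program CPI is the cluster-size-weighted mean of per-cluster CPI.
    weighted_cpi = sum(
        (counts[cluster] / total) * (1.0 / rep_ipc[cluster]) for cluster in counts
    )
    return 1.0 / weighted_cpi


if __name__ == "__main__":
    # Six intervals: four behave like cluster 0, two like cluster 1.
    labels = [0, 0, 0, 1, 1, 0]
    rep_ipc = {0: 2.0, 1: 0.5}
    print(f"estimated IPC: {extrapolate_ipc(labels, rep_ipc):.3f}")
```

Averaging IPC directly would overweight the fast clusters: in the example the slow cluster covers only a third of the instructions but accounts for two thirds of the cycles.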