Berkeley CS 294

Multi-Level Simulation for Rapid Microarchitectural Iteration and Evaluation

Vighnesh Iyer, Raghav Gupta

CS 294 Project Proposal

Motivation

  • I want to evaluate the impact of a microarchitectural feature / optimization / parameter (RTL-level)
    • Simple parameter: Modifying the number of ROB/LSU entries
    • Complex parameter: Changing the cache hierarchy and sizing
    • Block-level uArch: Pipelining the FPU more aggressively
    • Cross-cutting uArch: Changing the NoC topology or parameterization
  • On a long-running workload
    • Cloud applications
    • Hundreds of millions to billions of cycles
    • Each application places pressures on different uArch elements

Microarchitecture Evaluation Strategies

Strategy          Throughput        Latency        Fidelity
ISA Simulation    10-100+ MIPS      <1 second      None
uArch Perf Sim    100 KIPS (gem5)   5-10 seconds   5-10% avg IPC error
RTL Simulation    1-10 KIPS         5-10 minutes   cycle-exact
FireSim (FPGA)    1-50 MIPS         2-6 hours      cycle-exact
Magic Box         10 MIPS           <1 minute      <5% error, 10k intervals
  • How do we build the magic box?
  • Leverage the strengths of ISA, uArch, and RTL simulators
    • Multi-level simulation
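
To put the throughput numbers in perspective, here is a rough back-of-the-envelope sketch of the wall-clock time each strategy would need for a 1-billion-instruction workload. It assumes the Latency column is a fixed up-front cost (for example build or startup time) paid once per run, which the slide does not state explicitly. It uses the optimistic end of each range from the table. The function name `wall_clock_seconds` and the 1-billion figure are illustrative choices, not part of the proposal.

```python
def wall_clock_seconds(instructions, throughput_ips, latency_s):
    """Estimated run time: fixed startup latency plus instructions / throughput.

    Assumption (not stated in the table): latency is a one-time cost per run.
    """
    return latency_s + instructions / throughput_ips


# (throughput in instructions/second, latency in seconds), taken from the
# optimistic end of each range in the table above.
strategies = {
    "ISA Simulation": (100e6, 1),         # 100 MIPS, <1 second (1 s upper bound)
    "uArch Perf Sim": (100e3, 5),         # 100 KIPS, 5 seconds
    "RTL Simulation": (10e3, 5 * 60),     # 10 KIPS, 5 minutes
    "FireSim (FPGA)": (50e6, 2 * 3600),   # 50 MIPS, 2 hours
}

workload = 1_000_000_000  # hypothetical 1-billion-instruction workload

times = {
    name: wall_clock_seconds(workload, ips, latency)
    for name, (ips, latency) in strategies.items()
}

for name, seconds in times.items():
    print(f"{name:16s} {seconds / 3600:8.2f} hours")
```

Under these assumptions RTL simulation alone takes more than a day per run even at its best-case throughput, while FireSim is dominated by its startup latency. Neither fits a fast iteration loop, which is the gap the "magic box" is meant to fill.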

Phase Behavior of Programs

  • Program execution traces aren’t random
    • They execute the same code again and again
    • Application execution traces can be split into phases that exhibit similar uArch behavior
  • Prior work: SimPoint
    • Identify basic blocks executed in a given interval (e.g. 1M instruction intervals)
    • Embed each interval using its ‘basic block vector’
    • Cluster intervals using k-means
  • Similar intervals → similar uArch behaviors
    • Only execute unique intervals in low-level RTL simulation!
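
The SimPoint-style steps above can be sketched in a few lines of Python. This is a minimal illustration, not the SimPoint tool itself: the trace format (a list of `(basic_block_id, instruction_count)` pairs), the function names, and the farthest-point initialization of k-means are all assumptions made for the example. Real SimPoint also applies random projection and chooses k automatically, both of which are omitted here.

```python
def basic_block_vectors(trace, interval_len, num_blocks):
    """Cut the trace into intervals of ~interval_len instructions and return
    one normalized basic block vector (BBV) per interval."""
    vectors, counts, executed = [], [0.0] * num_blocks, 0
    for bb_id, n_insts in trace:
        counts[bb_id] += n_insts          # weight each block by instructions
        executed += n_insts
        if executed >= interval_len:
            vectors.append([c / executed for c in counts])
            counts, executed = [0.0] * num_blocks, 0
    return vectors                        # a trailing partial interval is dropped


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    def assign():
        return [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                for v in vectors]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(), centroids


def pick_representatives(vectors, labels, centroids):
    """For each cluster, return (index of interval closest to the centroid,
    fraction of all intervals that fall in the cluster)."""
    reps = {}
    for c in range(len(centroids)):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            rep = min(members, key=lambda i: sq_dist(vectors[i], centroids[c]))
            reps[c] = (rep, len(members) / len(vectors))
    return reps


# Toy trace: phase A loops over blocks 0 and 1, phase B over blocks 2 and 3.
phase_a = [(0, 6), (1, 4)] * 10           # 100 instructions
phase_b = [(2, 5), (3, 5)] * 10           # 100 instructions
trace = phase_a * 3 + phase_b * 2 + phase_a

vectors = basic_block_vectors(trace, interval_len=100, num_blocks=4)
labels, centroids = kmeans(vectors, k=2)
reps = pick_representatives(vectors, labels, centroids)
print(labels, reps)
```

In this toy trace the six intervals collapse into two clusters, so only two intervals (one per phase) would need to run in RTL simulation.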

Multi-Level Simulation Flow

  • Execute the application in ISA-level simulation
    • Use SimPoint-style interval clustering
    • Capture arch checkpoints for each unique interval
    • Capture memory / branch / PC traces for each interval
  • Inject traces into uArch component simulator
    • Separate model per component (cache, BP)
    • Functional warmup
  • Inject uArch state into RTL sim
    • Detailed warmup
    • Performance numbers
  • Extrapolate up the stack
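
The final "extrapolate up the stack" step could look roughly like the sketch below. It assumes every interval contains the same number of instructions and that each cluster's representative interval has an IPC measured in RTL simulation. Under that assumption the whole-program CPI is the weighted average of the per-cluster CPIs (averaging IPC directly would be incorrect). The function name, the cluster weights, and the IPC values are hypothetical illustration data, not results from the proposal.

```python
def extrapolate(cluster_weights, rep_ipc, total_insts):
    """Estimate whole-program performance from per-cluster representatives.

    cluster_weights: cluster id -> fraction of intervals in that cluster
    rep_ipc:         cluster id -> IPC measured on the cluster's representative
    total_insts:     total dynamic instruction count of the workload

    Assumes equal-length intervals, so CPI (not IPC) averages linearly.
    """
    cpi = sum(w / rep_ipc[c] for c, w in cluster_weights.items())
    return {"cpi": cpi, "ipc": 1.0 / cpi, "cycles": cpi * total_insts}


# Hypothetical inputs: two clusters, 75% / 25% of intervals.
weights = {0: 0.75, 1: 0.25}
measured_ipc = {0: 2.0, 1: 0.5}           # made-up RTL-sim results

est = extrapolate(weights, measured_ipc, total_insts=1_000_000_000)
print(f"CPI={est['cpi']:.3f} IPC={est['ipc']:.3f} cycles={est['cycles']:.3e}")
```

Note that the slow cluster covers only a quarter of the intervals but contributes more than half of the estimated cycles, which is why the weighting is done in CPI space.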