عجفت الغور

i-cache is all you need

drafts

  • BOLT, FDO/PGO, and Propeller all point specifically at frontend stalls as the thing that matters
  • We want to be careful about frontend stalls, since an i-cache stall keeps the CPU from doing any useful work
    • The frontend is responsible for fetching and decoding instructions
    • The backend is responsible for executing them
  • Code layout matters for the i-cache: keeping hot code together means fewer cache lines (and pages) to fetch
  • Two types of optimizations really matter
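    • Inlining (removes call/return jumps, so the hot path is fetched as one contiguous stream)
    • Indirect call promotion (turns a hot indirect call into a guarded direct call, which can then be inlined)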
  • The CPU needs to fetch instructions from memory, and if the next instruction isn't in the i-cache, the frontend must wait while it is fetched from further down the hierarchy (L2, L3, and in the worst case DRAM, which costs hundreds of cycles)
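  • Instruction TLB (iTLB)
    • An iTLB miss means the translation for the page holding the next instruction isn't cached, so the CPU checks the L2 TLB and, if that misses too, does a page walk. But how does the walk itself work?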
  • Frontend stalls can't be hidden the way data stalls can: while waiting on a d-cache miss, the core can keep doing other work, but when the frontend stalls on the i-cache, nothing new is fetched or decoded, so there are no uops to feed the backend
    • The backend executes out of order, looking ahead for instructions that don't depend on the missing data
      • How?
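  • But how do the L1 d-cache and L1 i-cache differ?
    • Is Swift i-cache heavy because of indirection?
    • (Modified) Harvard architecture: separate L1 caches for instructions and data
      • iTLB vs dTLB
      • How would you tell these apart in perf? (perf exposes separate iTLB-load-misses and dTLB-load-misses events)
    • The L2 TLB is unified (shared by instructions and data)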
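A minimal C sketch of the indirect call promotion idea from the optimization list above. The `shape` type, `rect_area`, `tri_area`, and the two `total_area_*` functions are hypothetical, and "the profile says `rect_area` is the hot target" is an assumption standing in for what FDO/PGO or BOLT would actually measure. Promotion compares the function pointer against the likely target and calls that target directly, so the hot call can be inlined while rare targets keep the indirect fallback.

```c
#include <stddef.h>

/* Hypothetical shape type: each shape carries a function pointer,
   so computing an area is an indirect call. */
typedef struct shape shape;
typedef double (*area_fn)(const shape *);

struct shape {
    area_fn area;
    double w, h;
};

static double rect_area(const shape *s) { return s->w * s->h; }
static double tri_area(const shape *s)  { return 0.5 * s->w * s->h; }

/* Before: every call goes through the function pointer. The target is
   only known at run time, so the compiler cannot inline it. */
double total_area_indirect(const shape *shapes, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += shapes[i].area(&shapes[i]);
    return sum;
}

/* After promotion: assume a profile showed rect_area is the hot target.
   Compare the pointer against it and call it directly; the direct call
   can now be inlined. Any other target takes the indirect fallback. */
double total_area_promoted(const shape *shapes, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (shapes[i].area == rect_area)
            sum += rect_area(&shapes[i]);      /* direct, inlinable */
        else
            sum += shapes[i].area(&shapes[i]); /* rare: indirect call */
    }
    return sum;
}
```

Both functions compute the same total; the only difference is what the compiler is allowed to do with the call in the hot branch.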
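The code layout bullet can be illustrated by hand with the GCC/Clang `cold` function attribute and `__builtin_expect`, which mark which paths are rare so the compiler can keep the hot path contiguous and move rare code out of the way. This is only a manual stand-in for what profile-driven layout (PGO, BOLT, Propeller) does automatically from measured data; `report_bad_value` and `sum_nonnegative` are hypothetical names, and the sketch assumes a GCC-compatible compiler.

```c
#include <stddef.h>
#include <stdio.h>

/* Rare error path. 'cold' asks the compiler to optimize it for size and,
   on many targets, place it in a separate text subsection away from hot
   code; 'noinline' keeps it from being pulled back into the hot loop. */
static long report_bad_value(size_t i, long v) __attribute__((cold, noinline));

static long report_bad_value(size_t i, long v) {
    fprintf(stderr, "bad value %ld at index %zu\n", v, i);
    return 0; /* treat bad values as contributing nothing */
}

/* Hot path: __builtin_expect says the branch is rarely taken, so the
   common case can fall straight through and stay contiguous. */
long sum_nonnegative(const long *xs, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect(xs[i] < 0, 0))
            sum += report_bad_value(i, xs[i]);
        else
            sum += xs[i];
    }
    return sum;
}
```

The hints don't change the result, only where the rare code ends up, which is the point: fewer hot i-cache lines get polluted by code that almost never runs.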
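On the iTLB question: a TLB caches virtual-to-physical translations at page granularity, and when a lookup misses in both the iTLB and the unified L2 TLB, the hardware walks the page tables. The sketch below assumes x86-64 with 4-level paging, 48-bit virtual addresses, and 4 KiB pages (no huge pages, no 5-level paging), and only shows how a virtual address is split into the four 9-bit table indices the walker uses plus the 12-bit page offset; it does not model the walk's memory accesses. `walk_indices`, `split_vaddr`, and `vpn` are hypothetical names.

```c
#include <stdint.h>

/* Indices used by a 4-level x86-64 page walk with 4 KiB pages.
   Assumption: 48-bit virtual addresses, no huge pages. */
typedef struct {
    unsigned pml4;   /* bits 47..39: index into the top-level table */
    unsigned pdpt;   /* bits 38..30 */
    unsigned pd;     /* bits 29..21 */
    unsigned pt;     /* bits 20..12: index into the last-level table */
    unsigned offset; /* bits 11..0: byte offset within the 4 KiB page */
} walk_indices;

walk_indices split_vaddr(uint64_t va) {
    walk_indices w;
    w.pml4   = (unsigned)((va >> 39) & 0x1FF); /* 9 bits per level */
    w.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    w.pd     = (unsigned)((va >> 21) & 0x1FF);
    w.pt     = (unsigned)((va >> 12) & 0x1FF);
    w.offset = (unsigned)(va & 0xFFF);
    return w;
}

/* Virtual page number: roughly what a TLB entry is tagged with, since
   one entry covers a whole 4 KiB page. */
uint64_t vpn(uint64_t va) { return va >> 12; }
```

Two instruction addresses with the same `vpn` sit on the same 4 KiB page and share one TLB entry, which is one reason packing hot code into fewer pages reduces iTLB misses.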