عجفت الغور

i cache is all you need

drafts

  • BOLT, FDO, PGO, and Propeller all tell you the same thing: it’s specifically frontend stalls that matter
  • We really want to be careful of frontend stalls, since i-cache stalls prevent the CPU from doing useful work
    • Frontend is responsible for fetching and decoding instructions
    • Backend is responsible for executing instructions
  • Code layout matters for the icache
  • Two types of optimizations really matter
    • Inlining (frees up the icache)
    • Indirect call promotion
  • CPU needs to fetch instructions from memory, and if the required instruction is not in the i-cache, the frontend must wait for it to be retrieved (up to hundreds of cycles if the fetch goes all the way out to DRAM)
  • Instruction TLB
    • An I-TLB miss occurs when you need to force a page walk? But how does this work?
  • Frontend work can’t be parallelized: if you’re waiting on d-cache data, you can do something else, but a frontend stall means the i-cache is stuck and can’t feed uops to the backend
    • Backend can do out-of-order execution by looking ahead and finding work that doesn’t depend on the missing data
      • How?
  • But how does L1 d-cache vs L1 i-cache work?
    • Swift is i-cache heavy because of indirection?
    • Harvard architecture
      • I-TLB vs D-TLB
      • How would you tell this in perf?
    • L2 TLB is unified
  • BOLT’ing, Propeller, and Lightning BOLT
  • Zero-cost inlining is not actually zero cost (see memcmp vs memcpy)
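
To make the "code layout matters for the icache" note concrete, here is a minimal sketch of a toy direct-mapped i-cache. Everything in it is an assumption chosen for illustration: the 64-byte lines, the 8 slots, the 32-byte blocks, and the helper names `count_icache_misses` and `hot_loop_trace`. A real L1 i-cache is set-associative, much larger, and sits behind prefetchers, so only the direction of the effect carries over, not the numbers.

```python
def count_icache_misses(fetch_addrs, line_bytes=64, num_slots=8):
    """Count misses in a toy direct-mapped i-cache (illustrative only)."""
    slots = [None] * num_slots           # which memory line each slot holds
    misses = 0
    for addr in fetch_addrs:
        line = addr // line_bytes        # memory line containing this instruction
        slot = line % num_slots          # direct-mapped: each line has one home slot
        if slots[slot] != line:
            misses += 1
            slots[slot] = line           # evict whatever was there
    return misses


def hot_loop_trace(block_starts, iterations, block_bytes=32, insn_bytes=4):
    """Fetch addresses for a loop that runs every hot block once per iteration."""
    trace = []
    for _ in range(iterations):
        for start in block_starts:
            trace.extend(range(start, start + block_bytes, insn_bytes))
    return trace


NUM_HOT_BLOCKS = 16
ITERATIONS = 100

# Hot blocks packed back to back: 16 * 32 B = 512 B, exactly 8 cache lines.
packed = [i * 32 for i in range(NUM_HOT_BLOCKS)]
# Each hot block followed by 32 B of never-executed cold code: 16 lines touched.
interleaved = [i * 64 for i in range(NUM_HOT_BLOCKS)]

packed_misses = count_icache_misses(hot_loop_trace(packed, ITERATIONS))
interleaved_misses = count_icache_misses(hot_loop_trace(interleaved, ITERATIONS))

print("packed layout misses:     ", packed_misses)
print("interleaved layout misses:", interleaved_misses)
```

In this toy model the packed layout takes 8 cold misses and then hits forever, while the interleaved layout misses on every block of every iteration (1600 misses), even though both run exactly the same hot instructions. That is the rough intuition behind hot/cold splitting and block reordering.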

Outline

  1. i-cache as bottleneck
  2. why frontend stalls are uniquely bad
  3. FDO was great, but the workflow was annoying
  4. LBR with perf
    1. You can slice every part of the LBR buffalo: use the in and out BB for CFR, and the cycles for how long you’re spending there
  5. Decompilation fuzziness
    1. Inlining destroys source CFG
    2. We need to reconstruct a bunch of stuff
    3. Bolt internally uses MC
      1. AutoFDO goes address -> DWARF -> source -> recomp
      2. BOLT goes address -> MC disassembly -> binary CFG -> rewrite
    4. Question about over optimization? Can we merge optimizations?
  6. Apply profile where it was collected with BOLT
  7. Linker?
  8. Profile data as universal currency?
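
The LBR item in the outline (branch in/out addresses plus cycles) can be sketched as a small aggregation step. This is a rough sketch, not how BOLT or AutoFDO actually ingest profiles. It assumes branch records shaped like `from/to/mispred/in_tx/abort/cycles`, similar to what `perf script -F brstack` prints after `perf record -b`. The exact field layout varies across perf versions, so the parser only relies on the first two fields and treats cycles as optional. The sample addresses and the names `parse_branch_records` and `edge_profile` are made up for illustration.

```python
from collections import Counter


def parse_branch_records(text):
    """Yield (from_addr, to_addr, cycles) from whitespace-separated records.

    Assumed record shape: from/to/mispred/in_tx/abort/cycles. Only the first
    two fields are required; cycles defaults to 0 when absent or non-numeric.
    """
    for token in text.split():
        fields = token.split("/")
        if len(fields) < 2:
            continue
        try:
            src = int(fields[0], 16)
            dst = int(fields[1], 16)
        except ValueError:
            continue                     # skip anything that is not an address pair
        has_cycles = len(fields) > 5 and fields[5].isdigit()
        yield src, dst, int(fields[5]) if has_cycles else 0


def edge_profile(text):
    """Aggregate taken-branch edges: how often each was seen, and total cycles."""
    counts = Counter()
    cycles = Counter()
    for src, dst, cyc in parse_branch_records(text):
        counts[(src, dst)] += 1
        cycles[(src, dst)] += cyc
    return counts, cycles


# Hypothetical sample: two trips over one edge, one trip over a back edge.
SAMPLE = (
    "0x401000/0x401040/P/-/-/3 "
    "0x401050/0x401000/P/-/-/7 "
    "0x401000/0x401040/M/-/-/5"
)

counts, cycles = edge_profile(SAMPLE)
for (src, dst), n in counts.most_common():
    print(f"{src:#x} -> {dst:#x}: taken {n}x, {cycles[(src, dst)]} cycles")
```

The edge counts are the raw material for reconstructing a weighted CFG over binary addresses; mapping those addresses back to basic blocks (or to source, on the AutoFDO path) is the fuzzy part and is not attempted here.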