BOLT, FDO, PGO, and Propeller all tell you the same thing: it's specifically frontend stalls that matter
We really want to avoid frontend stalls, since i-cache stalls prevent the CPU from doing useful work
Frontend is responsible for fetching and decoding instructions
Backend is responsible for executing instructions
Code layout matters for the icache
Two types of optimizations really matter
Inlining (hot callees end up contiguous with their callers, which helps the icache as long as code size doesn't balloon)
Indirect call promotion
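A minimal sketch of what indirect call promotion does, written by hand in C++. The names hot_handler, cold_handler, dispatch_indirect, and dispatch_promoted are made up for illustration, and the assumption is that a profile showed hot_handler to be the dominant target. A real compiler does this transformation itself from profile data; this only shows the shape of the result.

```cpp
// Hypothetical handlers; assume profiling showed hot_handler is the usual target.
int hot_handler(int x) { return x + 1; }
int cold_handler(int x) { return x * 2; }

using Handler = int (*)(int);

// Before: an indirect call. The frontend has to predict the target address,
// and the compiler cannot inline through the pointer.
int dispatch_indirect(Handler h, int x) {
    return h(x);
}

// After promotion: compare against the hot target and call it directly.
// The direct call is easy to predict and can be inlined; everything else
// falls back to the original indirect call, so behavior is unchanged.
int dispatch_promoted(Handler h, int x) {
    if (h == hot_handler) {
        return hot_handler(x);  // direct call, inlining candidate
    }
    return h(x);                // rare targets keep the indirect call
}
```

Both versions return the same result for any handler; the promoted one just trades an indirect branch for a compare plus a direct call on the hot path, which also opens the door to inlining.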
CPU needs to fetch instructions from memory; if the required instruction is not in the i-cache, the frontend must wait for it to be retrieved (up to hundreds of cycles if it has to come from DRAM)
Instruction TLB
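An iTLB miss occurs when the virtual-to-physical translation for the page being fetched from is not cached in the iTLB, so the hardware has to do a page walk through the page tables in memory to find it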
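As a rough sketch of what a page walk involves, assuming x86-64 with 4-level paging and 4 KiB pages (the notes don't name an architecture, so this is an assumption): a 48-bit virtual address splits into four 9-bit table indices plus a 12-bit page offset, and the walker does one dependent memory lookup per level. The struct and function names here are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Assumed layout: x86-64, 4-level paging, 4 KiB pages.
struct PageWalkIndices {
    // level[0] is the top-level table index, level[3] the last-level one.
    std::array<std::uint16_t, 4> level;
    std::uint16_t offset;  // byte offset inside the 4 KiB page
};

// Split a virtual address into the indices a hardware page walker would use.
// Each index selects one of 512 entries in that level's table.
inline PageWalkIndices split_virtual_address(std::uint64_t va) {
    PageWalkIndices out{};
    for (int i = 0; i < 4; ++i) {
        const int shift = 39 - 9 * i;  // bit positions 39, 30, 21, 12
        out.level[i] = static_cast<std::uint16_t>((va >> shift) & 0x1FF);
    }
    out.offset = static_cast<std::uint16_t>(va & 0xFFF);
    return out;
}
```

Each of the four lookups depends on the previous one, which is why a miss is expensive: the frontend cannot fetch from the page until the last level has been read.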
Frontend work can't be overlapped the same way: if you're waiting on d-cache data, the CPU can do something else, but a frontend stall means the icache is stuck and can't feed uOps to the backend
Backend can do out-of-order execution, by looking ahead and finding instructions that don't depend on the missing data