swe tea

computers

Weekly paper club/book club/video club
Need to figure out timing (Tuesday nights?)
Model it on the eBPF reading group and the distributed systems reading groups
- https://hackmd.io/@ebpf-reading-group/rkg0ou0I2

Profiling a Warehouse Scale Computer

Kanev, Svilen, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks. “Profiling a Warehouse-Scale Computer.” In Proceedings of the 42nd Annual International Symposium on Computer Architecture, 158–69. Portland Oregon: ACM, 2015. https://doi.org/10.1145/2749469.2750392.
They only profiled C++ binaries, since that made the analysis simpler
Only measured on Ivy Bridge machines
No “killer application” to optimize for; large chunks of compute are data-locality bound and CPU-stall bound, which suggests that 2-wide SMT is not sufficient to eliminate the bulk of the overheads
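- What is 2-wide SMT anyway?
- I assumed it meant issuing 2 instructions at once, but it actually means 2 hardware threads sharing one physical core (e.g. Intel Hyper-Threading), so when one thread stalls the other can use the idle execution units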
- Workload diversity is very real: the mix of workloads is broad enough that there is no single hot application to tune for
  - At the start, the 50 hottest binaries account for 80% of execution
  - Three years later, the top 50 account for only 60%
  - Coverage decreases by more than 5% per year over the course of 3 years
  - The data also does not include public cloud workloads
- Individual applications have flatter profiles themselves as they grow larger and more diverse
  - What would this look like for chatd?
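A quick sanity check on the coverage numbers: going from 80% to 60% over 3 years is 20 percentage points, about 6.7 points per year, which fits “more than 5% per year”. The sketch below shows how top-N coverage could be computed from per-binary cycle counts; the profile is made up and this is not the paper's tooling.

```python
from collections import Counter


def top_n_coverage(cycles_by_binary: dict[str, int], n: int) -> float:
    """Fraction of total cycles spent in the n hottest binaries."""
    total = sum(cycles_by_binary.values())
    if total == 0:
        return 0.0
    hottest = Counter(cycles_by_binary).most_common(n)
    return sum(cycles for _, cycles in hottest) / total


def avg_yearly_drop(start: float, end: float, years: float) -> float:
    """Average coverage drop in percentage points per year."""
    return (start - end) * 100 / years


# Made-up fleet profile: cycles per binary
profile = {"search": 50, "ads": 30, "logs": 10, "misc": 10}
print(top_n_coverage(profile, 2))                # 0.8
print(round(avg_yearly_drop(0.80, 0.60, 3), 1))  # 6.7
```

A flatter fleet profile spreads cycles across more binaries and pushes `top_n_coverage` down for a fixed n.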
“Data center tax” is very real: large chunks of each machine go to logging, RPC, and serialization/deserialization
Yasin: top-down measurement? Never heard of this before
- Yasin, Ahmad. “A Top-Down Method for Performance Analysis and Counters Architecture.” In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 35–44, 2014. https://doi.org/10.1109/ISPASS.2014.6844459.
- Talks about the core front end and core back end; what are those?
  - Front end:
    - instruction fetch
    - decode unit
    - branch prediction
    - uop cache
    - loop stream detector (LSD): detects small tight loops and replays their already-decoded uops, skipping fetch and decode
  - Back end:
    - sched/reservation station
    - execution units
    - reorder buffer
    - register file
    - load/store units
- Top-down classifies pipeline slots into retiring (useful work), frontend bound, backend bound, and bad speculation
- They attribute much of the frontend stalling to instruction-cache problems (lots of lukewarm code)
  - e.g. binaries with hundreds of MB of code
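The level-1 split comes from a handful of hardware counters. A minimal sketch of the level-1 formulas as given in Yasin's paper, assuming a 4-wide Intel core (the Sandy Bridge/Ivy Bridge generation) and per-thread counts; the counter values in the test are made up, and collecting them (e.g. with perf) is left out.

```python
from dataclasses import dataclass

# Assumption: 4 issue slots per cycle, as on the Intel cores the paper targets.
PIPELINE_WIDTH = 4


@dataclass
class Counters:
    cpu_clk_unhalted_thread: int      # CPU_CLK_UNHALTED.THREAD
    idq_uops_not_delivered_core: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued_any: int              # UOPS_ISSUED.ANY
    uops_retired_retire_slots: int    # UOPS_RETIRED.RETIRE_SLOTS
    int_misc_recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def top_down_level1(c: Counters) -> dict[str, float]:
    """Split all pipeline slots into the four level-1 top-down categories."""
    slots = PIPELINE_WIDTH * c.cpu_clk_unhalted_thread
    # Slots where the front end failed to deliver a uop while the back end could take one
    frontend_bound = c.idq_uops_not_delivered_core / slots
    # Uops issued but never retired, plus slots lost recovering from mispredicts
    bad_speculation = (c.uops_issued_any - c.uops_retired_retire_slots
                       + PIPELINE_WIDTH * c.int_misc_recovery_cycles) / slots
    # Slots that retired a uop (useful work)
    retiring = c.uops_retired_retire_slots / slots
    # Whatever is left was stalled in the back end
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }
```

With 1000 cycles and made-up counts of 1000 / 1500 / 1200 / 50 for the other counters, this gives roughly 0.25 frontend bound, 0.125 bad speculation, 0.3 retiring, and 0.325 backend bound. Backend bound is computed as the remainder, so the four categories always sum to 1.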
memcpy() and memmove() are 4-5% of datacenter cycles
- as is encryption
25% of datacenter tax is compressing and decompressing data
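One way to arrive at numbers like these: attribute each sampled leaf function to a tax category and sum the cycles. The sketch below uses a hypothetical, crude substring mapping from function names to categories; it is not the classification the paper used, and the sample profile is made up.

```python
from collections import defaultdict

# Hypothetical mapping: lowercase substring of leaf function name -> category.
# Checked in order; the first match wins.
TAX_CATEGORIES = [
    ("memcpy", "memmove"),
    ("memmove", "memmove"),
    ("compress", "compression"),  # also catches "decompress"
    ("encrypt", "encryption"),
    ("decrypt", "encryption"),
    ("serialize", "serialization"),
    ("parse", "serialization"),
    ("rpc", "rpc"),
    ("log", "logging"),
]


def tax_breakdown(samples):
    """samples: iterable of (leaf_function_name, cycles).

    Returns (tax cycles / all cycles, {category: share of tax cycles}).
    """
    total = 0
    per_category = defaultdict(int)
    for func, cycles in samples:
        total += cycles
        name = func.lower()
        for needle, category in TAX_CATEGORIES:
            if needle in name:
                per_category[category] += cycles
                break  # attribute each sample to at most one category
    tax = sum(per_category.values())
    tax_fraction = tax / total if total else 0.0
    shares = {cat: c / tax for cat, c in per_category.items()} if tax else {}
    return tax_fraction, shares


samples = [
    ("memcpy", 5),
    ("ZSTD_decompress", 20),
    ("SearchLeaf::Score", 60),
    ("proto::ParseFromString", 10),
    ("AES_encrypt", 5),
]
fraction, shares = tax_breakdown(samples)
print(fraction)  # 0.4
```

Substring matching will misfire on real symbol names (anything containing "log", for instance); a real attribution would match on full symbols or call stacks.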

Dhalion

This paper was actually quite dull
The interesting bit is the split: metrics -> symptoms -> diagnoses -> resolvers, with many-to-many mappings between symptoms and diagnoses and between diagnoses and resolvers
- This whole thing is called a “policy”
It's a control-loop system, and it explodes in complexity
- Part of the control loop is blacklisting actions that previously failed to move the system towards the desired state
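A toy sketch of the policy loop: detectors turn metrics into symptoms, diagnosers turn symptoms into diagnoses, and each diagnosis picks a resolver that isn't blacklisted; if health doesn't improve afterwards, that (diagnosis, resolver) pair is blacklisted. Everything here (the `Policy` class, the health function, the toy scaling system) is hypothetical and simplified. For example, it re-reads metrics immediately instead of waiting for the system to settle, and it is not Dhalion's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Metrics = Dict[str, float]


@dataclass(frozen=True)
class Symptom:
    kind: str       # e.g. "backpressure"
    component: str  # where it was observed


@dataclass(frozen=True)
class Diagnosis:
    kind: str       # e.g. "under_provisioned"
    component: str


Resolver = Callable[[Diagnosis], None]


@dataclass
class Policy:
    detectors: List[Callable[[Metrics], List[Symptom]]]
    diagnosers: List[Callable[[List[Symptom]], List[Diagnosis]]]
    # Each diagnosis kind can have several candidate resolvers, tried in order.
    resolvers: Dict[str, List[Tuple[str, Resolver]]]
    blacklist: Set[Tuple[Diagnosis, str]] = field(default_factory=set)

    def step(self, read_metrics: Callable[[], Metrics],
             health: Callable[[Metrics], float]) -> List[str]:
        """Run one iteration of the control loop; return the resolvers applied."""
        current = read_metrics()
        # Every detector sees all metrics; every diagnoser sees all symptoms.
        symptoms = [s for detect in self.detectors for s in detect(current)]
        diagnoses = [d for diagnose in self.diagnosers for d in diagnose(symptoms)]
        applied = []
        for diag in diagnoses:
            for name, resolve in self.resolvers.get(diag.kind, []):
                if (diag, name) in self.blacklist:
                    continue  # this action didn't help this diagnosis before
                before = health(current)
                resolve(diag)
                current = read_metrics()  # a real system would wait to settle
                if health(current) <= before:
                    self.blacklist.add((diag, name))
                applied.append(name)
                break  # at most one resolver per diagnosis per iteration
        return applied


def demo(load: float = 25.0, iterations: int = 4):
    """Toy system: one operator whose backlog shrinks as parallelism grows."""
    state = {"parallelism": 1}

    def read_metrics() -> Metrics:
        return {"backlog": max(0.0, load - 10 * state["parallelism"])}

    def detect_backpressure(m: Metrics) -> List[Symptom]:
        return [Symptom("backpressure", "op")] if m["backlog"] > 0 else []

    def diagnose(symptoms: List[Symptom]) -> List[Diagnosis]:
        return [Diagnosis("under_provisioned", s.component)
                for s in symptoms if s.kind == "backpressure"]

    def restart(d: Diagnosis) -> None:
        pass  # deliberately useless action, should end up blacklisted

    def scale_up(d: Diagnosis) -> None:
        state["parallelism"] += 1

    policy = Policy([detect_backpressure], [diagnose],
                    {"under_provisioned": [("restart", restart), ("scale_up", scale_up)]})
    log = [policy.step(read_metrics, lambda m: -m["backlog"]) for _ in range(iterations)]
    return log, state["parallelism"], policy.blacklist
```

In `demo()`, the loop tries "restart" first, sees no improvement and blacklists it, then scales up twice until the backlog clears and the detector stops firing.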