Principles of Mechanical Sympathy

Architecture

The post introduces mechanical sympathy — understanding hardware to write performant software — borrowing the concept from Martin Thompson and Formula 1 racing. It explains how CPU memory hierarchies favor sequential, predictable access patterns over random access. The article covers false sharing in cache lines as a hidden performance killer in multithreaded applications, and advocates for the Single Writer Principle to eliminate mutex contention. It demonstrates these ideas through a practical AI inference service example, showing how natural batching further improves throughput. The post concludes that these principles scale from individual applications to entire system architectures, but urges developers to prioritize observability before optimization.

Writing performant software requires sympathy for hardware mechanics — sequential memory access, eliminating false sharing, single-writer ownership, and natural batching — but always measure before you optimize.
  • 5

    And yet, software is still slow, from seconds-long cold starts for simple serverless functions, to hours-long ETL pipelines that merely transform CSV files into rows in a database.

  • 3

    By having 'sympathy' for the hardware our software runs on, we can create surprisingly performant systems.

  • 2

    In practice, these bets mean linear access outperforms access within the same page, which in turn vastly outperforms random access across pages.

  • 3

    You get False Sharing: Two CPUs fighting over access to two different variables in the same cache line, forcing the CPUs to take turns accessing the variables via the shared L3 cache.

  • 4

    Avoid protecting writable resources with a mutex. Instead, dedicate a single thread ('actor') to own every write, and use asynchronous messaging to submit writes from other threads to the actor.

  • 2

    In both cases, the performance of natural batching is twice as good as a timeout-based strategy.

  • 3

    When we write software that's mechanically sympathetic, performance follows naturally, at every scale.

  • 2

    Prioritize observability before optimization — you can't improve what you can't measure.

technical