
Harness engineering for coding agent users

Architecture, Testing, Agile & XP, AI & LLMs, Technical Leadership

Martin Fowler introduces 'harness engineering' as the practice of building feedforward guides and feedback sensors around coding agents to increase confidence in their output. He distinguishes between computational controls (deterministic tools like linters and tests) and inferential controls (LLM-based semantic judgment), arguing both are necessary. The harness regulates three dimensions: maintainability, architecture fitness, and functional behaviour, with behaviour being the hardest unsolved problem. Fowler emphasizes that harnesses attempt to externalize the implicit knowledge human developers bring, but cannot fully replace human judgment. He frames this as an emerging engineering discipline, not a one-time configuration, where humans steer agents by iterating on the harness itself.
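To make the distinction between the two kinds of control concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something taken from the article: the names `Finding`, `long_function_sensor`, and `inferential_sensor` are hypothetical, the 50-line limit is an arbitrary choice, and the `judge` argument is only a stand-in for whatever LLM call a team would actually use.

```python
import ast
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    sensor: str   # which sensor raised the finding
    message: str  # text that can be fed back to the agent


def long_function_sensor(source: str, max_lines: int = 50) -> list[Finding]:
    """Computational control: deterministic, same input gives same findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                findings.append(
                    Finding(
                        "long_function",
                        f"{node.name} spans {length} lines (limit {max_lines})",
                    )
                )
    return findings


def inferential_sensor(
    source: str, judge: Callable[[str], list[str]]
) -> list[Finding]:
    """Inferential control: delegates a semantic judgment to a model.

    `judge` is a hypothetical placeholder for an LLM-backed reviewer that
    returns a list of complaints; its output is not guaranteed to be stable.
    """
    return [Finding("llm_review", message) for message in judge(source)]
```

The computational sensor can run on every change because it is cheap and repeatable; the inferential one covers judgments that no fixed rule captures, at the cost of determinism.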

Coding agent harnesses must combine anticipatory guides with self-correcting feedback sensors across the maintainability, architecture, and behaviour dimensions. The hardest problem, ensuring functional correctness, remains unsolved and still requires human judgment.
  • A well-built outer harness serves two goals: it increases the probability that the agent gets it right in the first place, and it provides a feedback loop that self-corrects as many issues as possible before they even reach human eyes.

  • Separately, you get either an agent that keeps repeating the same mistakes (feedback-only) or an agent that encodes rules but never finds out whether they worked (feed-forward-only).

  • A coding agent has none of this: no social accountability, no aesthetic disgust at a 300-line function, no intuition that 'we don't do it that way here,' and no organisational memory.

  • Legacy teams, especially with applications that have accrued a lot of technical debt, face the harder problem: the harness is most needed where it is hardest to build.

  • Neither catches reliably some of the higher-impact problems: Misdiagnosis of issues, overengineering and unnecessary features, misunderstood instructions.

  • A good harness should not necessarily aim to fully eliminate human input, but to direct it to where our input is most important.

  • Building this outer harness is emerging as an ongoing engineering practice, not a one-time configuration.

  • Teams may start picking tech stacks and structures partly based on what harnesses are already available for them.
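The interplay of guides and sensors described in the excerpts above can be sketched as a small loop. This is a hedged illustration, not the article's implementation: `run_harness`, `HarnessResult`, the `Agent` and `Sensor` type aliases, and the three-round limit are all assumptions introduced for the example, and `agent` stands in for any real coding agent.

```python
from typing import Callable, NamedTuple

Agent = Callable[[str], str]         # hypothetical: prompt -> candidate code
Sensor = Callable[[str], list[str]]  # hypothetical: candidate -> problems found


class HarnessResult(NamedTuple):
    candidate: str
    unresolved: list[str]  # non-empty means a human needs to look
    rounds: int


def run_harness(
    task: str,
    guides: list[str],
    sensors: list[Sensor],
    agent: Agent,
    max_rounds: int = 3,
) -> HarnessResult:
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    # Feedforward: the guides reach the agent before it writes anything.
    prompt = "\n".join(guides + [task])

    for round_number in range(1, max_rounds + 1):
        candidate = agent(prompt)

        # Feedback: every sensor inspects the candidate.
        problems = [p for sensor in sensors for p in sensor(candidate)]
        if not problems:
            return HarnessResult(candidate, [], round_number)

        # Self-correction: findings go back to the agent, not to a human.
        prompt += "\nFix these problems:\n" + "\n".join(problems)

    # Out of rounds: hand the remaining problems to a human.
    return HarnessResult(candidate, problems, max_rounds)
```

A toy run, with a fake agent and a single sensor that objects to `print`:

```python
def fake_agent(prompt: str) -> str:
    return "log('x')" if "Fix these problems" in prompt else "print('x')"


def no_print(code: str) -> list[str]:
    return ["avoid print()"] if "print(" in code else []


result = run_harness("Write hello", ["Use log()"], [no_print], fake_agent)
```

Dropping the `guides` argument gives the feedback-only agent that keeps repeating its mistakes, and dropping `sensors` gives the feed-forward-only agent that never learns whether its rules worked. The returned `unresolved` list is where human input gets directed.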
