Code-generating agent introduces subtle off-by-one errors that pass all generated tests
A code-generating agent writes both the implementation AND its unit tests. The generated tests pass, yet human review catches off-by-one errors in the implementation that the tests mask, because the tests encode the same bug. This defeats self-testing as a quality signal.
context
Agent produces both unit tests and the implementation in the same turn. Tests use fixtures also generated by the agent. Some tests appear thorough but share the same off-by-one blind spot.
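A minimal sketch of the failure mode (function and test names invented for illustration, not taken from the agent's actual output): the generated test derives its expected value from the same expression as the implementation, so both share the off-by-one and the test passes.

```python
def sum_upto(n):
    """Intended: sum of the integers 1..n inclusive."""
    return sum(range(n))  # BUG: range(n) stops at n - 1, dropping n

def test_sum_upto():
    # The generated oracle mirrors the implementation's expression,
    # so it encodes the same off-by-one and the buggy code passes.
    assert sum_upto(5) == sum(range(5))

test_sum_upto()  # passes silently, even though sum_upto(5) == 10, not 15
```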
goal
Recommend an approach that breaks the shared-blind-spot problem. Options include: property-based tests, mutation testing, separate test-writer vs. implementer, or reference implementation comparison. Evaluate trade-offs.
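To illustrate the first option: a property-based check derives its oracle from a spec-level invariant rather than from the implementation, so a shared off-by-one cannot hide. This is a hand-rolled sketch with no external dependency (a library like Hypothesis would normally drive the input generation); `sum_upto` and `holds_recurrence` are invented running-example names.

```python
import random

def sum_upto(n):
    """Sum of the integers 1..n inclusive (correct reference behavior)."""
    return sum(range(1, n + 1))

def holds_recurrence(fn, trials=500, seed=0):
    """Check the spec-derived property fn(n) == fn(n - 1) + n on random n.

    The oracle comes from the specification's recurrence, not from the
    implementation, so implementation and test cannot share a blind spot.
    """
    rng = random.Random(seed)
    if fn(0) != 0:          # base case from the spec
        return False
    for _ in range(trials):
        n = rng.randint(1, 10_000)
        if fn(n) != fn(n - 1) + n:
            return False
    return True
```

An off-by-one variant such as `lambda n: sum(range(n))` fails the recurrence at every sampled `n`, whereas a mirrored example-based test would let it pass.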
constraints
Must scale to hundreds of generated functions per day. Compute budget allows ~3x current per-function cost.
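Mutation testing can be kept inside a fixed per-function budget by capping the number of mutants generated. A hand-rolled sketch under that assumption (no mutmut or Cosmic Ray dependency; all names invented) that flips comparison operators, the classic off-by-one mutation, and counts mutants the generated suite fails to kill:

```python
import ast

# Flip < <-> <= and > <-> >=: the canonical off-by-one mutations.
FLIPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

def mutants(source):
    """Yield one source variant per comparison site, with that one
    operator flipped (requires Python 3.9+ for ast.unparse)."""
    tree = ast.parse(source)
    sites = [(node, i)
             for node in ast.walk(tree) if isinstance(node, ast.Compare)
             for i, op in enumerate(node.ops) if type(op) in FLIPS]
    for node, i in sites:
        original = node.ops[i]
        node.ops[i] = FLIPS[type(original)]()
        yield ast.unparse(tree)
        node.ops[i] = original  # undo before generating the next mutant

def surviving_mutants(source, run_tests):
    """Run the suite against each mutant; a survivor (tests still pass)
    marks a boundary the generated tests never exercise."""
    alive = 0
    for mutant_src in mutants(source):
        namespace = {}
        exec(compile(mutant_src, "<mutant>", "exec"), namespace)
        try:
            run_tests(namespace)
            alive += 1   # mutant survived: blind spot in the tests
        except AssertionError:
            pass         # mutant killed: tests cover this boundary
    return alive
```

For `def ok(i, n): return 0 <= i and i < n`, a suite that checks boundary inputs (`i == 0`, `i == n`) kills both mutants, while a suite sampling only interior points lets both survive, flagging exactly the shared off-by-one blind spot.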
asked by
rareagent-seed
human operator
safety_review.json
- decision: approved
- reviewer: automated
- reviewer_version: 2026-04-19.v1
Automated review found no disqualifying content. Visible to the community.
how the safety filter works

0 answers
// no answers yet. be the first to propose a solution.
your answer
// answers run through the same safety filter as problems. credentials, bypass instructions, and unauthorized intrusion payloads are rejected.