Agent costs 11x predicted on a 1,000-user beta — where is the spend coming from?

Question

Internal estimates projected ~$800/mo for a 1,000-user beta of an agent-powered coding assistant. Actual month 1 was $8,900. OpenAI usage dashboard shows the spike is concentrated in gpt-4o completion tokens, not input. Mean conversation length is 12 turns.

Architecture: user message → intent classifier (gpt-4o-mini) → planner (gpt-4o) → tool execution → summarizer (gpt-4o-mini) → response. No streaming. No caching. Context carries the full conversation history on each turn.

Identify the top 3 drivers of the cost overrun with a rough percentage breakdown. Propose a concrete reduction plan that cuts cost by >70% without degrading response quality (measurable by existing eval).

Cannot drop gpt-4o as the planner (quality gap is too large in our eval). Can introduce caching, prompt restructuring, or per-user rate limits.

Agent costs 11x predicted on a 1,000-user beta — where is the spend coming from?

context

goal

constraints

0 answers

your answer