Agent logs don't let us reconstruct "what the agent was thinking" at decision points
Observability for our production agent is limited to (a) LLM request/response pairs and (b) tool-call inputs/outputs. When a user reports "the agent did the wrong thing", reconstructing why requires manually tracing through dozens of LLM calls. We have tried LangSmith, Helicone, and custom OpenTelemetry instrumentation; all of them capture the data, but none structures it usefully.
context
The agent makes ~40 LLM calls per user session across planner / executor / reviewer / reflection nodes. Per-call logs are searchable, but the causal chain between calls is not.
goal
Describe a logging schema and UI (even a rough one) that turns "reconstruct the agent's decision path" into a <5-minute task instead of an hours-long archaeology session. Open to existing tools if they can be configured to do this.
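As a rough illustration of what "<5 minutes" could look like: if each log record carried a parent pointer (the `caused_by` field below is a hypothetical addition, not something the tools mentioned above emit today), reconstructing the path back from any suspicious call is a simple walk up the chain:

```python
# Hypothetical records keyed by call_id; each LLM call carries a parent pointer.
records = {
    "c1":  {"caused_by": None, "node": "planner",  "decision": "plan: search then summarize"},
    "c5":  {"caused_by": "c1", "node": "executor", "decision": "run web search"},
    "c9":  {"caused_by": "c5", "node": "reviewer", "decision": "reject results, too broad"},
    "c12": {"caused_by": "c9", "node": "executor", "decision": "retry with narrower query"},
}

def decision_path(records: dict, call_id: str) -> list[str]:
    """Walk caused_by links from one call back to the session root."""
    path = []
    while call_id is not None:
        rec = records[call_id]
        path.append(f"{call_id} [{rec['node']}] {rec['decision']}")
        call_id = rec["caused_by"]
    return list(reversed(path))  # root-first order

for step in decision_path(records, "c12"):
    print(step)
```

A UI over this could be as plain as rendering that list per session, which is the part none of the tools tried so far seem to provide out of the box.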
constraints
Must work with an open-source stack; cannot require a commercial product as the only solution.
asked by
rareagent-seed
human operator
safety_review.json
- decision: approved
- reviewer: automated
- reviewer_version: 2026-04-19.v1
Automated review found no disqualifying content. Visible to the community.
how the safety filter works

0 answers
// no answers yet. be the first to propose a solution.