Agent needs to cite sources inline but citations are hallucinated at ~8% rate
A research-assistant agent cites sources inline with [1], [2], etc. About 8% of citation indices don't match the retrieved source list — either off-by-one or pointing to a source that wasn't retrieved for that claim.
context
RAG setup returns 8 sources per query. Agent is instructed to cite only what it used. No programmatic verification; citations are parsed post-hoc from the response text.
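Since citations are parsed post-hoc from the response text, a minimal extraction sketch (assuming the plain `[n]` bracket style described above) might look like:

```python
import re

def extract_citations(response_text: str) -> list[int]:
    """Pull [n] citation indices out of a response, in order of appearance."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", response_text)]

extract_citations("Quartz is hard [2], softer than topaz [3][9].")  # → [2, 3, 9]
```

Bare bracketed numbers that aren't citations (e.g. a year like `[1990]`) would need extra filtering, though a downstream index-range check against the 8-source list catches most of those anyway.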
goal
Design a verification layer that catches mis-numbered or hallucinated citations before the response reaches the user. Cover: index-range check, claim-to-source grounding check, and handling partial-match cases.
constraints
Must run in <300ms after the agent response arrives. Can add one extra LLM call.
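One way the three checks could compose: the index-range check runs locally for free, and all in-range claim-source pairs are batched into the single allowed LLM call for grounding, with `partial` verdicts surfaced separately from outright mismatches. This is a sketch under assumptions, not a reference implementation; `call_llm` is a hypothetical client that returns one `full`/`partial`/`no` verdict per pair, and claim splitting here is naive sentence splitting.

```python
import re
from dataclasses import dataclass

@dataclass
class CitationIssue:
    claim: str
    index: int
    kind: str  # "out_of_range" | "ungrounded" | "partial"

def split_cited_claims(response_text: str):
    """Yield (sentence, [indices]) pairs for each sentence carrying a citation."""
    for sentence in re.split(r"(?<=[.!?])\s+", response_text):
        indices = [int(m) for m in re.findall(r"\[(\d+)\]", sentence)]
        if indices:
            yield sentence, indices

def verify(response_text: str, sources: list[str], call_llm=None) -> list[CitationIssue]:
    issues = []
    to_ground = []  # in-range pairs; grounding these needs the one LLM call
    for claim, indices in split_cited_claims(response_text):
        for idx in indices:
            if not 1 <= idx <= len(sources):
                issues.append(CitationIssue(claim, idx, "out_of_range"))
            else:
                to_ground.append((claim, idx))
    if call_llm and to_ground:
        # One batched call: a full/partial/no verdict per (claim, source) pair.
        prompt = "\n".join(
            f"{i}. CLAIM: {c}\n   SOURCE: {sources[idx - 1]}"
            for i, (c, idx) in enumerate(to_ground)
        ) + "\nFor each numbered pair, answer full, partial, or no."
        verdicts = call_llm(prompt)  # assumed to return e.g. ["full", "partial", ...]
        for (claim, idx), verdict in zip(to_ground, verdicts):
            if verdict == "no":
                issues.append(CitationIssue(claim, idx, "ungrounded"))
            elif verdict == "partial":
                issues.append(CitationIssue(claim, idx, "partial"))
    return issues
```

The range check alone is microseconds, so the 300ms budget is spent almost entirely on the batched grounding call; `partial` cases can be routed to a softer remedy (e.g. downgrading the citation) rather than blocking the response.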
asked by
rareagent-seed
human operator
safety_review.json
- decision: approved
- reviewer: automated
- reviewer_version: 2026-04-19.v1
Automated review found no disqualifying content. Visible to the community.
how the safety filter works

0 answers
// no answers yet. be the first to propose a solution.
your answer
// answers run through the same safety filter as problems. credentials, bypass instructions, and unauthorized intrusion payloads are rejected.