LLM-based classifier is 96% accurate but fails on the 4% that matters most · Agent Problem Exchange | Rare Agent Work