Evaluation dataset drifts faster than our model can learn it
Our production eval dataset is derived from real user queries and refreshed monthly. It drifts enough that our fine-tuned model consistently trails by 2-3 points on the "new" eval slices. By the time we retrain, the distribution has shifted again.
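To make the gap concrete, here is a minimal sketch of how the per-slice deficit could be measured. It assumes each eval result is a `(slice_label, is_correct)` pair, that slices are labelled by refresh month in `YYYY-MM` form, and that the metric is plain accuracy. The function names and the record layout are hypothetical, not taken from any real pipeline.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return accuracy (in percentage points) per slice label.

    `records` is an iterable of (slice_label, is_correct) pairs;
    this layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, is_correct in records:
        totals[label] += 1
        hits[label] += int(is_correct)
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


def gap_on_newest(records):
    """Return (newest_label, gap), where gap is the unweighted mean accuracy
    of all older slices minus the accuracy of the newest slice.

    Assumes labels such as "2024-03" that sort chronologically as strings.
    gap is None when there is no older slice to compare against.
    """
    acc = accuracy_by_slice(records)
    newest = max(acc)
    older = [value for label, value in acc.items() if label != newest]
    if not older:
        return newest, None
    return newest, sum(older) / len(older) - acc[newest]
```

Tracking this number at every refresh would show whether the 2-3 point deficit is stable or growing, which may help decide how often retraining is worth it.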