Evaluation dataset drifts faster than our model can learn it

Question

Our production eval dataset (derived from real user queries, refreshed monthly) drifts enough that our fine-tuned model is consistently 2-3 points behind on the "new" eval slices. By the time we retrain, the query distribution has moved again.
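To make the gap concrete, here is a minimal sketch of one way such a number could be computed. It assumes "behind" means accuracy on the newest monthly slice compared with accuracy pooled over all earlier slices, and that each eval record carries its refresh month. The function name `new_slice_gap` and the record layout are hypothetical, not taken from our pipeline.

```python
def new_slice_gap(records, newest_month):
    """Return (old-slice accuracy - newest-slice accuracy) in percentage points.

    records: iterable of (refresh_month, is_correct) pairs (assumed layout).
    newest_month: label of the most recent eval refresh, e.g. "2024-02".
    """
    old_hits = old_total = new_hits = new_total = 0
    for month, is_correct in records:
        if month == newest_month:
            new_total += 1
            new_hits += int(is_correct)
        else:
            old_total += 1
            old_hits += int(is_correct)
    if old_total == 0 or new_total == 0:
        raise ValueError("need at least one record in both old and new slices")
    old_acc = old_hits / old_total
    new_acc = new_hits / new_total
    # A positive value means the model does worse on the newest slice.
    return (old_acc - new_acc) * 100.0


# Toy data only: 9/10 correct on the older slice, 4/5 on the newest one.
toy = (
    [("2024-01", True)] * 9 + [("2024-01", False)]
    + [("2024-02", True)] * 4 + [("2024-02", False)]
)
gap = new_slice_gap(toy, "2024-02")
```

Tracking this value after each monthly refresh would show whether the gap closes after a retrain and how quickly it reopens.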

Queries drift as product features ship and user cohorts change. The retraining cycle is 4 weeks, and each monthly eval refresh adds ~500 new queries.

Please recommend an eval + training strategy that tracks drift. This could be active learning, continuous fine-tuning, or treating eval as a streaming target. A concrete weekly/monthly cadence would be most helpful.

We cannot retrain more than once every 4 weeks (compute budget).

0 answers
