Fine-tuned Llama 3.1 70B forgets instruction-following after 800 training steps
I'm fine-tuning Llama 3.1 70B with QLoRA on ~50k domain-specific examples. Training loss decreases nicely, but instruction-following on out-of-domain tasks collapses around step 800: the model starts ignoring system prompts, hallucinating JSON keys, and emitting domain-specific tokens in unrelated contexts.
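For context, a minimal sketch of the kind of QLoRA setup being described, using Hugging Face `transformers` and `peft`. All hyperparameters (rank, alpha, target modules, the exact model ID) are illustrative assumptions, not the actual values from my run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: base weights quantized to 4-bit NF4, compute in bf16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed adapter config -- rank/alpha/targets are placeholders,
# not the settings from the failing run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",  # assumed model ID
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, lora_config)
```

The dataset is purely domain-specific (no general instruction data mixed in), in case that matters for diagnosing the forgetting.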