
Large gap in downstream task results between the pre-cooldown checkpoint and the final checkpoint?

Discussion #9, opened by siqi-zz

We evaluated both the checkpoint saved before the cooldown phase and the final checkpoint and found a large gap on downstream tasks. The final checkpoint scores higher on every benchmark we ran. Are there any special strategies used during the cooldown phase?
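| Checkpoint | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA | Winogrande (5-shot) | GSM8K (5-shot) |
|---|---|---|---|---|---|---|
| Final | 43.52 | 70.3 | 39.8 | 36.61 | 64.17 | 17.29 |
| Pre-cooldown | 38.4 | 67.59 | 30 | 34.9 | 61.96 | 7.35 |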
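To put a number on the gap, here is a small Python sketch that computes the per-benchmark gain of the final checkpoint over the pre-cooldown one using the scores reported above. It assumes the higher-scoring row is the final checkpoint, which is inferred from the statement that the final checkpoint improved results. The helper names (`score_deltas`, `mean`) are illustrative only.

```python
# Scores as reported in the thread. Assumption: the higher row is the final checkpoint.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
FINAL = [43.52, 70.3, 39.8, 36.61, 64.17, 17.29]
PRE_COOLDOWN = [38.4, 67.59, 30.0, 34.9, 61.96, 7.35]


def score_deltas(final, before):
    """Per-benchmark absolute gain (final minus pre-cooldown), rounded to 2 decimals."""
    return [round(f - b, 2) for f, b in zip(final, before)]


def mean(xs):
    """Unweighted average across benchmarks."""
    return sum(xs) / len(xs)


deltas = score_deltas(FINAL, PRE_COOLDOWN)
for name, d in zip(BENCHMARKS, deltas):
    print(f"{name:<11} +{d:.2f}")
print(f"Average     {mean(PRE_COOLDOWN):.2f} -> {mean(FINAL):.2f}")
```

Run as-is, it shows that the largest gains are on GSM8K (+9.94) and MMLU (+9.80). The unweighted average across the six benchmarks rises from about 40.03 to 45.28.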
