recommended learning rate

#1
by ehartford - opened

I saw that your fine-tuning example uses a learning rate of 3e-4 for the 7B model:

https://github.com/QwenLM/Qwen1.5/blob/3360e0c775319b9986ba6a9f644ed52404518f3a/examples/sft/finetune.sh#L95

What is the recommended learning rate for each model size: 72B, 14B, 7B, 4B, 1.8B, and 0.5B?
