Recommended hyperparameters?

#27
by zhilinw6 - opened

Any recommendations or insights on effective SFT hyperparameter settings, such as learning rate, batch size, epochs, and weight decay?
Any advice on processing the training data?

Alibaba-NLP org

You can refer to the training parameter settings introduced in the MGTE paper. MGTE primarily focuses on encoder-only training, while the GTE-Qwen series models use LoRA for training. Apart from this difference, the other training hyperparameters and data strategies are similar.

https://arxiv.org/abs/2407.19669
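
Since the GTE-Qwen series is trained with LoRA rather than full fine-tuning, a rough starting point looks like the sketch below. This is only an illustrative setup: the model ID, LoRA rank/targets, and all optimizer values are placeholder assumptions, not the settings from the MGTE paper, so please consult the paper for the actual configuration.

```python
# Minimal LoRA SFT sketch for a gte-Qwen-style embedding model.
# All hyperparameter values are illustrative placeholders, not the
# MGTE paper's settings -- refer to the paper for the real values.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_id = "Alibaba-NLP/gte-Qwen2-7B-instruct"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# LoRA adapter config (rank, alpha, and target modules are assumptions).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="FEATURE_EXTRACTION",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Example optimizer hyperparameters (placeholders to tune on your data).
learning_rate = 1e-4       # LoRA usually tolerates a higher LR than full FT
weight_decay = 0.01
num_epochs = 1             # embedding SFT often runs for only 1-2 epochs
per_device_batch_size = 8  # pair with in-batch negatives for contrastive loss

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=learning_rate,
    weight_decay=weight_decay,
)
```

The training loop itself (contrastive loss over query/passage pairs with in-batch negatives, gradient accumulation to reach a large effective batch size, etc.) would follow the data strategy described in the paper; the snippet above only covers the adapter and optimizer setup.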
