I encountered this problem when using Baichuan2-7B-Base with DeepSpeed ZeRO stage 3 for SFT (supervised fine-tuning). A similar case is reported elsewhere, for example at https://github.com/baichuan-inc/Baichuan2/issues/39#issuecomment-1710146497
Baichuan2-13B-Chat has already fixed this problem, so I synced that code here.
