6B-32k model served via API: inference fails with RuntimeError: probability tensor contains either 'inf', 'nan', or element < 0

by shichaoYan - opened

Environment: CentOS 7.9, two NVIDIA A10 GPUs, Python 3.11, PyTorch 2.2.1, CUDA 12.2, driver 535.129.03.
The model starts normally, and single-user conversations work fine. However, when several people use it at once or send requests back to back, this error can occur, although the model process does not crash.
I previously ran the same model version on a Microsoft cloud server, where several people used it without any issues.
Switching to the 8k model, trying different startup methods, and calling /v1/chat/completion directly all work without problems. In the failing case, the API is called through LangChain's ChatOpenAI with streaming output.
I have tried many fixes found online, including disabling do_sample, but none of them worked. Help!


FastChat development is currently slow. In a fork I maintain myself, I have fixed several bugs that affect ChatGLM, including the tensor inf issue you are hitting:

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Set the temperature above 1e4, or use do_sample=False.
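A toy model of the sampling step may help show why a very small temperature can trigger this error and why do_sample=False sidesteps it. This is a minimal pure-Python sketch, not FastChat or transformers code, and it assumes (hypothetically) that the logits are held in float16, whose largest finite value is 65504. The helpers `to_fp16`, `softmax_fp16`, `is_valid_distribution` and `greedy_pick` are illustrative names only. Dividing the logits by a tiny temperature pushes them past the float16 range, the max-subtracted softmax then computes inf - inf, and every probability becomes nan, which is the kind of tensor the error message describes. Greedy decoding takes the argmax of the raw logits, so it never builds a probability tensor to sample from. It is one plausible mechanism, not a diagnosis of this particular deployment.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to float16 (assumption: logits are stored in half precision).

    Values beyond the float16 range (max finite 65504) become +/-inf
    instead of raising, as they would in a half-precision tensor.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax_fp16(logits, temperature):
    # Temperature scaling before sampling: logits / T, stored in fp16.
    scaled = [to_fp16(x / temperature) for x in logits]
    # "Stable" softmax: subtract the max first. If the max is inf,
    # inf - inf yields nan and the whole distribution becomes nan.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    # Mirrors the condition named in the RuntimeError:
    # no inf, no nan, no element < 0.
    return all(math.isfinite(p) and p >= 0 for p in probs)


def greedy_pick(logits):
    # do_sample=False: argmax over raw logits, no probabilities needed.
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    logits = [10.0, 8.0, -3.0]  # hypothetical next-token logits
    print(is_valid_distribution(softmax_fp16(logits, temperature=1.0)))
    print(is_valid_distribution(softmax_fp16(logits, temperature=1e-5)))
    print(greedy_pick(logits))
```

With temperature 1.0 the distribution is valid. With temperature 1e-5 the scaled logits (up to 1e6) overflow float16 and the check fails, while `greedy_pick` still returns index 0 regardless of temperature.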

zRzRzRzRzRzRzR changed discussion status to closed
