The 6b-32k model, served via the API, throws during inference: RuntimeError: probability tensor contains either 'inf', 'nan', or element < 0

#43
by shichaoYan - opened

Environment: CentOS 7.9, two NVIDIA A10 GPUs, Python 3.11, PyTorch 2.2.1, CUDA version 12.2, driver version 535.129.03.
The model starts normally, and single-user conversations work fine. However, when multiple people use it at the same time, or when requests come in continuously, this error appears, although the model itself does not crash.
I previously ran the same version of the model on a Microsoft cloud server, also with multiple users, and had no such problem.
Switching to the 8k model and trying different startup methods made no difference. Calling /v1/chat/completions directly works fine; the error only occurs when calling through LangChain's ChatOpenAI method with streaming output.
I have tried many solutions found online, including disabling do_sample, but none of them worked. Please help!

[Attached screenshot: 2B4A7064-B227-4293-BF1E-B44DD80BF197.png]

It looks like you are hitting this while using FastChat. I have also been developing on top of ChatGLM recently; this error is caused by the temperature threshold being too low, and the model itself is fine.
FastChat development is currently slow, so in my own maintained fork I have fixed several bugs that ChatGLM runs into, including the tensor inf problem you are seeing:
https://github.com/p208p2002/FastChat/commit/264a7ff22963f8161fda2027dc0118596f3f956c

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Raise the temperature above 1e-4, or set do_sample=False.
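As a defensive sketch of that advice (`sanitize_generation_kwargs` is a hypothetical helper, not part of any library, and the 1e-4 floor is the threshold suggested above), one could clamp incoming sampling parameters before calling generate:

```python
MIN_TEMPERATURE = 1e-4  # below this, the scaled softmax can overflow to inf/nan

def sanitize_generation_kwargs(kwargs):
    # Hypothetical helper: if the requested temperature is missing or too low
    # to sample safely, fall back to greedy decoding (do_sample=False),
    # mirroring the advice above; otherwise pass the kwargs through unchanged.
    temperature = kwargs.get("temperature")
    if temperature is None or temperature < MIN_TEMPERATURE:
        kwargs = dict(kwargs, do_sample=False)
        kwargs.pop("temperature", None)
    return kwargs

# A too-low temperature is replaced by greedy decoding:
assert sanitize_generation_kwargs({"temperature": 0.0}) == {"do_sample": False}
# A safe temperature passes through untouched:
assert sanitize_generation_kwargs({"temperature": 0.7}) == {"temperature": 0.7}
```

Applying a guard like this at the API layer protects against any one client (for example, a LangChain ChatOpenAI call with a near-zero temperature) poisoning an otherwise healthy server.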

zRzRzRzRzRzRzR changed discussion status to closed
