How should the calling code be written?

#1
by sunjunlishi - opened

I tried the code from the source repository, and also the basic multimodal calling pattern; neither worked.

ValueError: Calling cuda() is not supported for 4-bit or 8-bit quantized models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype
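This error is raised when `.cuda()` or `.to("cuda")` is called on a model that was loaded with bitsandbytes quantization: the quantized weights are already placed on the right device during loading, so they must not be moved again. A minimal sketch of loading a 4-bit model correctly, assuming a transformers + bitsandbytes setup (the model id is a hypothetical placeholder, since the thread does not name the checkpoint):

```python
def load_4bit_model(model_id: str):
    """Load a 4-bit quantized model without calling .cuda() afterwards.

    bitsandbytes assigns devices during from_pretrained(); calling
    .cuda() on the result raises the ValueError quoted above.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # compute dtype for the dequantized matmuls
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # accelerate places the weights; no .cuda() needed
        trust_remote_code=True,
    )
    return tokenizer, model
```

Usage would then be `tokenizer, model = load_4bit_model("your/model-id")`, using the returned model as-is for inference.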

Marking this thread. Also, how much GPU memory does running this 4-bit model require?

24 GB of VRAM. Following the original version, I can now load the model and run inference, but the inference result is empty.

The inference result is empty — could the quantization be the problem?
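Before blaming the quantization, it may be worth ruling out a decoding issue: a common cause of "empty" output is decoding the full generated sequence and stripping the prompt, which can silently leave nothing for some tokenizers. A hedged sketch of a generation sanity check, assuming a standard `generate()` API (the prompt format is model-specific and not shown in the thread):

```python
def generate_text(tokenizer, model, prompt: str) -> str:
    """Generate greedily and decode only the newly produced tokens."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,  # greedy decoding rules out sampling-related emptiness
        )
    # Slice off the prompt tokens instead of string-stripping the prompt,
    # which can silently yield an empty result with some tokenizers.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

If the sliced output is still empty, checking that `bnb_4bit_compute_dtype` matches what the checkpoint was quantized for, and that the model's expected chat/prompt template is applied, would be the next steps.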
