ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0

#58
by itod - opened

Hello,

I have an issue when running the model on Tesla P6 GPU with 16GB of RAM:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P6 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half.


The model is served with vLLM. I tried the suggested "--dtype=half" flag when launching the model, but that produced another error, so the solution is evidently not that simple.

Any suggestions on how I can approach solving this issue?

Regards.

I'm not a professional, but here is the solution that worked for me when I ran into the same problem:

In Hugging Face Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# Cast the weights to float16, which Pascal GPUs (compute capability 6.x) do support
model.to(device).half()
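
For completeness, a minimal generation call with the model loaded above (the prompt is just a placeholder):

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))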

Regarding vLLM, I think you can inquire on their GitHub.
https://github.com/vllm-project/vllm/issues
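
If you want to try float16 in vLLM directly before opening an issue, here is a minimal sketch using the offline LLM entrypoint; the dtype argument mirrors the --dtype CLI flag, though an older Pascal card may still hit other limitations:

from vllm import LLM, SamplingParams

# dtype="float16" has the same effect as --dtype=half on the CLI
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="float16")
outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=50))
print(outputs[0].outputs[0].text)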

Solved. Just for the record, in case others face the same issue: I had to convert the model to GGUF (int8 or fp16), and then it worked.
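
One common way to run such a GGUF file is llama-cpp-python; a minimal sketch, assuming you have already produced the GGUF conversion (the file path below is hypothetical):

from llama_cpp import Llama

# Hypothetical path to the converted GGUF file (fp16 or int8-quantized)
llm = Llama(model_path="mistral-7b-instruct-v0.2.fp16.gguf", n_gpu_layers=-1)
result = llm("Hello, how are you?", max_tokens=50)
print(result["choices"][0]["text"])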
