ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0

#58
by itod - opened

Hello,

I have an issue when running the model on Tesla P6 GPU with 16GB of RAM:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P6 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half.


The model is served with vLLM. I tried the suggested "--dtype=half" flag when launching the model, but that produced another error, so the solution is evidently not that simple.

Any suggestions on how I can approach solving this issue?

Regards.

I'm not a professional, but here is the solution that worked for me when I ran into the same problem:

In Hugging Face Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# Cast the weights to float16, which Pascal GPUs (compute capability 6.x) do support
model.to(device).half()
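
For completeness, a minimal generation call with the model loaded above (the prompt is just a placeholder):

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))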

Regarding vLLM, I think you can inquire on their GitHub.
https://github.com/vllm-project/vllm/issues
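
If you want to try float16 in vLLM directly before opening an issue, here is a minimal sketch using the offline LLM entrypoint; the dtype argument mirrors the --dtype CLI flag, though an older Pascal card may still hit other limitations:

from vllm import LLM, SamplingParams

# dtype="float16" has the same effect as --dtype=half on the CLI
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="float16")
outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=50))
print(outputs[0].outputs[0].text)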

Solved. Just for the record, in case others face the same issue: I had to convert the model to GGUF (int8 or fp16), and then it worked.
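
One common way to run such a GGUF file is llama-cpp-python; a minimal sketch, assuming you have already produced the GGUF conversion (the file path below is hypothetical):

from llama_cpp import Llama

# Hypothetical path to the converted GGUF file (fp16 or int8-quantized)
llm = Llama(model_path="mistral-7b-instruct-v0.2.fp16.gguf", n_gpu_layers=-1)
result = llm("Hello, how are you?", max_tokens=50)
print(result["choices"][0]["text"])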
