RuntimeError: cutlassF: no kernel found to launch!

by kehkok - opened

I am getting the error in the title when running with "torch.bfloat16". Can anyone suggest what is wrong?

Below is my dev env:

  • NVIDIA V100 GPU device
  • Python 3.10.12
  • CUDA 12.4 with Driver 550.54.14
  • accelerate==0.28.0
  • torch==2.1.2
  • transformers==4.38.2

The code below executes successfully, so my setup is able to use the V100:

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16, token = access_token)

However, the code below fails when using "torch.bfloat16":

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16, token = access_token)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
Google org

Hi @kehkok
Thanks for the issue! V100s do not support bf16; they only support fp16 for half precision (native bf16 support starts with Ampere GPUs, compute capability 8.0, while the V100 is 7.0). I believe you need to restrict usage to fp16 in your case.
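As a rough sketch of that advice, the dtype can be chosen from the GPU's compute capability rather than hard-coded. The helper names below (pick_half_dtype_name, load_kwargs_for_current_gpu) are hypothetical and not part of transformers or torch; the only library call assumed is torch.cuda.get_device_capability, and the 8.0 threshold reflects the assumption that bf16 should only be used on Ampere or newer.

```python
def pick_half_dtype_name(capability):
    """Return 'bfloat16' for compute capability >= 8.0 (Ampere or newer), else 'float16'."""
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


def load_kwargs_for_current_gpu():
    """Build a torch_dtype kwarg for from_pretrained based on GPU 0 (hypothetical helper)."""
    import torch  # imported lazily so the helper above has no dependencies

    capability = torch.cuda.get_device_capability(0)  # e.g. (7, 0) on a V100
    return {"torch_dtype": getattr(torch, pick_half_dtype_name(capability))}


# V100 is compute capability 7.0; A100 is 8.0.
print(pick_half_dtype_name((7, 0)))  # float16
print(pick_half_dtype_name((8, 0)))  # bfloat16
```

With this in place, the load call from the question could be written as AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", token=access_token, **load_kwargs_for_current_gpu()), which would fall back to fp16 on a V100.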

