No output generated with sample code on non-quantised model

#7
by Pwicke - opened

Hi and thanks for this brilliant model.

I have been running your Colab notebook and it works like a charm on Google Colab. I have also reproduced it on my server with 8x NVIDIA RTX A6000 GPUs: with the exact same code from the notebook, I receive the exact same output:

Question: What's on the picture? Answer: Kittens.

But whatever I do, if I load idefics-9b or idefics-9b-instruct without quantisation instead of the quantised model, I only ever receive:

Question: What's on the picture? Answer:

The only difference between the Colab code and my code is the removal of quantization_config=bnb_config from the IdeficsForVisionText2Text.from_pretrained(...) parameter list. A colleague of mine found their own way of running the model with the code you provided and independently reproduced the exact same issue (Question: What's on the picture? Answer:). I've tried different GPUs and different servers, but without the quantised model I am unable to produce any output. The model loads into memory and is accessed during inference; it just does not generate any new tokens (I have also increased max_new_tokens to 50 and tried other prompts, like the Pokémon example).
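To make the difference concrete, the two loading paths look roughly like this (a sketch; the bnb_config values follow the notebook, and torch_dtype=torch.bfloat16 on the non-quantised path is an assumption on my side):

```python
import torch
from transformers import IdeficsForVisionText2Text, BitsAndBytesConfig

checkpoint = "HuggingFaceM4/idefics-9b-instruct"

# Quantised path from the notebook -- this one answers "Kittens." as expected
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed, per the usual notebook setup
)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, quantization_config=bnb_config, device_map="auto"
)

# Non-quantised path -- identical except for the dropped quantization_config;
# this is the one that returns no new tokens on my servers
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)
```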

Any help would be appreciated.

HuggingFaceM4 org

Hi @Pwicke ,
Indeed, that does not sound right.
Could you say more about your environment? In particular, which transformers and tokenizers versions are you on?
I'll try to reproduce the error.
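(If it helps, a quick snippet like this prints the relevant versions:)

```python
# Print the versions of the libraries relevant to this issue
import accelerate, bitsandbytes, tokenizers, torch, transformers

for mod in (accelerate, bitsandbytes, tokenizers, torch, transformers):
    print(mod.__name__, mod.__version__)
```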

Thank you for your response.

accelerate 0.24.0.dev0, bitsandbytes 0.41.1, nvidia-cublas-cu12 12.1.3.1, python 3.10.12, sentencepiece 0.1.99, tokenizers 0.14.1, torch 2.1.0, transformers 4.35.0.dev0

Could I ask for an update on this? @VictorSanh

@Pwicke Have you solved this?

@TITH unfortunately not. I have to use the 4-bit quantised version. I recently tried the full model again, but still no new tokens are being generated. Do you have the same issue?

@Pwicke Yes. But I noticed that running on CPU instead of CUDA avoids the issue. After I switched to torch 2.0.1, CUDA works as well.
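For reference, the CPU fallback looks roughly like this (a sketch; the checkpoint name is assumed, and float32 is chosen because bfloat16 support on CPU varies across torch versions):

```python
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

checkpoint = "HuggingFaceM4/idefics-9b-instruct"  # assumed checkpoint

# Workaround: run the full-precision model on CPU instead of CUDA
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.float32
).to("cpu")
processor = AutoProcessor.from_pretrained(checkpoint)
```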

Thanks for the response @TITH. I've tried CPU and it works. But since switching to torch 2.0.1, the model no longer uses my GPU even though it's specified to do so. For now, I am running my experiment on CPU, which is suboptimal.
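For completeness, this is how I check whether torch sees the GPU and where the weights actually ended up (a minimal sketch, assuming model is loaded as above):

```python
import torch

# Confirm this torch build was compiled with CUDA and can see a GPU
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# Confirm where the model weights actually live (assumes `model` from above)
print(next(model.parameters()).device)
```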

Upgrading transformers to 4.37 (e.g. pip install -U "transformers>=4.37") can solve this problem.
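A minimal sanity check after the upgrade could look like this (a sketch; the image URL is a placeholder, and the prompt format follows the published IDEFICS examples):

```python
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)
processor = AutoProcessor.from_pretrained(checkpoint)

# Prompts interleave image URLs and text, as in the IDEFICS examples;
# the URL below is a placeholder, not the notebook's actual image.
prompts = [
    [
        "https://example.com/kittens.jpg",
        "Question: What's on the picture? Answer:",
    ]
]
inputs = processor(prompts, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

If this prints new tokens after "Answer:", the upgrade fixed the generation issue.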
