A100 can process only 4k tokens
I'm using an A100 80GB GPU, transformers==4.42.3, torch==2.3.1, and bfloat16 precision.
The loading code follows the template:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)
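Generation is just a plain generate call; a minimal sketch of what I run (long_prompt and the max_new_tokens value are placeholders for my actual inputs):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
# the error shows up once prompt tokens + max_new_tokens passes ~4096
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))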
Whenever the prompt + max_new_tokens exceeds 4k tokens, I get a CUDA error. I have tried using two GPUs but still have the same problem.
What else can I try?
I can generate an output with the same prompt in Google AI Studio.
Hi @KubilayCan, could you please provide the detailed stack trace so we can better understand the issue? Thank you.
Hi @Renu11, the problem was related to the sliding-window attention: the update function in transformers was using the wrong variable. It is reported here: https://github.com/huggingface/transformers/issues/31781 and here: https://github.com/huggingface/transformers/issues/31848. I patched the function as suggested in the latter one and it works now. According to the last comment on that page, the latest transformers release seems to have fixed this bug.
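For anyone hitting the same error, the least invasive fix is probably upgrading transformers (pip install --upgrade transformers) instead of patching the function locally; a quick sanity check after the upgrade:

import transformers
# 4.42.3 still exhibited the sliding-window bug, so this should print a newer version
print(transformers.__version__)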