A100 can process only 4k tokens
I'm using an A100 80GB GPU, transformers==4.42.3, torch==2.3.1, and bfloat16 precision.
The loading code follows the template:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)
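Generation is just a plain generate call; a minimal sketch of what I run (long_prompt and the max_new_tokens value are placeholders for my actual inputs):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
# the error shows up once prompt tokens + max_new_tokens passes ~4096
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))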
Whenever the prompt + max_new_tokens exceeds 4k tokens, I get a CUDA error. I have tried using two GPUs but still have the same problem.
What else can I try?
I can generate an output with the same prompt in Google AI Studio.
Hi @KubilayCan, could you please provide the detailed stack trace so we can better understand the issue? Thank you.
Hi @Renu11, the problem was related to the sliding-window attention: the update function in transformers was using the wrong variable. It is reported here: https://github.com/huggingface/transformers/issues/31781 and here: https://github.com/huggingface/transformers/issues/31848. I patched the function as suggested in the latter one and it works now. According to the last comment on that page, the latest transformers release seems to have fixed this bug.
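For anyone hitting the same error, the least invasive fix is probably upgrading transformers (pip install --upgrade transformers) instead of patching the function locally; a quick sanity check after the upgrade:

import transformers
# 4.42.3 still exhibited the sliding-window bug, so this should print a newer version
print(transformers.__version__)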