RAM crash when loading shards

#5
by hythyt - opened

Hello, three days ago I was able to load the model in Google Colab with a T4 GPU (free account). Now, while the shards are loading, the process exhausts the system RAM and the model never makes it onto the GPU.
Thanks in advance.

This is the code that I'm using:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

input_text = "El mercat del barri és fantàstic, hi pots trobar"

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "projecte-aina/aguila-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(device)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    device_map=device,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
generation = generator(
    input_text,
    do_sample=True,
    top_k=10,
    max_new_tokens=10,
    eos_token_id=tokenizer.eos_token_id,
)
print(f"Result: {generation[0]['generated_text']}")
