Can't load checkpoint shards

#18

by popmanpop - opened Jul 17

Discussion

popmanpop

Jul 17

Hi all, I wanted to try this model, so I copied the code from the site:

# pip install transformers==4.41.1
from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Generate a story about evil carabas barabas"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
# Example of the resulting prompt (the Turkish sample message means
# "Write a letter to my mother telling her how much I love her"):
# <|START_OF_TURN_TOKEN|><|USER_TOKEN|>Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)

At startup it first downloads the config files and so on, then "Loading checkpoint shards" starts, and it takes a very long time; at 25-50 percent the computer often hangs. Once it got to 100%, but then it stopped and nothing happened until I killed the terminal, just a blinking cursor. I tried on both Windows and Ubuntu, with about the same problems on each. Maybe I have a bug in my code? Or am I downloading the wrong thing?

NovaYear

Jul 18

Judging by the code, it looks like you are trying to load the model onto the CPU and into system RAM. Do you have enough memory? Is your processor powerful enough? Also, do you have a recent, powerful graphics card with plenty of video memory?

popmanpop

Jul 18

I don't know if this is enough, but my CPU is a Ryzen 5 5500, my GPU is a Radeon RX 5600 XT (6 GB VRAM), and I have 16 GB of RAM at 3200 MHz.

I think it also matters that AMD video cards lack many NVIDIA technologies, since CUDA is used everywhere (for example, in the pipeline documentation).

I'll try to configure this code to use the GPU and let you know if it helps.
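For a rough sense of whether these specs are enough, the weight size can be estimated from the parameter count and the number of bytes per parameter. The sketch below assumes about 8 billion parameters (taken from the "8B" in the model name) and counts only the weights, not activations, the KV cache, or framework overhead. The helper names `weight_memory_gib` and `fits` are made up for illustration.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: ~8e9 parameters, inferred from the "8B" in the model name.
# Activations, KV cache and framework overhead are NOT included.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(n_params, dtype):
    """Return the size of the weights in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits(n_params, dtype, available_gib):
    """True if the weights alone are smaller than the available memory."""
    return weight_memory_gib(n_params, dtype) < available_gib


n_params = 8e9  # assumed parameter count
for dtype in BYTES_PER_PARAM:
    size = weight_memory_gib(n_params, dtype)
    print(f"{dtype:>8}: {size:5.1f} GiB | "
          f"fits in 16 GiB RAM: {fits(n_params, dtype, 16)} | "
          f"fits in 6 GiB VRAM: {fits(n_params, dtype, 6)}")
```

Under these assumptions the weights come to roughly 30 GiB in float32 and roughly 15 GiB in float16. As far as I know, `from_pretrained` loads in float32 unless a `torch_dtype` is passed, so with 16 GB of RAM the machine would likely start swapping while loading the shards, which may be what looks like a hang. Passing `torch_dtype=torch.float16` should halve the footprint, but that still leaves very little room on a 16 GB machine.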

NovaYear

Jul 23

Check out the page below for AMD graphics cards; maybe it will help. Otherwise, sell the graphics card and buy an affordable used NVIDIA RTX-series card. I first tried with an Intel A-series graphics card, but configuring it was very tiring, so I sold it and bought an RTX 4060 Ti 16 GB. Its memory was insufficient, so I sold that too and this time bought an RTX 3090 24 GB.
https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html
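Before changing hardware, it may be worth checking what the installed PyTorch build can actually see. The sketch below relies on ROCm builds of PyTorch exposing AMD GPUs through the same `torch.cuda` API and setting `torch.version.hip`. The function name `describe_backend` is made up for illustration, and whether a particular card is supported by ROCm is something to confirm on the page above.

```python
def describe_backend():
    """Report which GPU backend, if any, the installed PyTorch build can use."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # On ROCm builds, AMD GPUs are also reported through torch.cuda.
    if not torch.cuda.is_available():
        return "No GPU visible to PyTorch; the model would run on the CPU"

    # ROCm builds set torch.version.hip; CUDA builds leave it as None.
    backend = "ROCm" if getattr(torch.version, "hip", None) else "CUDA"
    name = torch.cuda.get_device_name(0)
    vram_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return f"{backend} device: {name} ({vram_gib:.1f} GiB VRAM)"


print(describe_backend())
```

If this reports no visible GPU, the script in the first post will load everything on the CPU no matter which card is installed.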
