
Loading checkpoint shards: 3/11 fails?

#7
by zsxkib - opened

The sample code consistently fails, and I'm not sure why:

tokenizer_config.json:   0%|          | 0.00/833 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 833/833 [00:00<00:00, 2.52MB/s]
tokenizer.json:   0%|          | 0.00/16.3M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 16.3M/16.3M [00:00<00:00, 213MB/s]
config.json:   0%|          | 0.00/836 [00:00<?, ?B/s]
config.json: 100%|██████████| 836/836 [00:00<00:00, 3.90MB/s]
model.safetensors.index.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]
model.safetensors.index.json: 100%|██████████| 50.6k/50.6k [00:00<00:00, 112MB/s]
Downloading shards:   0%|          | 0/11 [00:00<?, ?it/s]
model-00001-of-00011.safetensors: 100%|██████████| 4.94G/4.94G [00:20<00:00, 237MB/s]
Downloading shards:   9%|▉         | 1/11 [00:20<03:28, 20.87s/it]
model-00002-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 229MB/s]
Downloading shards:  18%|█▊        | 2/11 [00:42<03:13, 21.47s/it]
model-00003-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:19<00:00, 252MB/s]
Downloading shards:  27%|██▋       | 3/11 [01:02<02:45, 20.70s/it]
model-00004-of-00011.safetensors: 100%|██████████| 4.90G/4.90G [00:22<00:00, 220MB/s]
Downloading shards:  36%|███▋      | 4/11 [01:24<02:29, 21.36s/it]
model-00005-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:22<00:00, 223MB/s]
Downloading shards:  45%|████▌     | 5/11 [01:47<02:10, 21.70s/it]
model-00006-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:20<00:00, 238MB/s]
Downloading shards:  55%|█████▍    | 6/11 [02:08<01:47, 21.46s/it]
model-00007-of-00011.safetensors: 100%|██████████| 4.87G/4.87G [00:20<00:00, 235MB/s]
Downloading shards:  64%|██████▎   | 7/11 [02:29<01:24, 21.24s/it]
model-00008-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 229MB/s]
Downloading shards:  73%|███████▎  | 8/11 [02:51<01:04, 21.50s/it]
model-00009-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 236MB/s]
Downloading shards:  82%|████████▏ | 9/11 [03:12<00:42, 21.48s/it]
model-00010-of-00011.safetensors: 100%|██████████| 2.99G/2.99G [00:09<00:00, 329MB/s]
Downloading shards:  91%|█████████ | 10/11 [03:21<00:17, 17.66s/it]
model-00011-of-00011.safetensors: 100%|██████████| 4.10G/4.10G [00:12<00:00, 325MB/s]
Downloading shards: 100%|██████████| 11/11 [03:34<00:00, 16.13s/it]
Downloading shards: 100%|██████████| 11/11 [03:34<00:00, 19.48s/it]
Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]
Loading checkpoint shards:   9%|▉         | 1/11 [00:03<00:31,  3.14s/it]
Loading checkpoint shards:  18%|█▊        | 2/11 [00:07<00:34,  3.80s/it]
Loading checkpoint shards:  27%|██▋       | 3/11 [00:11<00:31,  3.97s/it]

Extra details:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  # set to true if your model requires a GPU
  gpu: true
  cuda: "11.8" # NOTE I have tried 12.1 too, same issue shard 3/11 seems broken?

  # a list of ubuntu apt packages to install
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"

  # python version in the form '3.11' or '3.11.4'
  python_version: "3.11"

  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "torch"
    - "torchvision"
    - "torchaudio"
    - "transformers"

  # commands run after the environment is setup
  run:
    - pip install transformers
    - curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.6.1/pget_linux_x86_64" && chmod +x /usr/local/bin/pget

# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

Any suggestions?
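
One sanity check that can rule a corrupted shard in or out: safetensors validates each file's header when it is opened, so a truncated or damaged download fails loudly. A minimal sketch, assuming the shards are already in the local Hugging Face cache:

import glob, os
from huggingface_hub import snapshot_download
from safetensors import safe_open

# Resolves to the cached download; only fetches files that are missing.
local_dir = snapshot_download("CohereForAI/aya-101", allow_patterns=["model-*.safetensors"])

for path in sorted(glob.glob(os.path.join(local_dir, "model-*.safetensors"))):
    # safe_open parses the header up front and raises on a corrupted file.
    with safe_open(path, framework="pt") as f:
        print(f"{os.path.basename(path)}: OK, {len(f.keys())} tensors")

If all eleven files print OK, the shards themselves are fine and the failure is happening at load time instead.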

Some further tests:

import os

os.system("pip install -q transformers")
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

print("Starting the script...")

checkpoint = "CohereForAI/aya-101"
print(f"Using checkpoint: {checkpoint}")

print("Initializing tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print("Tokenizer initialized successfully.")

print("Initializing model...")
aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
print("Model initialized successfully.")

# Turkish to English translation
print("Preparing Turkish to English translation...")
tur_inputs = tokenizer.encode(
    "Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt"
)
print("Turkish input encoded.")

print("Generating Turkish to English translation...")
tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
print("Translation generated.")

print("Decoding output...")
print(tokenizer.decode(tur_outputs[0]))
print("Turkish to English translation completed.")
# Aya is a multi-lingual language model

# Q: Why are there so many languages in India?
print("Preparing question about languages in India...")
hin_inputs = tokenizer.encode("भारत में इतनी सारी भाषाएँ क्यों हैं?", return_tensors="pt")
print("Indian languages question encoded.")

print("Generating answer for the question about languages in India...")
hin_outputs = aya_model.generate(hin_inputs, max_new_tokens=128)
print("Answer generated.")

print("Decoding output for the question about languages in India...")
print(tokenizer.decode(hin_outputs[0]))
print("Question about languages in India processed.")
# Expected output: भारत में कई भाषाएँ हैं और विभिन्न भाषाओं के बोली जाने वाले लोग हैं। यह विभिन्नता भाषाई विविधता और सांस्कृतिक विविधता का परिणाम है। Translates to "India has many languages and people speaking different languages. This diversity is the result of linguistic diversity and cultural diversity."

This code (the sample code you gave + prints) gets killed at aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint); the tokenizer itself initializes fine:

$ python test.py
Starting the script...
Using checkpoint: CohereForAI/aya-101
Initializing tokenizer...
Tokenizer initialized successfully.
Initializing model...
Loading checkpoint shards:  36%|████▎       | 4/11 [00:12<00:21,  3.05s/it]
Killed

Never mind - this issue was caused by my machine not having enough RAM.
There is no issue with the model.
That was embarrassing lol

How much RAM did it require, and on what platform were you able to get it working?
Thanks.

I think you need at least ~48 GB of RAM, but I upgraded my VM to have 64 GB of RAM (it's on CoreWeave) @Beetroit
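
That matches the log above: the eleven shards total roughly 52 GB, which is about what a 13B-parameter model costs in fp32 (13B parameters × 4 bytes). If upgrading the machine isn't an option, loading in bfloat16 roughly halves the weight footprint; a minimal sketch, assuming a recent transformers (device_map="auto" additionally needs accelerate installed):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# bfloat16 halves the weight footprint vs fp32 (~26 GB instead of ~52 GB);
# low_cpu_mem_usage avoids materializing a second full copy while loading.
aya_model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",  # place weights on available GPU/CPU as shards load
)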

Alright, thanks

Cohere For AI org

Looks like there's no issue; thanks for answering the questions. I'm closing this discussion.

viraat changed discussion status to closed
