
Loading checkpoint shards: 3/11 fails?

#7
by zsxkib - opened

The sample code consistently fails, and I'm not sure why:

tokenizer_config.json:   0%|          | 0.00/833 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 833/833 [00:00<00:00, 2.52MB/s]
tokenizer.json:   0%|          | 0.00/16.3M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 16.3M/16.3M [00:00<00:00, 213MB/s]
config.json:   0%|          | 0.00/836 [00:00<?, ?B/s]
config.json: 100%|██████████| 836/836 [00:00<00:00, 3.90MB/s]
model.safetensors.index.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]
model.safetensors.index.json: 100%|██████████| 50.6k/50.6k [00:00<00:00, 112MB/s]
Downloading shards:   0%|          | 0/11 [00:00<?, ?it/s]
model-00001-of-00011.safetensors: 100%|██████████| 4.94G/4.94G [00:20<00:00, 237MB/s]
Downloading shards:   9%|▉         | 1/11 [00:20<03:28, 20.87s/it]
model-00002-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 229MB/s]
Downloading shards:  18%|█▊        | 2/11 [00:42<03:13, 21.47s/it]
model-00003-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:19<00:00, 252MB/s]
Downloading shards:  27%|██▋       | 3/11 [01:02<02:45, 20.70s/it]
model-00004-of-00011.safetensors: 100%|██████████| 4.90G/4.90G [00:22<00:00, 220MB/s]
Downloading shards:  36%|███▋      | 4/11 [01:24<02:29, 21.36s/it]
model-00005-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:22<00:00, 223MB/s]
Downloading shards:  45%|████▌     | 5/11 [01:47<02:10, 21.70s/it]
model-00006-of-00011.safetensors: 100%|██████████| 4.97G/4.97G [00:20<00:00, 238MB/s]
Downloading shards:  55%|█████▍    | 6/11 [02:08<01:47, 21.46s/it]
model-00007-of-00011.safetensors: 100%|██████████| 4.87G/4.87G [00:20<00:00, 235MB/s]
Downloading shards:  64%|██████▎   | 7/11 [02:29<01:24, 21.24s/it]
model-00008-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 229MB/s]
Downloading shards:  73%|███████▎  | 8/11 [02:51<01:04, 21.50s/it]
model-00009-of-00011.safetensors: 100%|██████████| 5.00G/5.00G [00:21<00:00, 236MB/s]
Downloading shards:  82%|████████▏ | 9/11 [03:12<00:42, 21.48s/it]
model-00010-of-00011.safetensors: 100%|██████████| 2.99G/2.99G [00:09<00:00, 329MB/s]
Downloading shards:  91%|█████████ | 10/11 [03:21<00:17, 17.66s/it]
model-00011-of-00011.safetensors: 100%|██████████| 4.10G/4.10G [00:12<00:00, 325MB/s]
Downloading shards: 100%|██████████| 11/11 [03:34<00:00, 16.13s/it]
Downloading shards: 100%|██████████| 11/11 [03:34<00:00, 19.48s/it]
Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]
Loading checkpoint shards:   9%|▉         | 1/11 [00:03<00:31,  3.14s/it]
Loading checkpoint shards:  18%|█▊        | 2/11 [00:07<00:34,  3.80s/it]
Loading checkpoint shards:  27%|██▋       | 3/11 [00:11<00:31,  3.97s/it]

Extra details:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  # set to true if your model requires a GPU
  gpu: true
  cuda: "11.8" # NOTE I have tried 12.1 too, same issue shard 3/11 seems broken?

  # a list of ubuntu apt packages to install
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"

  # python version in the form '3.11' or '3.11.4'
  python_version: "3.11"

  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "torch"
    - "torchvision"
    - "torchaudio"
    - "transformers"

  # commands run after the environment is setup
  run:
    - pip install transformers
    - curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.6.1/pget_linux_x86_64" && chmod +x /usr/local/bin/pget

# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

Any suggestions?
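
One sanity check that can rule a corrupted shard in or out: safetensors validates each file's header when it is opened, so a truncated or damaged download fails loudly. A minimal sketch, assuming the shards are already in the local Hugging Face cache:

import glob, os
from huggingface_hub import snapshot_download
from safetensors import safe_open

# Resolves to the cached download; only fetches files that are missing.
local_dir = snapshot_download("CohereForAI/aya-101", allow_patterns=["model-*.safetensors"])

for path in sorted(glob.glob(os.path.join(local_dir, "model-*.safetensors"))):
    # safe_open parses the header up front and raises on a corrupted file.
    with safe_open(path, framework="pt") as f:
        print(f"{os.path.basename(path)}: OK, {len(f.keys())} tensors")

If all eleven files print OK, the shards themselves are fine and the failure is happening at load time instead.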

Some further tests:

import os

os.system("pip install -q transformers")
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

print("Starting the script...")

checkpoint = "CohereForAI/aya-101"
print(f"Using checkpoint: {checkpoint}")

print("Initializing tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print("Tokenizer initialized successfully.")

print("Initializing model...")
aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
print("Model initialized successfully.")

# Turkish to English translation
print("Preparing Turkish to English translation...")
tur_inputs = tokenizer.encode(
    "Translate to English: Aya cok dilli bir dil modelidir.", return_tensors="pt"
)
print("Turkish input encoded.")

print("Generating Turkish to English translation...")
tur_outputs = aya_model.generate(tur_inputs, max_new_tokens=128)
print("Translation generated.")

print("Decoding output...")
print(tokenizer.decode(tur_outputs[0]))
print("Turkish to English translation completed.")
# Aya is a multi-lingual language model

# Q: Why are there so many languages in India?
print("Preparing question about languages in India...")
hin_inputs = tokenizer.encode("भारत में इतनी सारी भाषाएँ क्यों हैं?", return_tensors="pt")
print("Indian languages question encoded.")

print("Generating answer for the question about languages in India...")
hin_outputs = aya_model.generate(hin_inputs, max_new_tokens=128)
print("Answer generated.")

print("Decoding output for the question about languages in India...")
print(tokenizer.decode(hin_outputs[0]))
print("Question about languages in India processed.")
# Expected output: भारत में कई भाषाएँ हैं और विभिन्न भाषाओं के बोली जाने वाले लोग हैं। यह विभिन्नता भाषाई विविधता और सांस्कृतिक विविधता का परिणाम है। Translates to "India has many languages and people speaking different languages. This diversity is the result of linguistic diversity and cultural diversity."

This code (the sample code you gave + prints) gets killed at aya_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint); the tokenizer itself initializes fine:

$ python test.py
Starting the script...
Using checkpoint: CohereForAI/aya-101
Initializing tokenizer...
Tokenizer initialized successfully.
Initializing model...
Loading checkpoint shards:  36%|████▎       | 4/11 [00:12<00:21,  3.05s/it]
Killed

Never mind - this issue was caused by my machine not having enough RAM.
There is no issue with the model.
That was embarrassing lol

How much RAM did it require, and on what platform were you able to get it working?
Thanks.

I think you need at least ~48 GB of RAM, but I upgraded my VM to have 64 GB of RAM (it's on CoreWeave) @Beetroit
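
That matches the log above: the eleven shards total roughly 52 GB, which is about what a 13B-parameter model costs in fp32 (13B parameters × 4 bytes). If upgrading the machine isn't an option, loading in bfloat16 roughly halves the weight footprint; a minimal sketch, assuming a recent transformers (device_map="auto" additionally needs accelerate installed):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# bfloat16 halves the weight footprint vs fp32 (~26 GB instead of ~52 GB);
# low_cpu_mem_usage avoids materializing a second full copy while loading.
aya_model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",  # place weights on available GPU/CPU as shards load
)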

Alright, thanks

Cohere For AI org

Looks like there's no issue; thanks for answering the questions. I'm closing this discussion.

viraat changed discussion status to closed
