runtime error
Downloading shards: 100%|██████████| 7/7 [01:17<00:00, 11.06s/it]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|██████████| 7/7 [00:22<00:00, 3.26s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/user/app/app_dialogue.py", line 269, in <module>
    def model_inference(
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/decorator.py", line 113, in _GPU
    client.startup_report()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 45, in startup_report
    raise RuntimeError("Error while initializing ZeroGPU: Unknown")
RuntimeError: Error while initializing ZeroGPU: Unknown