runtime error

Downloading model.safetensors: 100%|██████████| 7.26G/7.26G [02:16<00:00, 53.2MB/s]
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 4, in <module>
    app = create_app()
          ^^^^^^^^^^^^
  File "/home/user/app/chatBot/__init__.py", line 25, in create_app
    from chatBot.resources.routes import resources
  File "/home/user/app/chatBot/resources/routes.py", line 3, in <module>
    from chatBot.common.utils import getAnswerLlama, getAnswerGpt
  File "/home/user/app/chatBot/common/utils.py", line 2, in <module>
    from chatBot.common.llama import llamaModel
  File "/home/user/app/chatBot/common/llama.py", line 35, in <module>
    model = AutoGPTQForCausalLM.from_quantized(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 923, in from_quantized
    model = autogptq_post_init(model, use_act_order=quantize_config.desc_act)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/auto_gptq/modeling/_utils.py", line 258, in autogptq_post_init
    prepare_buffers(device, buffers["temp_state"], buffers["temp_dq"])
RuntimeError: no device index
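For context: "no device index" usually means the model was placed on a bare "cuda" device (index None) rather than an indexed one like "cuda:0", so the post-init buffer setup cannot tell which GPU to allocate on. A minimal sketch of that distinction, where device_index is a hypothetical helper and not auto_gptq's actual code:

```python
def device_index(device: str):
    """Split a torch-style device string into (type, index).

    A bare "cuda" carries no index (None) -- the situation that
    produces "RuntimeError: no device index" in the traceback above.
    """
    if ":" in device:
        dev_type, idx = device.split(":", 1)
        return dev_type, int(idx)
    return device, None

print(device_index("cuda"))    # ('cuda', None) -> triggers the error
print(device_index("cuda:0"))  # ('cuda', 0)    -> has an explicit index
```

If that is the cause here, passing an explicit device to AutoGPTQForCausalLM.from_quantized (e.g. device="cuda:0", or a device_map) and confirming the Space is actually running on GPU hardware, not CPU, are the usual fixes.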
