runtime error

███████▉| 9.98G/9.98G [01:01<00:00, 162MB/s]
Downloading shards: 50%|█████ | 1/2 [01:01<01:01, 61.90s/it]
model-00002-of-00002.safetensors: 100%|█████████▉| 3.50G/3.50G [00:26<00:00, 131MB/s]
Downloading shards: 100%|██████████| 2/2 [01:29<00:00, 44.54s/it]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 14, in <module>
    model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf", return_dict=True, load_in_8bit=True, device_map=device_map)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/user/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2733, in from_pretrained
    raise ValueError(
ValueError: If you want to offload some keys to `cpu` or `disk`, you need to set `llm_int8_enable_fp32_cpu_offload=True`. Note that these modules will not be converted to 8-bit but kept in 32-bit.
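
The ValueError above is raised when `load_in_8bit=True` is combined with a `device_map` that places some modules on `cpu` or `disk`. One possible way to get past it, sketched below, is to pass the flag named in the message through a `BitsAndBytesConfig`. The helper names `has_offloaded_modules` and `load_8bit_model` are hypothetical and not part of app.py, and the `"auto"` default for `device_map` is an assumption, since the log does not show what app.py actually passes. As the message notes, offloaded modules stay in 32-bit, so memory use will be higher than a fully 8-bit load.

```python
def has_offloaded_modules(device_map):
    """Return True if an explicit device_map dict sends any module to cpu or disk."""
    if not isinstance(device_map, dict):
        # Strings such as "auto" are only resolved to a concrete map at load time.
        return False
    return any(str(target) in ("cpu", "disk") for target in device_map.values())


def load_8bit_model(model_id, device_map="auto"):
    """Hypothetical loader: 8-bit weights, with fp32 CPU offload allowed."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_8bit=True,
        # The flag the ValueError asks for; offloaded modules are kept in 32-bit.
        llm_int8_enable_fp32_cpu_offload=True,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        return_dict=True,
        quantization_config=quant_config,
        device_map=device_map,
    )
```

A call such as `load_8bit_model("NousResearch/Llama-2-7b-chat-hf")` would replace line 14 of app.py under these assumptions. If the whole model is expected to fit on the GPU, a device map that avoids `cpu` and `disk` entirely is the alternative, and `has_offloaded_modules` can be used to check an explicit map before loading.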
