runtime error

74%|███████▎ | 14/19 [00:48<00:17, 3.52s/it]
Downloading shards: 79%|███████▉ | 15/19 [00:52<00:14, 3.54s/it]
Downloading shards: 84%|████████▍ | 16/19 [00:56<00:11, 3.77s/it]
Downloading shards: 89%|████████▉ | 17/19 [01:00<00:07, 3.89s/it]
Downloading shards: 95%|█████████▍| 18/19 [01:04<00:03, 3.90s/it]
Downloading shards: 100%|██████████| 19/19 [01:07<00:00, 3.53s/it]
Downloading shards: 100%|██████████| 19/19 [01:07<00:00, 3.54s/it]
Loading checkpoint shards: 0%| | 0/19 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 26, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config = quantization_config, device_map="cuda", trust_remote_code=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 814, in _load_state_dict_into_meta_model
    hf_*********.create_quantized_param(model, param, param_name, param_device, state_dict, unexpected_keys)
  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 193, in create_quantized_param
    raise ValueError(
ValueError: Supplied state dict for transformer.decoder_layer.0.moe.0.linear.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.
