runtime error

led version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/user/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/user/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [02:33<02:33, 153.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [03:59<00:00, 113.49s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [03:59<00:00, 119.52s/it]
Traceback (most recent call last):
  File "/home/user/app/main.py", line 173, in <module>
    main()
  File "/home/user/app/main.py", line 67, in main
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
    return model_class.from_pretrained(
  File "/home/user/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2937, in from_pretrained
    dispatch_model(model, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/accelerate/big_modeling.py", line 387, in dispatch_model
    raise ValueError(
ValueError: You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead.
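
The final ValueError comes from accelerate's `dispatch_model`, which refuses to proceed when the device map sends the entire model to disk. Since bitsandbytes loaded its CPU-only binary, one plausible reading is that no GPU was visible and the inferred device map left nothing on GPU or CPU. The sketch below shows how that condition could be checked before dispatch. `whole_model_on_disk` is a hypothetical helper, not part of accelerate or transformers, and the module names in the sample maps are illustrative only.

```python
from typing import Dict, Union

# A device map assigns each module name to a device: a GPU index (int),
# "cpu", or "disk". The key "" conventionally stands for the whole model.
DeviceMap = Dict[str, Union[int, str]]


def whole_model_on_disk(device_map: DeviceMap) -> bool:
    """Return True when every entry in the map targets "disk".

    Hypothetical helper: it mirrors the situation described by the error
    message above, not accelerate's actual implementation.
    """
    targets = set(device_map.values())
    return targets == {"disk"}


# Illustrative maps (module names are made up for the example).
all_disk = {"": "disk"}
mixed = {"transformer.wte": "cpu", "transformer.h": "disk"}
cpu_only = {"": "cpu"}

for name, dm in [("all_disk", all_disk), ("mixed", mixed), ("cpu_only", cpu_only)]:
    print(name, whole_model_on_disk(dm))
```

If the check returns True, two options worth trying are passing an explicit map such as `device_map={"": "cpu"}` to `from_pretrained` (assuming the host has enough RAM for the checkpoint), or following the error message's suggestion and using accelerate's `disk_offload`.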
