Runtime error
Exit code: 1. Reason: …sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
Downloading shards:  50%|█████     | 1/2 [00:20<00:20, 20.18s/it]
Downloading shards: 100%|██████████| 2/2 [00:29<00:00, 14.00s/it]
Downloading shards: 100%|██████████| 2/2 [00:29<00:00, 14.93s/it]

Traceback (most recent call last):
  File "/home/user/app/app.py", line 1, in <module>
    from demo.demo import app
  File "/home/user/app/demo/demo.py", line 7, in <module>
    BINO = Binoculars(quantize=True)
  File "/home/user/app/binoculars/detector.py", line 55, in __init__
    self.observer_model = AutoModelForCausalLM.from_pretrained(observer_name_or_path, **model_kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2842, in from_pretrained
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
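The ValueError at the end of the traceback means accelerate's device map placed some quantized modules on the CPU or disk because the GPU did not have enough memory for the whole 8-bit model. Per the error message and the linked docs, the options are a larger GPU or explicitly allowing fp32 CPU offload together with a custom `device_map`. Below is a minimal sketch of the second option. The module names in the `device_map` follow a Falcon-style checkpoint and are assumptions, as is the `tiiuae/falcon-7b` checkpoint name; check the real names via `model.named_modules()` or `model.hf_device_map` for your model.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Allow modules that don't fit in GPU RAM to stay on the CPU in fp32.
# BitsAndBytesConfig's llm_int8_enable_fp32_cpu_offload is the current
# spelling of the `load_in_8bit_fp32_cpu_offload` flag the error names.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

# A custom device_map pins each top-level module to a device. The names
# below are illustrative (Falcon-style); substitute your model's modules.
device_map = {
    "transformer.word_embeddings": 0,  # keep embeddings on GPU 0
    "transformer.h": 0,                # transformer blocks on GPU 0
    "transformer.ln_f": "cpu",         # offload final layer norm to CPU
    "lm_head": "cpu",                  # offload LM head to CPU
}

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",  # assumed observer checkpoint; substitute your own
    quantization_config=quantization_config,
    device_map=device_map,
)
```

In this Space the keyword arguments are assembled in binoculars/detector.py (line 55 of the traceback), so the config would need to be threaded into `model_kwargs` there; alternatively, running on hardware with enough GPU RAM avoids the offload entirely.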
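The two earlier warnings are unrelated to the crash but worth cleaning up. `resume_download` is deprecated in favor of passing `force_download=True` when a fresh download is wanted, and the truncated "pin a revision" message is the standard Hub warning suggesting you pin the checkpoint to a fixed revision so later pushes to the repo cannot change what the Space downloads. A minimal sketch, again with an assumed checkpoint name and a placeholder revision:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",  # assumed checkpoint; substitute your own
    revision="main",     # placeholder: pin a full commit SHA for an immutable snapshot
    # force_download=True,  # replaces the deprecated resume_download kwarg
)
```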