Runtime error

/usr/local/lib/python3.10/site-packages/langchain/llms/__init__.py:548: LangChainDeprecationWarning: Importing LLMs from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead: `from langchain_community.llms import HuggingFacePipeline`. To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/user/app/app.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name,
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3928, in from_pretrained
    model = quantizer.post_init_model(model)
  File "/usr/local/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 469, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
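The fatal error is the final ValueError: the checkpoint is GPTQ-quantized and some of its modules ended up offloaded to CPU/disk (typically because `device_map="auto"` could not fit everything on the GPU), while the ExLlama backend requires the whole model to live on the GPU. Below is a minimal sketch of the two fixes the error message itself points to; the model id and `bits=4` are placeholders, not taken from the original app.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Placeholder: substitute the actual model_name used in app.py.
model_name = "your-org/your-gptq-model"

# Option 1: keep the whole model on one GPU so the ExLlama kernels can run.
# model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": 0})

# Option 2: disable the ExLlama backend, as the ValueError suggests, so that
# CPU/disk offload is allowed. bits=4 assumes a 4-bit GPTQ checkpoint.
quantization_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

The LangChainDeprecationWarning at the top is separate and non-fatal; its fix is stated in the warning itself: run `pip install -U langchain-community`, then import with `from langchain_community.llms import HuggingFacePipeline`.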
