Can't run GPTQ/AWQ models with FastChat (CUDA 12.1)
I can't run either the GPTQ or the AWQ 14B models. Both fail with the error below:
```
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\model_adapter.py", line 281, in load_model
2024-02-19 19:46:11 | ERROR | stderr | model, tokenizer = adapter.load_compress_model(
2024-02-19 19:46:11 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\model_adapter.py", line 115, in load_compress_model
2024-02-19 19:46:11 | ERROR | stderr | return load_compress_model(
2024-02-19 19:46:11 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 216, in load_compress_model
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(model, compressed_state_dict, device)
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | [Previous line repeated 1 more time]
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 99, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | compressed_state_dict[full_name], target_attr.bias, target_device
2024-02-19 19:46:11 | ERROR | stderr | ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | KeyError: 'model.layers.0.self_attn.k_proj.weight'
```
I understand that in the int4 files the weights are stored as `qweight` rather than `weight`, but I don't know how to solve the problem. Please let me know if there is a solution, thanks.
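As a minimal illustration of the mismatch (the failing key is taken from the traceback above; the quantized-checkpoint key names are an assumption based on common GPTQ/AWQ exports, which pack each linear layer into `qweight`/`qzeros`/`scales` tensors):

```python
# FastChat's 8-bit compression path (compression.py) walks the module tree
# and looks up "<layer>.weight" in the checkpoint's state dict. A GPTQ/AWQ
# checkpoint has no such key, so the lookup raises KeyError.

# Keys a plain fp16 checkpoint would contain for this layer:
fp16_keys = {"model.layers.0.self_attn.k_proj.weight"}

# Keys a GPTQ/AWQ-style checkpoint typically contains instead (assumed layout):
quantized_keys = {
    "model.layers.0.self_attn.k_proj.qweight",
    "model.layers.0.self_attn.k_proj.qzeros",
    "model.layers.0.self_attn.k_proj.scales",
}

wanted = "model.layers.0.self_attn.k_proj.weight"  # what the compressor asks for
assert wanted in fp16_keys          # ordinary checkpoint: lookup succeeds
assert wanted not in quantized_keys  # quantized checkpoint: this is the KeyError
```

So the compression path only works on unquantized checkpoints; a pre-quantized model has to go through FastChat's dedicated GPTQ/AWQ loading path instead.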
See https://github.com/lm-sys/FastChat/tree/main?tab=readme-ov-file#more-platforms-and-quantization for how to load quantized models.
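For reference, a sketch of the kind of invocation that README section documents for GPTQ checkpoints (the model path is a placeholder for your local 14B checkpoint, and the flag names should be checked against your installed FastChat version):

```shell
# Hypothetical model path; substitute your local GPTQ checkpoint directory.
# The --gptq-* flags route loading through FastChat's GPTQ path instead of
# the 8-bit compression path that raised the KeyError above.
python3 -m fastchat.serve.cli \
    --model-path models/your-14b-gptq-checkpoint \
    --gptq-wbits 4 \
    --gptq-groupsize 128
```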