Can't run GPTQ/AWQ models with FastChat (CUDA 12.1)
I can't run either the GPTQ or the AWQ 14B models. Both fail with the error below:
```
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\model_adapter.py", line 281, in load_model
2024-02-19 19:46:11 | ERROR | stderr | model, tokenizer = adapter.load_compress_model(
2024-02-19 19:46:11 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\model_adapter.py", line 115, in load_compress_model
2024-02-19 19:46:11 | ERROR | stderr | return load_compress_model(
2024-02-19 19:46:11 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 216, in load_compress_model
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(model, compressed_state_dict, device)
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 104, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | apply_compressed_weight(
2024-02-19 19:46:11 | ERROR | stderr | [Previous line repeated 1 more time]
2024-02-19 19:46:11 | ERROR | stderr | File "G:\ProgramData\Anaconda3\envs\chatchat210\Lib\site-packages\fastchat\model\compression.py", line 99, in apply_compressed_weight
2024-02-19 19:46:11 | ERROR | stderr | compressed_state_dict[full_name], target_attr.bias, target_device
2024-02-19 19:46:11 | ERROR | stderr | ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
2024-02-19 19:46:11 | ERROR | stderr | KeyError: 'model.layers.0.self_attn.k_proj.weight'
```
I understand that in the int4 files the weights are stored as `qweight` rather than `weight`, but I don't know how to solve the problem. Please let me know if there is a solution, thanks.
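As a minimal illustration of the mismatch (the failing key is taken from the traceback above; the quantized-checkpoint key names are an assumption based on common GPTQ/AWQ exports, which pack each linear layer into `qweight`/`qzeros`/`scales` tensors):

```python
# FastChat's 8-bit compression path (compression.py) walks the module tree
# and looks up "<layer>.weight" in the checkpoint's state dict. A GPTQ/AWQ
# checkpoint has no such key, so the lookup raises KeyError.

# Keys a plain fp16 checkpoint would contain for this layer:
fp16_keys = {"model.layers.0.self_attn.k_proj.weight"}

# Keys a GPTQ/AWQ-style checkpoint typically contains instead (assumed layout):
quantized_keys = {
    "model.layers.0.self_attn.k_proj.qweight",
    "model.layers.0.self_attn.k_proj.qzeros",
    "model.layers.0.self_attn.k_proj.scales",
}

wanted = "model.layers.0.self_attn.k_proj.weight"  # what the compressor asks for
assert wanted in fp16_keys          # ordinary checkpoint: lookup succeeds
assert wanted not in quantized_keys  # quantized checkpoint: this is the KeyError
```

So the compression path only works on unquantized checkpoints; a pre-quantized model has to go through FastChat's dedicated GPTQ/AWQ loading path instead.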
See https://github.com/lm-sys/FastChat/tree/main?tab=readme-ov-file#more-platforms-and-quantization for how to load quantized models.
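For reference, a sketch of the kind of invocation that README section documents for GPTQ checkpoints (the model path is a placeholder for your local 14B checkpoint, and the flag names should be checked against your installed FastChat version):

```shell
# Hypothetical model path; substitute your local GPTQ checkpoint directory.
# The --gptq-* flags route loading through FastChat's GPTQ path instead of
# the 8-bit compression path that raised the KeyError above.
python3 -m fastchat.serve.cli \
    --model-path models/your-14b-gptq-checkpoint \
    --gptq-wbits 4 \
    --gptq-groupsize 128
```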