AWQ Version of MTP

by devops724 - opened 23 days ago

Hi there,
I see lots of AWQ versions of Gemma-4-31B-it, but I can't find a single AWQ version of the MTP variant.
Is there any issue with creating an AWQ version of this model?

trieudemo11

20 days ago

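If you use vLLM, it can quantize the model and the KV cache on the fly:

from vllm import LLM

# CHECKPOINT and MAX_MODEL_LEN are placeholders to set for your own setup.
llm = LLM(
    model=CHECKPOINT,
    trust_remote_code=True,
    gpu_memory_utilization=0.90,
    max_model_len=MAX_MODEL_LEN,
    seed=0,
    disable_log_stats=True,
    enable_chunked_prefill=True,
    enable_prefix_caching=True,
    quantization="fp8",    # quantize weights to FP8 at load time
    kv_cache_dtype="fp8",  # store the KV cache in FP8
)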

devops724

20 days ago

Thanks @trieudemo11.
But AWQ gives us accuracy similar to fp8 at int4, or did I miss something? I also use the vllm serve command to create OpenAI API-compatible endpoints.
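
For context on why int4 matters here, a rough back-of-the-envelope sketch of weight memory at each precision. It assumes roughly 31 billion parameters (read off the model name, not confirmed in this thread) and ignores quantization scales and zero-points, the KV cache, and activations, so real footprints will be somewhat higher. `weight_gb` and `ASSUMED_PARAMS` are hypothetical names used only for illustration.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: ~31e9 parameters, inferred from the "31B" in the model name.
ASSUMED_PARAMS = 31e9

for label, bits in [("bf16", 16), ("fp8", 8), ("int4", 4)]:
    print(f"{label}: ~{weight_gb(ASSUMED_PARAMS, bits):.1f} GB of weights")
```

Under these assumptions, int4 weights take about half the space of fp8 weights, which is the gap an AWQ checkpoint would be expected to close.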

devops724

10 days ago

No update?

trieudemo11

4 days ago

It depends on your task; you have to test it yourself on your own dataset.
For general tasks like the ones I run, I found int4 quality acceptable, but definitely not superior to fp8. For instruction-following tasks, int4 does well, but the answers are always kind of lazy unless I ask it to do more. With fp8, the output is better balanced between content and accuracy.
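
A minimal sketch of what "test it yourself" could look like, assuming you can wrap each quantized variant in a function that maps a prompt to a completion (for example, a call to your own endpoint). Everything here is hypothetical scaffolding: `compare_variants`, the two-item dataset, and the stub generators are stand-ins, and the stubs' canned answers are dummy values, not measurements of any real model. It reports a simple substring-match accuracy plus mean answer length in words, the latter as a crude proxy for "lazy" answers.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def compare_variants(
    generators: Dict[str, Callable[[str], str]],
    dataset: List[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Run every variant over (prompt, expected) pairs and summarize."""
    report = {}
    for name, generate in generators.items():
        outputs = [generate(prompt) for prompt, _ in dataset]
        # A hit means the expected string appears somewhere in the answer.
        hits = [
            expected.lower() in out.lower()
            for out, (_, expected) in zip(outputs, dataset)
        ]
        report[name] = {
            "accuracy": sum(hits) / len(dataset),
            "mean_words": mean(len(out.split()) for out in outputs),
        }
    return report


# Dummy data and stub generators; replace with your dataset and real calls.
dataset = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
stub_a = {
    "What is 2 + 2?": "2 + 2 equals 4.",
    "Capital of France?": "The capital of France is Paris.",
}
stub_b = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}

report = compare_variants(
    {"variant_a": stub_a.__getitem__, "variant_b": stub_b.__getitem__},
    dataset,
)
for name, stats in report.items():
    print(name, stats)
```

Substring matching only suits tasks with short reference answers; for open-ended output you would swap in whatever scoring fits your task.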

devops724

3 days ago

But there are only fp8 and GGUF versions; there is no AWQ version.
