Missing quant_config.json

#8
by ThWu - opened

I'm setting up an Arctic vLLM endpoint following the tutorial at https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/inference/vllm. However, I was not able to enable quantization="deepspeedfp" because vLLM fails with `ValueError: Cannot find the config file for deepspeedfp`. Without quantization, the model runs out of memory (OOM) even on 8 A100s.
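A rough estimate of the weight memory shows why quantization is needed here. It assumes about 480B total parameters for Arctic and the 80 GB A100 variant. Neither figure comes from this post, and activations, KV cache, and per-group scale overhead are ignored. At 16 bits per weight the model does not fit in 8 × 80 GB, while at 8 bits it does:

```python
# Back-of-envelope weight-memory estimate.
# ASSUMPTIONS (not from the post): ~480e9 parameters for Arctic and
# 80 GB per A100. Activations, KV cache, and scale overhead are ignored.
PARAMS = 480e9
GPU_MEM_GB = 80
NUM_GPUS = 8


def weight_gb(params: float, bits_per_param: int) -> float:
    """Return weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


total_gb = GPU_MEM_GB * NUM_GPUS  # 640 GB across the node
bf16_gb = weight_gb(PARAMS, 16)   # 960 GB: exceeds the node's memory
fp8_gb = weight_gb(PARAMS, 8)     # 480 GB: fits, leaving some headroom
```

With the 40 GB A100 variant (320 GB total), even the 8-bit weights would not fit under these assumptions.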
The fix is to add a quant_config.json file to the model directory:
```json
{
    "bits": 8,
    "rounding": "nearest",
    "mantissa_bits": 3,
    "group_size": 512
}
```
Could you guys upload it?
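Until the file is uploaded, a local workaround is to write it into your own copy of the checkpoint before starting vLLM. The following is a minimal sketch: `write_quant_config` is a hypothetical helper, the values are copied from the config above, and it assumes the model directory is a local, writable path.

```python
import json
from pathlib import Path

# Values copied from the config above. Whether vLLM reads every key
# (e.g. "rounding") is not confirmed here.
QUANT_CONFIG = {
    "bits": 8,
    "rounding": "nearest",
    "mantissa_bits": 3,
    "group_size": 512,
}


def write_quant_config(model_dir: str) -> Path:
    """Write quant_config.json into model_dir (hypothetical helper).

    An existing quant_config.json is left untouched.
    """
    path = Path(model_dir) / "quant_config.json"
    if not path.exists():
        path.write_text(json.dumps(QUANT_CONFIG, indent=4) + "\n")
    return path
```

Call it with the local checkpoint path, e.g. `write_quant_config("/path/to/arctic")` (placeholder path), and then point vLLM at that same directory.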
