Nice model, any info on scripts used to quantize?
#1 by RonanMcGovern - opened
and also commands for running with vLLM? Thanks
Just pass the stub to vLLM and it will run.
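As a rough illustration of "passing the stub", here is a minimal sketch using vLLM's offline Python API and the equivalent `vllm serve` command line. The stub is left as a parameter because the thread does not spell it out; the helper names `generate` and `serve_command` are made up for this example, and `tensor_parallel_size` is an assumption you would set to match your GPU count.

```python
def generate(model_stub, prompts, tensor_parallel_size=1, max_tokens=64):
    """Load a model by its Hugging Face stub and generate completions.

    model_stub is whatever repo id this model card lives under
    (not stated in the thread, so it is a parameter here).
    """
    # Imported lazily so the module loads even where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_stub, tensor_parallel_size=tensor_parallel_size)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each result carries one or more completions; keep the first.
    return [out.outputs[0].text for out in outputs]


def serve_command(model_stub, tensor_parallel_size=1):
    """Build the argv for vLLM's OpenAI-compatible server (hypothetical helper)."""
    return [
        "vllm", "serve", model_stub,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
```

Calling `generate("<model stub>", ["Hello"])` runs a single offline batch, while `" ".join(serve_command("<model stub>"))` gives the shell command for a long-running server.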
For the scripts, we have a bunch of FP8 examples in the vllm-project/llm-compressor repo. Just swap in the Llama 3.3 HF stub and you're good to go.
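For readers who want a starting point, the following is a sketch modeled on the FP8 dynamic quantization example in llm-compressor, not the exact script used for this model. The `FP8_DYNAMIC` scheme, the `lm_head` exclusion, and the `meta-llama/Llama-3.3-70B-Instruct` stub are assumptions; the helper names are made up for this example, and the `oneshot` import path differs between llm-compressor releases.

```python
def default_save_dir(model_id):
    """Derive an output directory name from the HF stub (hypothetical helper)."""
    return model_id.split("/")[-1] + "-FP8-Dynamic"


def quantize_fp8_dynamic(model_id, save_dir=None):
    """Quantize Linear layers to FP8 with dynamic activation scales."""
    # Imported lazily: these are heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor.modifiers.quantization import QuantizationModifier
    try:
        from llmcompressor import oneshot  # newer releases
    except ImportError:
        from llmcompressor.transformers import oneshot  # older releases

    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Assumed recipe: FP8 weights, dynamic per-token activations,
    # output head left in original precision. No calibration data needed.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )
    oneshot(model=model, recipe=recipe)

    save_dir = save_dir or default_save_dir(model_id)
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
    return save_dir
```

Running `quantize_fp8_dynamic("meta-llama/Llama-3.3-70B-Instruct")` would write the checkpoint to `Llama-3.3-70B-Instruct-FP8-Dynamic`, which can then be passed to vLLM as a local path.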
mgoin changed discussion status to closed