Nice model, any info on scripts used to quantize?

#1
by RonanMcGovern - opened

and also commands for running with vLLM? Thanks

Just pass the stub to vLLM and it will run.
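As a rough illustration of "pass the stub to vLLM", here is a minimal sketch using vLLM's Python API and its `vllm serve` command. The helper names `run_offline` and `serve_command` are hypothetical, and the model ID is left as an argument because the thread does not spell it out. Depending on your hardware you may need to raise `tensor_parallel_size` to fit the model.

```python
def run_offline(model_stub: str, prompt: str, max_tokens: int = 64,
                tensor_parallel_size: int = 1) -> str:
    """Load a Hub checkpoint with vLLM and return one completion (hypothetical helper)."""
    # Imported lazily so this file can be imported without vLLM installed.
    from vllm import LLM, SamplingParams

    # vLLM reads the quantization settings from the checkpoint's config.
    llm = LLM(model=model_stub, tensor_parallel_size=tensor_parallel_size)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


def serve_command(model_stub: str) -> list[str]:
    """Build the CLI invocation for vLLM's OpenAI-compatible server."""
    return ["vllm", "serve", model_stub]


if __name__ == "__main__":
    import sys

    # Usage: python run_vllm.py <model-id-of-this-repo>
    print(run_offline(sys.argv[1], "Briefly explain FP8 quantization."))
```

Running the list returned by `serve_command(...)` in a shell (that is, `vllm serve <model-id>`) starts the server instead of doing offline generation.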

For the scripts, we have a bunch of examples in the vllm-project/llm-compressor repo for fp8. Just swap in the Llama 3.3 HF stub and you're good to go.

mgoin changed discussion status to closed
