NeMo

qnemo file

#9
by willy1212009 - opened

Has anyone run PTQ from the NeMo Framework to produce an FP8 or INT4 qnemo file for Nemotron-340B? The conversion should need 16 H100s or 8 H200s, but we don't have that hardware.
It's odd that we want to quantize the model, yet the quantization step itself needs 16 H100s first.
According to the paper, the quantized model only needs 8 H100s.

https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/ptq.html
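
A rough, weights-only estimate may help explain the GPU counts mentioned above. This is a back-of-the-envelope sketch, not NVIDIA's sizing method. It assumes about 340B parameters, that PTQ loads the original checkpoint in BF16 (2 bytes per parameter), and 80 GB per H100 and 141 GB per H200. It ignores activations, KV cache, and calibration overhead, so real requirements are higher. The names `weight_gb` and `fits` are hypothetical helpers for illustration.

```python
# Weights-only memory estimate (assumption: ~340e9 parameters).
PARAMS = 340e9

# Bytes per parameter for each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Per-GPU memory in GB (assumption: H100 80 GB, H200 141 GB variants).
GPU_MEM_GB = {"H100": 80, "H200": 141}


def weight_gb(precision: str) -> float:
    """Return the size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def fits(precision: str, gpu: str, count: int) -> bool:
    """Check whether the weights alone fit in the combined GPU memory."""
    return weight_gb(precision) <= GPU_MEM_GB[gpu] * count


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_gb(precision):.0f} GB of weights")
    for precision, gpu, count in [
        ("bf16", "H100", 8),
        ("bf16", "H100", 16),
        ("bf16", "H200", 8),
        ("fp8", "H100", 8),
    ]:
        print(f"{precision} on {count}x{gpu}: fits={fits(precision, gpu, count)}")
```

Under these assumptions the BF16 weights alone exceed the combined memory of 8 H100s, which would explain why the conversion needs more hardware than serving the FP8 result.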

NVIDIA org

As with the base model, there is some quantization work in progress (though I'm not sure about int4) that will be shared once fully validated.
