What is the difference between "cpu-int4-rtn-block-32-acc-level-4" and "cpu-int4-rtn-block-32"?

#3
by Zhubarb - opened

I understand both are aimed at CPU and mobile. What does "acc-level-4" stand for, and what does it do?
The ONNX files seem to be the same size, so which one should we use, and when? I could not find details on the model card. Thanks in advance.

Microsoft org

This is the ONNX model for CPU and mobile, quantized to int4 via RTN (round-to-nearest). Two versions are uploaded to balance latency against accuracy: one with accuracy level 1 and one with accuracy level 4. If you want better performance and can accept a minor trade-off in accuracy (for example, on mobile devices), we recommend the model with acc-level-4.
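To make the accuracy-level trade-off concrete, here is a minimal, illustrative sketch in plain Python. It assumes (based on our reading of the ONNX Runtime `MatMulNBits` operator documentation, not on anything stated in this thread) that accuracy level 1 keeps the activations in fp32, while accuracy level 4 lets the kernel quantize the activations to int8 with the same block size so the multiply-accumulate runs on integers. It also assumes symmetric int4 weights with one scale per block of 32. The function names (`quantize_rtn`, `dot_acc_level_1`, `dot_acc_level_4`) are hypothetical, and this is not ONNX Runtime's actual kernel.

```python
# Illustrative sketch only (hypothetical helpers, not ONNX Runtime's kernel):
# int4 RTN block quantization of weights, then one block's dot product computed
# in an "acc-level-1 style" (float activations) and an "acc-level-4 style" (int8 activations).
import random

BLOCK_SIZE = 32  # assumption: "block-32" means one scale per 32 weights


def quantize_rtn(values, qmax):
    """Symmetric round-to-nearest quantization of one block to ints in [-qmax, qmax].

    Note: Python's round() breaks exact ties to the nearest even integer.
    """
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dot_acc_level_1(q_w, w_scale, x):
    """Dequantize the int4 weights to float and multiply with float activations."""
    return sum((qw * w_scale) * xv for qw, xv in zip(q_w, x))


def dot_acc_level_4(q_w, w_scale, x):
    """Also quantize the activations to int8, then use an integer dot product."""
    q_x, x_scale = quantize_rtn(x, 127)
    int_dot = sum(qw * qx for qw, qx in zip(q_w, q_x))  # integer multiply-accumulate
    return int_dot * w_scale * x_scale  # rescale the integer result back to float


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1.0, 1.0) for _ in range(BLOCK_SIZE)]
    x = [random.uniform(-1.0, 1.0) for _ in range(BLOCK_SIZE)]
    q_w, w_scale = quantize_rtn(w, 7)  # assumption: symmetric int4 range [-7, 7]

    exact = sum(a * b for a, b in zip(w, x))
    print(f"exact float dot:   {exact:.5f}")
    print(f"acc-level-1 style: {dot_acc_level_1(q_w, w_scale, x):.5f}")
    print(f"acc-level-4 style: {dot_acc_level_4(q_w, w_scale, x):.5f}")
```

Running it prints the exact dot product next to both approximations. Under these assumptions, the acc-level-4 path adds a second, typically smaller, rounding error from the int8 activations in exchange for integer arithmetic that CPUs can usually execute faster. This reading would also be consistent with the two ONNX files having the same size, since the stored int4 weights are the same and only the computation differs.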

gugarosa changed discussion status to closed
