Longer inference time

#4
by dittops - opened

Inference time seems higher than for a normal fp16 model. I was expecting better throughput as the advantage of 1-bit models.

The advantage of 1-bit models is that they are 32× smaller than a 32-bit model. Inference on 1-bit models includes the overhead of dequantization.
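To illustrate the trade-off, here is a minimal sketch (using NumPy, not this model's actual kernels) of how sign bits can be packed for a 32× storage saving, and why an unpack/dequantize step is needed before a standard float matmul can run:

```python
# Hypothetical illustration: 1-bit weights stored as packed sign bits
# are ~32x smaller than fp32, but must be dequantized back to floats
# before an ordinary matmul -- that unpack step is extra inference work.
import numpy as np

rng = np.random.default_rng(0)
W = np.sign(rng.standard_normal((64, 64))).astype(np.float32)  # weights in {-1, +1}

# Pack: one bit per weight -> 32x smaller than fp32 storage.
packed = np.packbits((W > 0).astype(np.uint8), axis=1)

def dequantize(packed_bits, shape):
    # Unpack bits and map {0, 1} back to {-1.0, +1.0}.
    bits = np.unpackbits(packed_bits, axis=1, count=shape[1])
    return bits.astype(np.float32) * 2.0 - 1.0

x = rng.standard_normal((1, 64)).astype(np.float32)
W_deq = dequantize(packed, W.shape)  # per-inference dequantization overhead
y = x @ W_deq.T                      # matmul still runs in float
print(packed.nbytes, W.nbytes)       # 512 vs 16384 bytes: 32x smaller
```

Without fused low-bit kernels, the dequantize step runs on every forward pass, which is why latency can exceed a plain fp16 model even though memory use drops sharply.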

However, as per the paper, there is a significant improvement in memory usage and throughput.
