how is this fp16 when filename has q4?

#1
by ucalyptus - opened

As of today, WebGPU only supports fp16 and fp32 ops (int8 is coming soon), so the model runs in either fp16 or fp32 mode. The weights, however, are quantized to q4; they are dequantized on the fly and then (in this case) run in fp16 mode.
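
To make the "stored as q4, computed in fp16" distinction concrete, here is a rough sketch in plain Python. It is not the actual kernel or file format used by this model: the block size of 32, the per-block scale-plus-offset scheme, the nibble order, and all function names are assumptions made for illustration only.

```python
import struct

BLOCK_SIZE = 32  # assumed; real q4 formats choose their own block size


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q4(weights):
    """Offline step: 4-bit codes plus one fp16 scale and offset per block."""
    if len(weights) % 2:
        raise ValueError("this sketch expects an even number of weights")
    codes, scales, offsets = [], [], []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        offset = to_fp16(min(block))
        # Fall back to 1.0 when the block is constant and the scale would be 0.
        scale = to_fp16((max(block) - offset) / 15) or 1.0
        scales.append(scale)
        offsets.append(offset)
        codes.extend(min(15, max(0, round((w - offset) / scale))) for w in block)
    # Two 4-bit codes share one byte: low nibble first (an assumed layout).
    packed = bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))
    return packed, scales, offsets


def dequantize_q4(packed, scales, offsets):
    """Runtime step: expand each 4-bit code into an fp16 value."""
    values = []
    for i, byte in enumerate(packed):
        for j, code in enumerate((byte & 0x0F, byte >> 4)):
            block = (2 * i + j) // BLOCK_SIZE
            values.append(to_fp16(code * scales[block] + offsets[block]))
    return values
```

In this sketch, 64 weights occupy 32 bytes of packed codes (plus the per-block scales and offsets), yet everything that comes out of `dequantize_q4` is an fp16 value, so the math that follows operates on fp16 numbers. That is the sense in which a q4 file can run in fp16 mode.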

Hopefully that clears things up!
