how is this fp16 when filename has q4?

by ucalyptus - opened about 1 month ago

Discussion

ucalyptus

about 1 month ago

@Xenova

Xenova

Owner about 1 month ago

As of today, WebGPU only supports fp16 and fp32 ops (int8 is coming soon), so the model runs in either fp16 or fp32 mode. The weights, however, are stored quantized to q4. They are dequantized on the fly, and the computation is then (in this case) run in fp16 mode.
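To make the distinction concrete, here is a rough sketch in NumPy (not WebGPU) of "q4 storage, fp16 compute". It is only an illustration: the block size of 32, the symmetric per-block scale, and the function names `quantize_q4` / `dequantize_q4` are assumptions made for this example, not the actual kernels or quantization format used by this model.

```python
import numpy as np

BLOCK = 32  # assumed block size: one scale per 32 weights (illustrative only)

def quantize_q4(w):
    """Pack a 1-D float array into 4-bit codes plus per-block fp16 scales."""
    blocks = w.astype(np.float32).reshape(-1, BLOCK)
    # Symmetric scheme: the largest magnitude in each block maps to code +/-7.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid dividing by zero for all-zero blocks
    codes = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8) + 8  # 0..15
    codes = codes.astype(np.uint8).reshape(-1)
    packed = codes[0::2] | (codes[1::2] << 4)  # two 4-bit codes per byte
    return packed, scales.astype(np.float16)

def dequantize_q4(packed, scales):
    """Unpack 4-bit codes and rescale them to fp16 weights."""
    codes = np.empty(packed.size * 2, dtype=np.uint8)
    codes[0::2] = packed & 0x0F  # low nibble
    codes[1::2] = packed >> 4    # high nibble
    blocks = (codes.astype(np.float16) - np.float16(8)).reshape(-1, BLOCK)
    return (blocks * scales).reshape(-1)  # fp16 * fp16 stays fp16

# Toy "layer": a 4x64 weight matrix applied to an fp16 input vector.
rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 64).astype(np.float32)
packed, scales = quantize_q4(w)           # what would be stored in the file
w_fp16 = dequantize_q4(packed, scales)    # done on the fly at run time
x = rng.standard_normal(64).astype(np.float16)
y = w_fp16.reshape(4, 64) @ x             # the actual math happens in fp16

print(packed.dtype, packed.nbytes, "bytes stored (plus scales)")
print(w_fp16.dtype, w_fp16.nbytes, "bytes once dequantized")
print(y.dtype)
```

So "q4" describes how the weights are stored (4 bits each, plus scales), while "fp16" describes the precision of the ops that run after dequantization.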

Hopefully that clears things up!
