3bit-bf16 #1
opened by ehartford
Wait, is it 3-bit or bf16? It can't be both, right?
MLX supports mixed precision.
Now you can define the precision of each layer.
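For instance, here's a minimal sketch of per-layer precision via `mlx.nn.quantize`'s `class_predicate` hook; the tiny model and the bit-width policy are made up purely for illustration:

```python
import mlx.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(128, 64)
        self.proj = nn.Linear(64, 64)
        self.norm = nn.LayerNorm(64)
        self.lm_head = nn.Linear(64, 128)

def per_layer_policy(path: str, module: nn.Module):
    # Modules with no to_quantized (e.g. LayerNorm) stay in full precision.
    if not hasattr(module, "to_quantized"):
        return False
    # Made-up policy: keep the embedding and head at 6 bits, the rest at 3.
    if "embed" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 64}
    return {"bits": 3, "group_size": 64}

model = Tiny()
nn.quantize(model, class_predicate=per_layer_policy)
print(model)  # proj is a 3-bit quantized layer, lm_head a 6-bit one
```

Returning a dict from the predicate passes those settings to that layer's `to_quantized`; returning `False` leaves the layer unquantized.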
And is that what this repo is?
In fact, to be precise you always need to specify both. Usually quantizations are n-bit with activations in fp16, but sometimes bf16 or fp32 activations are better if you have a large range of values.
- 3-bit means the weight matrices are quantized to 3 bits of precision.
- bf16 means the model's activations will be in bf16, since the quantization scales, biases, and all the non-quantized parameters (e.g. the layer norm params) are stored in bf16; see the sketch below.
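And for reproducing it, here is a hedged sketch of the usual mlx-lm convert route; the paths are placeholders and I can't confirm this is the exact recipe used for this repo:

```python
from mlx_lm import convert

convert(
    hf_path="some-org/some-model",    # placeholder source repo
    mlx_path="some-model-3bit-bf16",  # placeholder output dir
    quantize=True,
    q_bits=3,           # 3-bit weight quantization
    q_group_size=64,
    dtype="bfloat16",   # scales, biases, norms etc. are stored in bf16,
                        # which is what makes the activations run in bf16
)
```

The `dtype` argument is also what sets the activation dtype here: everything that isn't quantized gets cast to it.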
Interesting!
How would I specify the activation dtype?
Even better, how did you make this model?