---
license: mit
---
Converted to ONNX from https://huggingface.co/microsoft/phi-2.
Precision is fp16, with weights block-quantized to int4 wherever possible.
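
To illustrate what block quantization to int4 means, here is a minimal NumPy sketch of symmetric block-wise int4 quantization. It is not the converter used for this model: the block size of 32, the symmetric (zero-point-free) scheme, and the function names are assumptions chosen for illustration. Each block of consecutive weights shares one fp16 scale, and every weight is stored as a signed 4-bit code in [-8, 7].

```python
import numpy as np

BLOCK_SIZE = 32  # hypothetical block size; the value used for this model is not stated


def quantize_int4_blocks(weights: np.ndarray, block_size: int = BLOCK_SIZE):
    """Symmetric block-wise int4 quantization (illustrative sketch).

    Returns int4 codes (held in int8 for clarity; packed formats typically
    store two 4-bit codes per byte), one fp16 scale per block, and the
    original shape so the weights can be reconstructed.
    """
    flat = weights.astype(np.float32).ravel()
    pad = (-flat.size) % block_size          # pad so the length is a multiple of block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # One scale per block so the largest magnitude in the block maps to code 7.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    scales = (max_abs / 7.0).astype(np.float16)
    scales[scales == 0] = 1.0                # all-zero (or underflowing) blocks: any nonzero scale works
    codes = np.clip(np.round(blocks / scales.astype(np.float32)), -8, 7).astype(np.int8)
    return codes, scales, weights.shape


def dequantize_int4_blocks(codes: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Reconstruct fp16 weights from int4 codes and per-block scales."""
    blocks = codes.astype(np.float32) * scales.astype(np.float32)
    size = int(np.prod(shape))
    return blocks.ravel()[:size].reshape(shape).astype(np.float16)
```

For example, a 4×64 fp32 matrix becomes 8 blocks of 32 codes plus 8 scales; `dequantize_int4_blocks(*quantize_int4_blocks(w))` returns an fp16 approximation of `w` whose per-element error is roughly at most half of that block's scale. A real converter may use a different block size, per-block zero points, and packed storage.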
The last 3 layers are kept in fp32 to avoid fp16 overflow.
Some outputs are cast to fp32 to make the model easier to use with ort-web (ONNX Runtime Web).
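
The overflow concern comes from fp16's narrow range: its largest finite value is 65504, so any larger value becomes `inf`, while fp32 represents it without trouble. The NumPy sketch below uses made-up activation values (the actual magnitudes in phi-2's last layers are not given here) and a hypothetical helper name, and shows both a direct cast overflowing and two in-range fp16 values overflowing when added.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # largest finite fp16 value: 65504.0


def cast_reports_overflow(x: np.ndarray) -> tuple[np.ndarray, bool]:
    """Cast fp32 values to fp16 and report whether any finite value became inf."""
    with np.errstate(over="ignore"):        # silence NumPy's overflow warning for the demo
        y = x.astype(np.float16)
    overflowed = bool(np.any(np.isinf(y) & np.isfinite(x)))
    return y, overflowed


# Hypothetical activation values, chosen only to straddle the fp16 limit.
acts = np.array([1.0e3, 6.0e4, 7.0e4], dtype=np.float32)
half, overflowed = cast_reports_overflow(acts)
print(half, overflowed)

# Two values that each fit in fp16 can still overflow when summed in fp16.
with np.errstate(over="ignore"):
    s = np.float16(4.0e4) + np.float16(4.0e4)
print(s, np.float32(4.0e4) + np.float32(4.0e4))
```

Keeping the affected layers in fp32 sidesteps this at the cost of larger weights and slower math for those layers only.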