
Converted to ONNX from https://huggingface.co/microsoft/phi-2 in fp16, with the weights block-quantized to int4 wherever possible. The last 3 layers are kept in fp32 to avoid fp16 overflow, and some outputs are cast to fp32 to make the model friendly for ort-web.
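The int4 block quantization mentioned above can be sketched in plain NumPy. This is a hypothetical illustration of symmetric per-block quantization, not the actual conversion script; the block size of 32 and the rounding scheme are assumptions:

```python
import numpy as np

def block_quantize_int4(weights, block_size=32):
    """Quantize a weight tensor to int4 with one fp32 scale per block."""
    flat = weights.astype(np.float32).ravel()
    # Pad so the flattened weights split evenly into blocks.
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # One scale per block: map the largest |value| to the int4 range [-8, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    """Reverse the quantization: scale the int4 values back to fp32."""
    flat = (q.astype(np.float32) * scales).ravel()
    return flat[:int(np.prod(shape))].reshape(shape)

np.random.seed(0)
w = np.random.randn(64, 48).astype(np.float32)
q, s = block_quantize_int4(w)
w_hat = dequantize(q, s, w.shape)
# Per-block rounding error is bounded by scale / 2, i.e. max|block| / 14.
print(np.abs(w - w_hat).max())
```

Each block stores 32 int4 values plus one fp scale, so the weight payload shrinks roughly 4x versus fp16 at the cost of a small per-block rounding error.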
