---
license: mit
pipeline_tag: text-generation
library_name: transformers.js
tags:
- ONNX
- DML
- ONNXRuntime
- nlp
- conversational
---

# Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web

This is the same model as the [official Phi-3 ONNX model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx), with a few changes to make it work with onnxruntime-web:

1. The model is fp16, with int4 block quantization for the weights.
2. The 'logits' output is fp32.
3. The model uses MHA (multi-head attention) instead of GQA (grouped-query attention).
4. The ONNX file and the external data file each need to stay below 2 GB to be cacheable in Chromium.
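As a minimal sketch of how a model like this might be loaded in the browser with Transformers.js, assuming Transformers.js v3 (the `@huggingface/transformers` package) and a hypothetical model id — adjust the id, `device`, and `dtype` to match this repository and your setup:

```javascript
import { pipeline } from '@huggingface/transformers';

// Hypothetical model id — replace with this repository's actual id.
const generator = await pipeline(
  'text-generation',
  'your-username/Phi-3-mini-4k-instruct-onnx-web',
  {
    device: 'webgpu', // run via onnxruntime-web's WebGPU backend
    dtype: 'q4f16',   // fp16 activations, int4 block-quantized weights
  },
);

const messages = [
  { role: 'user', content: 'What is the capital of France?' },
];

const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text);
```

Note that this must run in a browser context (WebGPU is not generally available in Node.js), and the first load downloads and caches the ONNX weights, which is where the 2 GB per-file limit above matters.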