# Run phi3-mini on AMD NPU
- If `phi3_mini_awq_4bit_no_flash_attention.pt` is not present, use AWQ quantization to produce the quantized model (a quantization sketch follows the references below).
- Put the `modeling_phi3.py` from this repo into the phi-3-mini folder.
- Modify the file paths in `run_awq.py` to match your local layout.
- Run the decode task on the NPU (an illustrative loading sketch follows this list):

  ```
  python run_awq.py --task decode --target aie --w_bit 4
  ```
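For orientation, here is a minimal sketch of loading the quantized checkpoint and decoding. The checkpoint and folder names come from this repo; the `torch.load` pattern and the `generate` call are assumptions about how `run_awq.py` serializes and drives the model (the actual script targets the NPU via `--target aie`):

```python
# Minimal sketch (assumptions noted): load the serialized AWQ model and decode.
# Assumes run_awq.py saved the whole quantized model object with torch.save().
import torch
from transformers import AutoTokenizer

CKPT = "phi3_mini_awq_4bit_no_flash_attention.pt"
MODEL_DIR = "./phi-3-mini"  # contains the modeling_phi3.py from this repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)
model = torch.load(CKPT, weights_only=False)  # full model object, not a state dict
model.eval()

inputs = tokenizer("What is an NPU?", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```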
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers

For the quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183.
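Below is a minimal sketch of the 4-bit AWQ flow using the llm-awq API (`run_awq` / `apply_awq` / `real_quantize_model_weight`). The group size and calibration defaults are assumptions, and the calls should be verified against the llm-awq version you install (phi-3 support lands in the PR above; upstream calibration generally expects a CUDA device):

```python
# Minimal sketch of 4-bit AWQ quantization with mit-han-lab/llm-awq.
# q_group_size=128 and the calibration defaults are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq.quantize.pre_quant import run_awq, apply_awq
from awq.quantize.quantizer import real_quantize_model_weight

MODEL_DIR = "./phi-3-mini"
q_config = {"zero_point": True, "q_group_size": 128}

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16, trust_remote_code=True
)

# Search activation-aware scales on calibration data, fold them into the
# weights, then quantize the weights to 4 bits in place.
awq_results = run_awq(model, tokenizer, w_bit=4, q_config=q_config)
apply_awq(model, awq_results)
real_quantize_model_weight(model, w_bit=4, q_config=q_config)

torch.save(model, "phi3_mini_awq_4bit_no_flash_attention.pt")
```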
PS: The performance is similar to that on the CPU (7640HS).