
Run phi3-mini on an AMD NPU

  1. If `phi3_mini_awq_4bit_no_flash_attention.pt` is missing, run AWQ quantization to produce the quantized checkpoint (a sketch follows this list).
  2. Copy the `modeling_phi3.py` from this repo into the phi-3-mini model folder.
  3. Modify the file path in `run_awq.py` to point to that folder.
  4. Run `python run_awq.py --task decode --target aie --w_bit 4`.

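The quantization in step 1 can be done with mit-han-lab/llm-awq (phi-3 support is added in the PR linked below). The following is a minimal sketch, not the exact script used for this repo: the function names follow the llm-awq codebase, while the model path, group size, and output filename are assumptions to adapt. Note that llm-awq's scale search expects a CUDA GPU in the upstream repo.

```python
# Sketch of step 1 with mit-han-lab/llm-awq; phi-3 support comes from PR #183.
# model_path, q_group_size, and the output filename are assumptions -- adjust them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq.quantize.pre_quant import run_awq, apply_awq
from awq.quantize.quantizer import pseudo_quantize_model_weight

model_path = "microsoft/Phi-3-mini-4k-instruct"  # assumed; point at your phi-3-mini folder
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    attn_implementation="eager",  # keep flash attention off, matching the checkpoint name
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

q_config = {"zero_point": True, "q_group_size": 128}  # common AWQ defaults

# Search for activation-aware scales on calibration data, fold them into the
# weights, then fake-quantize the weights to 4 bits.
awq_results = run_awq(model, tokenizer, w_bit=4, q_config=q_config)
apply_awq(model, awq_results)
pseudo_quantize_model_weight(model, w_bit=4, q_config=q_config)

torch.save(model, "phi3_mini_awq_4bit_no_flash_attention.pt")  # full-model checkpoint
```
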
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers

For the AWQ quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183

PS: Decode performance on the NPU is similar to that on the CPU (Ryzen 5 7640HS).
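
To reproduce the CPU side of that comparison, here is a rough sketch; the checkpoint filename and tokenizer path are assumptions, and the NPU run goes through `run_awq.py --target aie` instead.

```python
# Rough CPU tokens/sec check for the saved checkpoint (paths are assumptions).
import time
import torch
from transformers import AutoTokenizer

# The checkpoint pickles the whole model, so modeling_phi3.py must be importable.
model = torch.load(
    "phi3_mini_awq_4bit_no_flash_attention.pt",
    map_location="cpu",
    weights_only=False,  # needed on newer PyTorch for full-model pickles
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")  # assumed

inputs = tokenizer("What does an NPU do?", return_tensors="pt")
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```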
