Where is the model? 0 downloads means nobody can use it. Please fix.

#1
by Andriy - opened

Here are the models: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx/tree/main

This way of publishing the models is not supported by the download counter (it's not one of the official formats), but that doesn't mean the files are missing; they are all in the repo.

I get this error when loading the model: "OSError: microsoft/Phi-3-mini-128k-instruct-onnx does not appear to have a file named configuration_phi3.py. Checkout 'https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx/main' for available files."

Am I doing something wrong?

Code:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct-onnx",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    max_length=200,
)

transformers version: 4.41.0.dev0 (installed via pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers)

@tlapusan You cannot load ONNX models via AutoModelForCausalLM.from_pretrained; you need ONNX Runtime or another library that can load ONNX models. (AutoModelForCausalLM loads models in the standard Hugging Face format, so for that you should use https://huggingface.co/microsoft/Phi-3-mini-128k-instruct.)

PS: I might be wrong; they may have added ONNX support to Transformers, like in Optimum.
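
For reference, loading the standard (non-ONNX) repo with Transformers looks roughly like this (a quick sketch; the prompt and generation settings are placeholders, and it assumes a CUDA GPU is available):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # standard (non-ONNX) repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",       # assumes a CUDA GPU is available
    torch_dtype="auto",
    trust_remote_code=True,
)

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))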

Thanks @MaziyarPanahi, I tried with Optimum and got the same error.

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "optimum/Phi-3-mini-128k-instruct-onnx",
    torch_dtype="auto",
    trust_remote_code=True,
    max_length=200,
)

I found this resource, https://onnxruntime.ai/blogs/accelerating-phi-3, but it seems they support only Windows :))

Microsoft org
edited May 10

In addition to ONNX Runtime GenAI, the uploaded ONNX models work directly with Hugging Face's Optimum as well for all platforms. Here's how you can integrate with Optimum (v1.19.2 or newer is required).

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "/path/to/downloaded/folder/containing/onnx/model",
    decoder_file_name="filename_in_folder.onnx",
    decoder_with_past_file_name="filename_in_folder.onnx",
    use_merged=True,
    provider="name of execution provider (e.g. CPUExecutionProvider, CUDAExecutionProvider)",
    trust_remote_code=True,
    local_files_only=True,
)

Example:

# Download the ONNX models
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
# Load ONNX model
from optimum.onnxruntime import ORTModelForCausalLM
model = ORTModelForCausalLM.from_pretrained(
    "./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4",
    decoder_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
    decoder_with_past_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
    use_merged=True,
    provider="CPUExecutionProvider",
    trust_remote_code=True,
    local_files_only=True,
)

Then you can use the same APIs that Hugging Face's Transformers has (e.g. model.generate).
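
For instance, continuing the example above (a rough sketch; it assumes the tokenizer files sit in the same sub-folder and uses a placeholder prompt in the Phi-3 chat format):

from transformers import AutoTokenizer

# Load the tokenizer from the same sub-folder as the ONNX model
tokenizer = AutoTokenizer.from_pretrained(
    "./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4",
    trust_remote_code=True,
)
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))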

@kvaishnavi

There is no config.json file in the model folder. Did you modify the genai_config.json file?

>>> from optimum.onnxruntime import ORTModelForCausalLM

>>> model = ORTModelForCausalLM.from_pretrained(
...     "./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4",
...     decoder_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
...     decoder_with_past_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx",
...     use_merged=True,
...     provider="CPUExecutionProvider",
...     trust_remote_code=True,
...     local_files_only=True,
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/ms_onxx/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 669, in from_pretrained
    return super().from_pretrained(
  File "/home/ubuntu/anaconda3/envs/ms_onxx/lib/python3.10/site-packages/optimum/modeling_base.py", line 371, in from_pretrained
    raise OSError(f"config.json not found in {model_id} local folder")
OSError: config.json not found in ./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 local folder
Microsoft org

The config.json file is at the root of the repo. I have added a copy of that file as well as a copy of configuration_phi3.py in each sub-folder for convenience so that the sub-folders can directly load with Optimum. I have also edited the above instructions to include one additional change needed in Optimum.

With the above instructions, the model loads.

>>> from optimum.onnxruntime import ORTModelForCausalLM
>>> model = ORTModelForCausalLM.from_pretrained("./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4", decoder_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx", decoder_with_past_file_name="phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx", use_merged=True, provider="CPUExecutionProvider", trust_remote_code=True, local_files_only=True)
The argument `trust_remote_code` is to be used along with export=True. It will be ignored.
The `decoder_file_name` argument is deprecated, please use `file_name` instead.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>>

The genai_config.json file is used in ONNX Runtime GenAI.
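
For completeness, running the same folder through ONNX Runtime GenAI looks roughly like this (a sketch only; the exact Python API may differ between onnxruntime-genai versions, and the prompt is a placeholder):

import onnxruntime_genai as og

# Point at the folder containing the .onnx file and genai_config.json
model = og.Model("./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)

prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = tokenizer.encode(prompt)

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))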

Thanks @kvaishnavi for your posts. I followed them, but when trying to load the model I get this error:

InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/k_proj/MatMul_Q4", MatMulNBits, "com.microsoft", -1) : ("/model/layers.0/input_layernorm/output_0": tensor(float),"model.layers.0.attn.k_proj.MatMul.weight_Q4": tensor(uint8),"model.layers.0.attn.k_proj.MatMul.weight_scales": tensor(float),) -> ("/model/layers.0/attn/k_proj/MatMul/output_0": tensor(float),) , Error Unrecognized attribute: accuracy_level for operator MatMulNBits

I'm running the code from a SageMaker notebook (CPU). I also tried Phi-3-mini-4k-instruct-onnx and got the same error.
Another thing: I needed to install git-lfs to download the large files while cloning the repo.

PS: I also tried on my laptop and got the same error:
onnx==1.16.0
onnxruntime==1.16.3
-e git+https://github.com/huggingface/optimum@e3fd2776a318a3a7b9d33315cc42c04c181f6d2f#egg=optimum

Microsoft org
edited Apr 30

To run with Optimum, you will need to upgrade your ONNX Runtime (ORT) version to a nightly version until ORT 1.18 is officially released in early May. Here are the instructions to install the nightly version of ORT.

ORT nightly CPU package:

# Uninstall any existing ORT packages
$ pip uninstall -y onnxruntime onnxruntime-gpu ort-nightly ort-nightly-gpu

# Install ORT nightly CPU package
$ pip install ort-nightly --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

ORT nightly GPU package with CUDA 11.X:

# Uninstall any existing ORT packages
$ pip uninstall -y onnxruntime onnxruntime-gpu ort-nightly ort-nightly-gpu

# Install ORT nightly GPU package
$ pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

ORT nightly GPU package with CUDA 12.X:

# Uninstall any existing ORT packages
$ pip uninstall -y onnxruntime onnxruntime-gpu ort-nightly ort-nightly-gpu

# Install ORT nightly GPU package
$ pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
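
To double-check that the nightly build is the one being picked up, you can print the installed version and available execution providers (a quick sanity check, not part of the official instructions; the nightly package still imports as onnxruntime):

import onnxruntime as ort

print(ort.__version__)                # should report a 1.18 nightly build
print(ort.get_available_providers())  # e.g. CPUExecutionProvider / CUDAExecutionProvider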
parinitarahi changed discussion status to closed

Thanks @kvaishnavi, it worked!
