bweng committed · verified
Commit e16912c · 1 Parent(s): 089ceb0

Update README.md

Files changed (1): README.md +36 -27

README.md CHANGED
@@ -42,33 +42,35 @@ base_model_relation: quantized

This is the [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to INT4 by [NNCF](https://github.com/openvinotoolkit/nncf).

## Compatibility
The provided OpenVINO™ IR model is compatible with:
* OpenVINO version 2025.2.0 and higher
* Optimum Intel 1.23.0 and higher

- ## Running Model Inference with [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)
-
- 1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:
- ```
- pip install optimum[openvino]
- ```
-
- 2. Run model inference:
- ```
- from transformers import AutoTokenizer
- from optimum.intel.openvino import OVModelForCausalLM
- model_id = "OpenVINO/Phi-4-mini-instruct-int4-ov"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = OVModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
-
- inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
- outputs = model.generate(**inputs, max_length=200)
- text = tokenizer.batch_decode(outputs)[0]
- print(text)
- ```
- For more examples and possible optimizations, refer to [the Inference with Optimum Intel](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-optimum-intel.html).
-
## Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)

1. Install packages required for using OpenVINO GenAI.
@@ -81,17 +83,24 @@ pip install huggingface_hub

```
import huggingface_hub as hf_hub
- model_id = "OpenVINO/Phi-4-mini-instruct-int4-ov"
- model_path = "Phi-4-mini-instruct-int4-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)
```

3. Run model inference:
```
import openvino_genai as ov_genai
- device = "CPU"
- pipe = ov_genai.LLMPipeline(model_path, device)
- print(pipe.generate("What is OpenVINO?", max_length=200))
```
More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) and [samples](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#openvino-genai-samples)

This is the [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to INT4 by [NNCF](https://github.com/openvinotoolkit/nncf).

+ The model was exported with the following pyproject.toml:
+ ```toml
+ [project]
+ name = "export"
+ version = "0.1.0"
+ description = "Export models"
+ readme = "README.md"
+ requires-python = "==3.12.*"
+ dependencies = [
+     "openvino==2025.2.0",
+     "optimum[openvino]",
+     "optimum-intel",
+     "openvino-genai",
+     "huggingface-hub==0.33.0",
+     "tokenizers==0.21.1"
+ ]
+ ```
+
+ Then run the export:
+ ```bash
+ uv sync
+ uv run optimum-cli export openvino --model microsoft/phi-4-mini-instruct --task text-generation-with-past --weight-format int4 --group-size -1 --ratio 1.0 --sym --trust-remote-code phi-4-mini-instruct/INT4-NPU_compressed_weights
+ ```
+
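A quick sanity check after the export is to read the produced IR back with the OpenVINO runtime before uploading it. This is a minimal sketch: the path comes from the export command above, and the `openvino_model.xml` file name is assumed from optimum-cli's usual output layout.

```python
import openvino as ov

# Read the exported IR without compiling it, just to confirm it loads.
# The directory is the export target from the command above; the
# openvino_model.xml name is an assumption based on optimum-cli's defaults.
core = ov.Core()
model = core.read_model("phi-4-mini-instruct/INT4-NPU_compressed_weights/openvino_model.xml")
print(model.get_friendly_name())
print([port.get_any_name() for port in model.inputs])
```
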
## Compatibility
The provided OpenVINO™ IR model is compatible with:
* OpenVINO version 2025.2.0 and higher
  * Optimum Intel 1.23.0 and higher
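
To confirm that an installed environment meets these minimums, the package versions can be checked directly; a minimal sketch using the PyPI distribution names:

```python
from importlib.metadata import version

# Distribution names as published on PyPI
print("openvino:", version("openvino"))            # expect 2025.2.0 or higher
print("optimum-intel:", version("optimum-intel"))  # expect 1.23.0 or higher
```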
## Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)

1. Install packages required for using OpenVINO GenAI.

```
import huggingface_hub as hf_hub
+ model_id = "bweng/phi-4-mini-instruct-int4-ov-npu"
+ model_path = "phi-4-mini-instruct-int4-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)
```

3. Run model inference:
```
import openvino_genai as ov_genai
+ device = "NPU"
+ pipe = ov_genai.LLMPipeline(model_path, device, MAX_PROMPT_LEN=4096, MIN_RESPONSE_LEN=1024, CACHE_DIR="./cache")
+
+ # Build a GenerationConfig object for the request
+ gen_config = ov_genai.GenerationConfig(apply_chat_template=True, max_new_tokens=1024)
+
+ # Call generate with the config object
+ output = pipe.generate("How are you doing?", generation_config=gen_config)
+ print(output)
```
More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) and [samples](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#openvino-genai-samples)
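
For interactive use, OpenVINO GenAI can also stream tokens through a callback instead of returning the whole string at once. The sketch below reuses the model path and NPU pipeline options from the snippet above; the option values are assumptions to adjust for your own hardware:

```python
import openvino_genai as ov_genai

# Reuse the pipeline settings shown in the README snippet above
model_path = "phi-4-mini-instruct-int4-ov"
pipe = ov_genai.LLMPipeline(model_path, "NPU", MAX_PROMPT_LEN=4096, MIN_RESPONSE_LEN=1024, CACHE_DIR="./cache")

def streamer(subword: str) -> bool:
    # Print each decoded chunk as soon as it arrives
    print(subword, end="", flush=True)
    return False  # False tells the pipeline to keep generating

config = ov_genai.GenerationConfig(apply_chat_template=True, max_new_tokens=256)
pipe.generate("What is OpenVINO?", generation_config=config, streamer=streamer)
print()
```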