jojo1899 committed
Commit 7d93721 · 1 Parent(s): d448458

Improved quantization using Openvino 2024.5.0rc1

README.md CHANGED
@@ -7,22 +7,20 @@ tags:
 
  This is an INT4 quantized version of the `Phi-3-mini-128k-instruct` model. The Python packages used in creating this model are as follows:
  ```
- openvino==2024.4.0
+ openvino==2024.5.0rc1
  optimum==1.23.3
  optimum-intel==1.20.1
  nncf==2.13.0
  torch==2.5.1
- transformers==4.46.1
+ transformers==4.46.2
  ```
  This quantized model is created using the following command:
  ```
- optimum-cli export openvino -m "microsoft/Phi-3-mini-128k-instruct" --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8 --trust-remote-code ./Phi-3-mini-128k-instruct-ov-int4
+ optimum-cli export openvino --model "microsoft/Phi-3-mini-128k-instruct" --weight-format int4 --group-size 128 --sym --ratio 1 --all-layers ./Phi-3-mini-128k-instruct-ov-int4
  ```
  For more details, run the following command from your Python environment: `optimum-cli export openvino --help`
 
  INFO:nncf:Statistics of the bitwidth distribution:
  | Num bits (N) | % all parameters (layers) | % ratio-defining parameters (layers) |
  |----------------|-----------------------------|----------------------------------------|
- | 8 | 24% (23 / 130) | 20% (21 / 128) |
- | 4 | 76% (107 / 130) | 80% (107 / 128) |
-
+ | 4 | 100% (130 / 130) | 100% (130 / 130) |
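
For readers who prefer the Python API over the CLI, the new export command could probably be approximated with `optimum-intel` as sketched below. This is an assumption, not something taken from the commit: the mapping from CLI flags to `OVWeightQuantizationConfig` arguments is my reading of the library, and `export_int4` is a hypothetical helper name.

```python
# Settings mirroring the flags of the new command in the README diff.
# The flag-to-argument mapping is an assumption, not part of the commit.
MODEL_ID = "microsoft/Phi-3-mini-128k-instruct"
OUTPUT_DIR = "./Phi-3-mini-128k-instruct-ov-int4"
QUANT_KWARGS = {
    "bits": 4,           # --weight-format int4
    "group_size": 128,   # --group-size 128
    "sym": True,         # --sym (symmetric quantization)
    "ratio": 1.0,        # --ratio 1 (all ratio-defining weights in 4 bits)
    "all_layers": True,  # --all-layers (also compress embeddings and the last layer)
}


def export_int4(model_id=MODEL_ID, output_dir=OUTPUT_DIR):
    """Hypothetical helper: export the model to OpenVINO IR with INT4 weights."""
    # Imported lazily so the settings above can be inspected without
    # optimum-intel installed.
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

    config = OVWeightQuantizationConfig(**QUANT_KWARGS)
    model = OVModelForCausalLM.from_pretrained(
        model_id,
        export=True,                 # convert the PyTorch checkpoint on the fly
        quantization_config=config,  # apply weight-only compression during export
    )
    model.save_pretrained(output_dir)
    return model
```

Calling `export_int4()` downloads the original checkpoint and writes the converted model to `OUTPUT_DIR`. Unlike the CLI, this sketch does not save the tokenizer, so that would need to be handled separately.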
 
 
config.json CHANGED
@@ -133,7 +133,7 @@
  "sliding_window": 262144,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.46.1",
+ "transformers_version": "4.46.2",
  "use_cache": true,
  "vocab_size": 32064
  }
generation_config.json CHANGED
@@ -7,5 +7,5 @@
  32007
  ],
  "pad_token_id": 32000,
- "transformers_version": "4.46.1"
+ "transformers_version": "4.46.2"
  }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9ce9a3d5a7f07a4cb34ddae2c2d81c2b6f3cbac5c5cd51e1ba376bcf8983195e
- size 2432074068
+ oid sha256:edb9df8e7e86999f0acf6b6990ee50ef4fcbd6c4df577ad602f39ca1d797a06f
+ size 1970940388
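
As a rough sanity check, the drop in the size of `openvino_model.bin` can be compared with the change in average weight bitwidth implied by the NNCF tables in the README diff. The sketch below uses only figures from this commit. It is a back-of-the-envelope estimate: it assumes the file size is dominated by the quantized weights, and it ignores per-group scales, zero points, and rounding in the reported percentages.

```python
# Figures copied from this commit's diff.
OLD_SIZE_BYTES = 2_432_074_068
NEW_SIZE_BYTES = 1_970_940_388

# Share of all parameters stored at each bitwidth (from the NNCF tables).
OLD_MIX = {8: 0.24, 4: 0.76}
NEW_MIX = {4: 1.00}


def average_bits(mix):
    """Parameter-weighted average number of bits per weight."""
    return sum(bits * share for bits, share in mix.items())


def relative_reduction(old, new):
    """Fraction by which `new` is smaller than `old`."""
    return (old - new) / old


size_reduction = relative_reduction(OLD_SIZE_BYTES, NEW_SIZE_BYTES)
bits_reduction = relative_reduction(average_bits(OLD_MIX), average_bits(NEW_MIX))

print(f"file size reduction:    {size_reduction:.1%}")
print(f"average bits reduction: {bits_reduction:.1%}")
```

Both figures come out at roughly 19%, which suggests the smaller file is mostly explained by moving the remaining 8-bit weights to 4 bits.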
openvino_model.xml CHANGED
The diff for this file is too large to render. See raw diff