Intel/gpt-neox-japanese-2.7b-int8-inc

violetch24 committed on May 9, 2023

Commit f631890 • 1 parent: b0ca7da

Update README.md

create model card

Files changed (1)

README.md +58 -0

@@ -1,3 +1,61 @@
 ---
+language:
+- ja
 license: mit
+tags:
+- ja
+- japanese
+- gpt_neox
+- gpt
+- text-generation
+- lm
+- nlp
+- int8
+- neural-compressor
+- Intel® Neural Compressor
+- PostTrainingStatic
+datasets:
+- oscar
+model-index:
+- name: gpt-neox-japanese-2.7b-int8
+  results:
+  - task:
+      name: Text Generation
+      type: text-generation
+    dataset:
+      name: oscar
+      type: oscar
+      args: unshuffled_original_ast
+    metrics:
+    - name: Accuracy
+      type: loss
+      value: 4.9920
 ---
+# INT8 gpt-neox-japanese-2.7b
+## Post-training static quantization
+### PyTorch
+This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+The FP32 source model is the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).
+Calibration uses the training dataloader. The default calibration sampling size of 100 is not exactly divisible by the batch size of 8, so the actual sampling size is rounded up to 104 (13 batches of 8).
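
Reviewer note: the 100-to-104 rounding described above can be reproduced with a few lines of arithmetic. This is only a sketch under the assumption that calibration consumes whole batches until the requested sample count is reached. The helper name `effective_calibration_samples` is hypothetical and is not part of Intel® Neural Compressor.

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Round the requested sampling size up to a whole number of batches.

    Hypothetical helper: it assumes calibration keeps drawing full batches
    until at least `sampling_size` samples have been seen.
    """
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


if __name__ == "__main__":
    # Values taken from the model card: sampling size 100, batch size 8.
    print(effective_calibration_samples(100, 8))
```

With the card's values this gives 13 batches of 8, which matches the 104 samples reported.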
+#### Test result
+|   |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-loss)** |4.9920|3.5219|
+| **Model size (MB)**  |2570|5360|
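
Reviewer note: the trade-off in the table is easier to read as relative numbers. The sketch below derives them from the four values in the table only. The function name `summarize_quantization` and the dictionary keys are made up for illustration.

```python
def summarize_quantization(int8_loss: float, fp32_loss: float,
                           int8_mb: float, fp32_mb: float) -> dict:
    """Derive relative size and loss changes from the reported table values."""
    return {
        # Fraction of the FP32 size that is saved by the INT8 model.
        "size_reduction": 1.0 - int8_mb / fp32_mb,
        # How many times smaller the INT8 model is.
        "compression_ratio": fp32_mb / int8_mb,
        # Absolute increase in eval loss (lower loss is better).
        "loss_increase": int8_loss - fp32_loss,
        # Increase in eval loss relative to the FP32 baseline.
        "relative_loss_increase": (int8_loss - fp32_loss) / fp32_loss,
    }


if __name__ == "__main__":
    # Values copied from the test-result table in the model card.
    summary = summarize_quantization(4.9920, 3.5219, 2570, 5360)
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

On the reported figures the INT8 model is roughly half the size of the FP32 model, while the eval loss rises by about 1.47.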
+#### Load with Intel® Neural Compressor:
+```python
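+# Import path as used by optimum-intel when this card was written.
+from optimum.intel.neural_compressor.quantization import IncQuantizedModelForCausalLM
+
+# Download and load the INT8 weights from the Hugging Face Hub.
+int8_model = IncQuantizedModelForCausalLM.from_pretrained(
+    "Intel/gpt-neox-japanese-2.7b-int8-inc",
+)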
+```
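
Reviewer note: the card stops at loading the model. A possible follow-up is a small generation helper like the one below. It is a sketch that assumes the loaded object exposes a `transformers`-style `generate` method and that a matching tokenizer is available, for example from the base model repository. Neither assumption is confirmed by the card, and `generate_text` is a hypothetical name.

```python
def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Tokenize a prompt, generate a continuation, and decode it to text.

    Assumes `tokenizer` and `model` follow the usual transformers
    conventions (callable tokenizer, `generate`, `decode`).
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

If those assumptions hold, it could be called as `generate_text(int8_model, tokenizer, prompt)`, where `tokenizer` is loaded separately (for instance with `AutoTokenizer.from_pretrained` on the FP32 base model).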