Commit f631890 by violetch24
1 parent: b0ca7da

Update README.md

create model card

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -1,3 +1,61 @@
  ---
+ language:
+ - ja
  license: mit
+ tags:
+ - ja
+ - japanese
+ - gpt_neox
+ - gpt
+ - text-generation
+ - lm
+ - nlp
+ - int8
+ - neural-compressor
+ - Intel® Neural Compressor
+ - PostTrainingStatic
+ datasets:
+ - oscar
+ model-index:
+ - name: gpt-neox-japanese-2.7b-int8
+   results:
+   - task:
+       name: Text Generation
+       type: text-generation
+     dataset:
+       name: oscar
+       type: oscar
+       args: unshuffled_original_ast
+     metrics:
+     - name: Accuracy
+       type: loss
+       value: 4.9920
  ---
+ # INT8 gpt-neox-japanese-2.7b-int8
+
+ ## Post-training static quantization
+
+ ### PyTorch
+
+ This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+ The original FP32 model comes from the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).
+
+ The calibration dataloader is the train dataloader. The default calibration sampling size of 100 is not exactly divisible by the batch size of 8, so the actual sampling size is 104.
+
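The 104 figure in the calibration note follows from rounding the requested sample count up to a whole number of batches. A minimal sketch of that arithmetic, assuming the calibrator consumes full batches until it has seen at least the requested number of samples; the helper name `effective_calibration_samples` is hypothetical and not part of Intel® Neural Compressor:

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Round the requested sample count up to a whole number of batches.

    Assumption: calibration only ever processes complete batches.
    """
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# 100 requested samples at batch size 8 -> 13 batches -> 104 samples.
print(effective_calibration_samples(100, 8))
```

Under the same assumption, a sampling size that is already a multiple of the batch size (for example 96 with batch size 8) would be used unchanged.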
+ #### Test result
+
+ | |INT8|FP32|
+ |---|:---:|:---:|
+ | **Accuracy (eval-loss)** |4.9920|3.5219|
+ | **Model size (MB)** |2570|5360|
+
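The trade-off reported in the test results can be quantified directly from the table. The sketch below only re-derives ratios from the four published numbers; the variable names are illustrative, and nothing here is measured or produced by Intel® Neural Compressor:

```python
# Values copied from the test-result table in the model card.
int8_loss, fp32_loss = 4.9920, 3.5219
int8_size_mb, fp32_size_mb = 2570, 5360

# How many times smaller the INT8 checkpoint is than the FP32 one.
size_ratio = fp32_size_mb / int8_size_mb

# Absolute and relative increase in evaluation loss after quantization.
loss_delta = int8_loss - fp32_loss
relative_loss_increase = loss_delta / fp32_loss

print(f"size reduction: {size_ratio:.2f}x")
print(f"eval-loss increase: {loss_delta:.4f} ({relative_loss_increase:.1%})")
```

By these numbers the INT8 checkpoint is roughly half the size of the FP32 model, at the cost of a noticeably higher evaluation loss.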
+ #### Load with Intel® Neural Compressor:
+
+ ```python
+ from optimum.intel.neural_compressor.quantization import IncQuantizedModelForCausalLM
+
+ int8_model = IncQuantizedModelForCausalLM.from_pretrained(
+     "Intel/gpt-neox-japanese-2.7b-int8",
+ )
+ ```