---
language:
- ja
license: mit
tags:
- ja
- japanese
- gpt_neox
- gpt
- text-generation
- lm
- nlp
- int8
- neural-compressor
- Intel® Neural Compressor
- PostTrainingStatic
datasets:
- oscar
model-index:
- name: gpt-neox-japanese-2.7b-int8
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: oscar
      type: oscar
      args: unshuffled_original_ast
    metrics:
    - name: Accuracy
      type: loss
      value: 4.9920
---

# INT8 gpt-neox-japanese-2.7b-int8

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model is the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).

The calibration dataloader is the train dataloader. Because the default calibration sampling size of 100 is not exactly divisible by the batch size of 8, the real sampling size is 104.

#### Test result

|                          |  INT8  |  FP32  |
|--------------------------|:------:|:------:|
| **Accuracy (eval-loss)** | 4.9920 | 3.5219 |
| **Model size (MB)**      |  2570  |  5360  |

#### Load with Intel® Neural Compressor:

```python
from optimum.intel import INCModelForCausalLM

model_id = "Intel/gpt-neox-japanese-2.7b-int8"
int8_model = INCModelForCausalLM.from_pretrained(model_id)
```
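The effective calibration sampling size of 104 follows from rounding the requested size up to whole batches. A minimal sketch of that arithmetic, assuming the calibration loop only consumes complete batches:

```python
import math

batch_size = 8           # calibration dataloader batch size
requested_samples = 100  # default calibration sampling size

# Calibration consumes whole batches, so the effective sampling size
# is the requested size rounded up to the next multiple of the batch size.
num_batches = math.ceil(requested_samples / batch_size)  # 13 batches
effective_samples = num_batches * batch_size             # 13 * 8 = 104 samples

print(effective_samples)  # → 104
```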