---
language:
- ja
license: mit
tags:
- ja
- japanese
- gpt_neox
- gpt
- text-generation
- lm
- nlp
- int8
- neural-compressor
- Intel® Neural Compressor
- PostTrainingStatic
datasets:
- oscar
model-index:
- name: gpt-neox-japanese-2.7b-int8
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: oscar
      type: oscar
      args: unshuffled_original_ast
    metrics:
    - name: Accuracy
      type: loss
      value: 4.9920
---
# INT8 gpt-neox-japanese-2.7b-int8

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model comes from the fine-tuned model [abeja/gpt-neox-japanese-2.7b](https://huggingface.co/abeja/gpt-neox-japanese-2.7b).

Calibration uses the training dataloader. The default calibration sampling size of 100 is not evenly divisible by the batch size of 8, so calibration runs 13 full batches and the effective sampling size is 104.
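The rounding behind the 104 figure can be sketched in a few lines. This is only an illustration of the arithmetic, assuming calibration consumes whole batches and stops after the first batch that reaches the requested sampling size; the function name `effective_calibration_samples` is hypothetical and not part of Intel® Neural Compressor.

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Return the number of samples consumed when only whole batches are used.

    Assumption: calibration rounds the requested sampling size up to the
    next multiple of the batch size.
    """
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# Values from this model card: sampling size 100, batch size 8.
print(effective_calibration_samples(100, 8))  # 104
```

A sampling size that is already a multiple of the batch size (for example 96 with batch size 8) is left unchanged under this assumption.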

#### Test result

| |INT8|FP32|
|---|:---:|:---:|
| Accuracy (eval-loss) |4.9920|3.5219|
| Model size (MB) |2570|5360|
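To put the table in perspective, the short sketch below derives the size ratio and the loss gap from the reported numbers. The perplexity values rest on an assumption that this card does not state: that the evaluation loss is a mean per-token cross-entropy in nats. The helper name `summarize` is hypothetical.

```python
import math

# Numbers copied from the test result table above.
int8 = {"eval_loss": 4.9920, "size_mb": 2570}
fp32 = {"eval_loss": 3.5219, "size_mb": 5360}


def summarize(quantized: dict, baseline: dict) -> dict:
    """Compare a quantized model against its baseline."""
    size_ratio = quantized["size_mb"] / baseline["size_mb"]
    return {
        "size_ratio": size_ratio,
        "size_reduction_pct": (1.0 - size_ratio) * 100.0,
        "loss_delta": quantized["eval_loss"] - baseline["eval_loss"],
        # Assumption: eval loss is mean cross-entropy in nats, so exp(loss)
        # can be read as perplexity.
        "ppl_quantized": math.exp(quantized["eval_loss"]),
        "ppl_baseline": math.exp(baseline["eval_loss"]),
    }


summary = summarize(int8, fp32)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

The INT8 checkpoint is a little under half the size of the FP32 one, while the evaluation loss rises by 1.4701. Whether that trade-off is acceptable depends on the downstream use.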

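#### Load with Intel® Neural Compressor:

```python
from optimum.intel import INCModelForCausalLM

# Repository ID of the quantized model on the Hugging Face Hub
model_id = "Intel/gpt-neox-japanese-2.7b-int8"
# Load the INT8 model through Optimum Intel
int8_model = INCModelForCausalLM.from_pretrained(model_id)
```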