	---
	language:
	- en
	license: apache-2.0
	tags:
	- multiple-choice
	- int8
	- PostTrainingStatic
	datasets:
	- swag
	metrics:
	- accuracy
	model-index:
	- name: bert-base-uncased-finetuned-swag-int8-static
	  results:
	  - task:
	      name: Multiple-choice
	      type: multiple-choice
	    dataset:
	      name: Swag
	      type: swag
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.7838148474693298
	---
	# INT8 bert-base-uncased-finetuned-swag

	### Post-training static quantization

	This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

	The original fp32 model comes from the fine-tuned model [thyagosme/bert-base-uncased-finetuned-swag](https://huggingface.co/thyagosme/bert-base-uncased-finetuned-swag).

	The calibration dataloader is the train dataloader. Because the default calibration sampling size of 100 is not a multiple of the batch size of 8, calibration runs 13 full batches, so the actual sampling size is 104.
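The rounding can be checked with a few lines of arithmetic. This is a minimal sketch, assuming calibration consumes whole batches until at least the requested number of samples has been seen; the function name `effective_calibration_samples` is hypothetical and is not part of Intel® Neural Compressor.

```python
import math


def effective_calibration_samples(sampling_size: int, batch_size: int) -> int:
    """Round the requested sample count up to a whole number of batches."""
    num_batches = math.ceil(sampling_size / batch_size)
    return num_batches * batch_size


# 100 requested samples with batch size 8 -> 13 batches -> 104 samples
print(effective_calibration_samples(100, 8))
```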

	To keep the relative accuracy loss within 1%, four linear modules fall back to fp32: `bert.encoder.layer.2.output.dense`, `bert.encoder.layer.5.intermediate.dense`, `bert.encoder.layer.9.output.dense`, and `bert.encoder.layer.10.output.dense`.

	### Test result

	- Batch size = 8
	- [Amazon Web Services](https://aws.amazon.com/) c6i.xlarge instance (Intel Ice Lake: 4 vCPUs, 8 GB memory).

	| |INT8|FP32|
	|---|:---:|:---:|
	| Throughput (samples/sec) |16.55|9.333|
	| Accuracy (eval-accuracy) |0.7838|0.7915|
	| Model size (MB) |133|418|
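The ratios implied by the table can be derived directly. The sketch below only recomputes figures from the numbers reported above; the variable names are illustrative.

```python
# Figures copied from the test-result table.
int8 = {"throughput": 16.55, "accuracy": 0.7838, "size_mb": 133}
fp32 = {"throughput": 9.333, "accuracy": 0.7915, "size_mb": 418}

# How much faster and smaller the INT8 model is, and how much accuracy it gives up.
speedup = int8["throughput"] / fp32["throughput"]
size_reduction = fp32["size_mb"] / int8["size_mb"]
relative_accuracy_loss = (fp32["accuracy"] - int8["accuracy"]) / fp32["accuracy"]

print(f"Throughput speedup: {speedup:.2f}x")
print(f"Model size reduction: {size_reduction:.2f}x")
print(f"Relative accuracy loss: {relative_accuracy_loss:.2%}")
```

On these numbers the INT8 model is roughly 1.8 times faster and about 3.1 times smaller, and the relative accuracy loss stays just under the 1% target.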

	### Load with Intel® Neural Compressor (build from source):

	```python
	from neural_compressor.utils.load_huggingface import OptimizedModel

	# Download the quantized checkpoint from the Hugging Face Hub and load it.
	int8_model = OptimizedModel.from_pretrained(
	    'Intel/bert-base-uncased-finetuned-swag-int8-static',
	)
	```
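Once loaded, the model can be used for SWAG-style multiple choice, where one context is scored against several candidate endings. The following is a sketch under stated assumptions, not the authors' evaluation code: the helper names `build_choice_pairs` and `predict_choice` are hypothetical, `tokenizer` is assumed to be a Hugging Face tokenizer matching `bert-base-uncased`, and the model is assumed to accept the same keyword inputs as a `BertForMultipleChoice` model and to return logits either under a `"logits"` key or as the first element of a tuple.

```python
from typing import List, Tuple


def build_choice_pairs(context: str, endings: List[str]) -> Tuple[List[str], List[str]]:
    """Pair the same context with every candidate ending."""
    return [context] * len(endings), list(endings)


def predict_choice(model, tokenizer, context: str, endings: List[str]) -> int:
    """Return the index of the highest-scoring ending (assumed interface, see lead-in)."""
    import torch  # imported lazily so build_choice_pairs has no heavy dependencies

    first, second = build_choice_pairs(context, endings)
    encoded = tokenizer(first, second, padding=True, truncation=True, return_tensors="pt")
    # Multiple-choice models expect tensors shaped (batch, num_choices, seq_len).
    inputs = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # Assumption: outputs are dict-like with "logits", or a tuple whose first item is the logits.
    logits = outputs["logits"] if isinstance(outputs, dict) else outputs[0]
    return int(logits.argmax(dim=-1).item())
```

A call would look like `predict_choice(int8_model, tokenizer, "A man sits at a piano.", ["He starts to play.", "He eats the piano.", "He flies away.", "He paints a wall."])`, with `tokenizer` created separately.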

	Notes:
	- The INT8 model outperforms the FP32 model when the CPU is fully utilized. When the CPU is not fully occupied, the INT8 model can misleadingly appear slower than the FP32 model.