Intel/MiniLM-L12-H384-uncased-mrpc-int8-static-inc


echarlaix (HF staff), fix README, 70a58f5, 3 months ago


	---
	language: en
	license: mit
	tags:
	- text-classification
	- int8
	- Intel® Neural Compressor
	- PostTrainingStatic
	datasets:
	- mrpc
	metrics:
	- f1
	---

	# INT8 MiniLM-L12-H384 fine-tuned on MRPC

	## Post-training static quantization

	### PyTorch

	This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) using [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
	The original FP32 model is the fine-tuned [Intel/MiniLM-L12-H384-uncased-mrpc](https://huggingface.co/Intel/MiniLM-L12-H384-uncased-mrpc).

	The calibration dataloader is the train dataloader. The default calibration sampling size of 100 is not evenly divisible by the batch size of 8, so calibration runs 13 full batches and the actual sampling size is 104.
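
The rounding described above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: `effective_sampling_size` is a hypothetical helper, not a function from Intel® Neural Compressor, and it assumes the requested size is simply rounded up to a whole number of batches.

```python
import math


def effective_sampling_size(requested: int, batch_size: int) -> int:
    """Round the requested sample count up to a whole number of batches.

    Hypothetical helper for illustration; assumes calibration always
    consumes complete batches.
    """
    num_batches = math.ceil(requested / batch_size)
    return num_batches * batch_size


# 100 samples at batch size 8 need 13 batches, i.e. 104 samples.
print(effective_sampling_size(100, 8))
```

A requested size that is already a multiple of the batch size (for example 96 with batch size 8) is left unchanged under this assumption.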

	The linear module `bert.encoder.layer.6.attention.self.key` falls back to FP32 to keep the relative accuracy loss within 1%.

	#### Test result

	| |INT8|FP32|
	|---|:---:|:---:|
	| Accuracy (eval-f1) |0.9039|0.9097|
	| Model size (MB) |33.5|127|
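
As a quick sanity check, the F1 scores in this table can be compared against the 1% relative accuracy loss target mentioned earlier. The sketch below assumes the relative loss is defined as `(fp32 - int8) / fp32`; the function name is hypothetical and is not part of any library API.

```python
def relative_accuracy_loss(fp32_score: float, int8_score: float) -> float:
    """Fraction of the FP32 score lost after quantization (assumed definition)."""
    return (fp32_score - int8_score) / fp32_score


# eval-f1 values copied from the PyTorch result table.
loss = relative_accuracy_loss(fp32_score=0.9097, int8_score=0.9039)

print(f"relative F1 loss: {loss:.2%}")  # about 0.64%
print("within 1% target:", loss <= 0.01)
```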

	#### Load with optimum:

	```python
	from optimum.intel import INCModelForSequenceClassification

	model_id = "Intel/MiniLM-L12-H384-uncased-mrpc-int8-static"
	int8_model = INCModelForSequenceClassification.from_pretrained(model_id)
	```

	### ONNX

	This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

	The original FP32 model is the fine-tuned [Intel/MiniLM-L12-H384-uncased-mrpc](https://huggingface.co/Intel/MiniLM-L12-H384-uncased-mrpc).

	The calibration dataloader is the eval dataloader. The calibration sampling size is 100.

	#### Test result

	| |INT8|FP32|
	|---|:---:|:---:|
	| Accuracy (eval-f1) |0.9013|0.9097|
	| Model size (MB) |33|128|
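
To put the model sizes in perspective, the short sketch below computes how many times smaller each INT8 model is than its FP32 counterpart, using only the sizes reported in the PyTorch and ONNX tables. The variable and function names are illustrative.

```python
# (INT8 size, FP32 size) in MB, copied from the two result tables.
sizes_mb = {
    "PyTorch": (33.5, 127.0),
    "ONNX": (33.0, 128.0),
}


def compression_ratio(int8_mb: float, fp32_mb: float) -> float:
    """How many times smaller the INT8 model is than the FP32 model."""
    return fp32_mb / int8_mb


ratios = {name: compression_ratio(*pair) for name, pair in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x smaller than FP32")
```

Both variants come out a little under the 4x reduction that a pure 8-bit versus 32-bit weight encoding would suggest.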


	#### Load ONNX model:

	```python
	from optimum.onnxruntime import ORTModelForSequenceClassification
	model = ORTModelForSequenceClassification.from_pretrained('Intel/MiniLM-L12-H384-uncased-mrpc-int8-static')
	```