
# INT8 bge-small-en-v1.5

Post-training static quantization

This is an INT8 PyTorch model generated by Intel® Extension for Transformers and Intel® Neural Compressor.

The original FP32 model comes from BAAI/bge-small-en-v1.5.

The calibration dataloader is built from the CQADupstackAndroidRetrieval task in the Massive Text Embedding Benchmark (MTEB) Leaderboard. The default calibration sampling size of 10 is not exactly divisible by the batch size of 8, so the actual sampling size is rounded up to 16 (two batches of 8).

The quantization script will be updated later. Until then, you can refer to this document.
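
In the meantime, a post-training static quantization run with Intel® Neural Compressor generally follows the pattern below. This is a minimal sketch, not Intel's actual recipe: the placeholder calibration sentences, sequence length, and save path are assumptions, and the real calibration data comes from CQADupstackAndroidRetrieval.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModel, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

# FP32 starting point.
model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")

# Placeholder calibration text; the real recipe draws samples from the
# CQADupstackAndroidRetrieval task instead.
texts = ["example calibration sentence"] * 16
encodings = tokenizer(texts, padding="max_length", truncation=True,
                      max_length=128, return_tensors="pt")

class CalibDataset(Dataset):
    def __len__(self):
        return len(texts)

    def __getitem__(self, i):
        # Neural Compressor expects (input, label) pairs; the label is unused here.
        return {k: v[i] for k, v in encodings.items()}, 0

calib_dataloader = DataLoader(CalibDataset(), batch_size=8)

# Static PTQ: a sampling size of 10 is rounded up to 16 at batch size 8.
conf = PostTrainingQuantConfig(approach="static", calibration_sampling_size=[10])
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./bge-small-en-v1.5-int8-static")
```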

## Using Intel® Extension for Transformers to load the model

```python
from intel_extension_for_transformers.transformers import OptimizedModel

# OptimizedModel resolves the INT8 checkpoint and returns a ready-to-use model.
model_id = "Intel/bge-small-en-v1.5-int8-static"
model = OptimizedModel.from_pretrained(model_id)
```
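
Once loaded, the model can be used like a standard Hugging Face encoder. The sketch below assumes the usual BGE recipe of [CLS]-token pooling followed by L2 normalization, and that the quantized repo ships the tokenizer (otherwise load it from BAAI/bge-small-en-v1.5); the example sentences are placeholders.

```python
import torch
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import OptimizedModel

model_id = "Intel/bge-small-en-v1.5-int8-static"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OptimizedModel.from_pretrained(model_id)
model.eval()

sentences = ["sample sentence one", "sample sentence two"]  # placeholder inputs
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# BGE models use the [CLS] token embedding, L2-normalized, as the sentence vector.
embeddings = outputs[0][:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # (2, 384) for bge-small
```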

## Evaluation result

The INT8 PyTorch model is evaluated on the STS tasks in the Massive Text Embedding Benchmark (MTEB) Leaderboard.

| Metric | INT8 | FP32 |
|---|---:|---:|
| STS average (10 datasets) | 80.85 | 81.59 |
| Model size (MB) | 36.5 | 134 |
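
A comparable STS evaluation can be run with the `mteb` package, which expects an object exposing an `encode` method. The wrapper below is a hypothetical sketch, not Intel's evaluation harness; pooling follows the same BGE assumptions as above, and the exact task list may differ from the 10 datasets averaged in the table.

```python
import numpy as np
import torch
from mteb import MTEB
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import OptimizedModel

class BgeInt8Encoder:
    """Hypothetical wrapper giving the INT8 model the encode() API MTEB expects."""

    def __init__(self, model_id="Intel/bge-small-en-v1.5-int8-static"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = OptimizedModel.from_pretrained(model_id)
        self.model.eval()

    def encode(self, sentences, batch_size=8, **kwargs):
        vectors = []
        for i in range(0, len(sentences), batch_size):
            batch = self.tokenizer(sentences[i:i + batch_size], padding=True,
                                   truncation=True, return_tensors="pt")
            with torch.no_grad():
                out = self.model(**batch)
            # [CLS] pooling + L2 normalization, as in the usage example above.
            emb = torch.nn.functional.normalize(out[0][:, 0], p=2, dim=1)
            vectors.append(emb.cpu().numpy())
        return np.concatenate(vectors)

evaluation = MTEB(task_types=["STS"])  # selects MTEB STS tasks; filter as needed
evaluation.run(BgeInt8Encoder(), output_folder="results")
```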