
Model Performance Comparison

| Model      | Task           | Metric | Value ± Stderr  | Runtime (m) | Size (GB) |
|------------|----------------|--------|-----------------|-------------|-----------|
| FP16       | xnli_en        | acc ↑  | 0.4811 ± 0.0100 | -           | -         |
| FP16       | xstorycloze_en | acc ↑  | 0.6446 ± 0.0123 | 13:11       | 2.13      |
| FP16       | xwinograd_en   | acc ↑  | 0.7286 ± 0.0092 | -           | -         |
| GPTQ 4-bit | xnli_en        | acc ↑  | 0.4952 ± 0.0100 | -           | -         |
| GPTQ 4-bit | xstorycloze_en | acc ↑  | 0.6406 ± 0.0123 | 15:02       | 1.13      |
| GPTQ 4-bit | xwinograd_en   | acc ↑  | 0.7256 ± 0.0093 | -           | -         |
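The "w4g128" in the repository name denotes 4-bit weights with a quantization group size of 128. The sketch below illustrates that storage layout with plain round-to-nearest quantization; real GPTQ additionally compensates rounding error using second-order (Hessian) statistics, which is omitted here. All names and shapes are illustrative, not the repo's actual packing code.

```python
import numpy as np

def quantize_w4g128(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point per
    group of 128 weights -- the storage layout implied by 'w4g128'.
    (GPTQ proper also corrects rounding error group by group; that
    refinement is not shown.)"""
    g = w.reshape(-1, group_size)                  # one group per row
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0                   # 4 bits -> 16 levels (0..15)
    scale[scale == 0] = 1.0                        # guard constant groups
    zero = np.round(-wmin / scale)                 # integer zero-point per group
    q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
    w_hat = (q.astype(np.float32) - zero) * scale  # dequantized reconstruction
    return q, scale, zero, w_hat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
q, scale, zero, w_hat = quantize_w4g128(w)
print("4-bit codes:", q.max() <= 15,
      "max abs error:", float(np.abs(w - w_hat).max()))
```

Each group stores 128 four-bit codes plus one scale and zero-point, which is where the roughly 2x size reduction over FP16 in the table comes from.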

Performance Metrics Comparison

| Metric              | FP16    | GPTQ 4-bit |
|---------------------|---------|------------|
| p50_total_tps       | 52.813  | 79.552     |
| p90_total_tps       | 120.742 | 119.646    |
| p50_decode_tps      | 22.992  | -31.095    |
| p90_decode_tps      | 33.487  | 2.375      |
| p50_ttft_seconds    | 0.002   | 0.003      |
| p90_ttft_seconds    | 0.003   | 0.011      |
| max_gpu_memory_mb   | 2232.0  | 1258.0     |
| p90_gpu_memory_mb   | 2232.0  | 1258.0     |
| max_gpu_utilization | 51.0    | 49.0       |
| p90_gpu_utilization | 48.0    | 42.7       |
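The headline numbers from the two tables can be combined into relative figures. The snippet below derives them from the values reported above (model size from the benchmark table; throughput and memory from the metrics table) and adds nothing beyond simple ratios:

```python
# Values copied verbatim from the tables above.
fp16 = {"size_gb": 2.13, "p50_total_tps": 52.813, "max_gpu_memory_mb": 2232.0}
gptq = {"size_gb": 1.13, "p50_total_tps": 79.552, "max_gpu_memory_mb": 1258.0}

size_reduction = 1 - gptq["size_gb"] / fp16["size_gb"]
throughput_gain = gptq["p50_total_tps"] / fp16["p50_total_tps"]
mem_reduction = 1 - gptq["max_gpu_memory_mb"] / fp16["max_gpu_memory_mb"]

print(f"{size_reduction:.1%} smaller on disk, "
      f"{throughput_gain:.2f}x p50 total throughput, "
      f"{mem_reduction:.1%} less peak GPU memory")
```

In short: the 4-bit model is about 47% smaller and uses about 44% less peak GPU memory, at roughly 1.5x the median total throughput, with accuracy within noise of FP16 on the three tasks evaluated.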
Model size: 477M params (Safetensors; tensor types I32, BF16, FP16)

Model: itdainb/bloomz-1b1-w4g128-auto-gptq
