
Model Performance Comparison

| Model      | Task           | Metric | Value ± Stderr  | Runtime (m) | Size (GB) |
|------------|----------------|--------|-----------------|-------------|-----------|
| FP16       | xnli_en        | acc ↑  | 0.4811 ± 0.0100 | -           | -         |
| FP16       | xstorycloze_en | acc ↑  | 0.6446 ± 0.0123 | 13:11       | 2.13      |
| FP16       | xwinograd_en   | acc ↑  | 0.7286 ± 0.0092 | -           | -         |
| GPTQ 4-bit | xnli_en        | acc ↑  | 0.4952 ± 0.0100 | -           | -         |
| GPTQ 4-bit | xstorycloze_en | acc ↑  | 0.6406 ± 0.0123 | 15:02       | 1.13      |
| GPTQ 4-bit | xwinograd_en   | acc ↑  | 0.7256 ± 0.0093 | -           | -         |
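The "w4g128" in the repository name denotes 4-bit weights with a quantization group size of 128. The sketch below illustrates that storage layout with plain round-to-nearest quantization; real GPTQ additionally compensates rounding error using second-order (Hessian) statistics, which is omitted here. All names and shapes are illustrative, not the repo's actual packing code.

```python
import numpy as np

def quantize_w4g128(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point per
    group of 128 weights -- the storage layout implied by 'w4g128'.
    (GPTQ proper also corrects rounding error group by group; that
    refinement is not shown.)"""
    g = w.reshape(-1, group_size)                  # one group per row
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0                   # 4 bits -> 16 levels (0..15)
    scale[scale == 0] = 1.0                        # guard constant groups
    zero = np.round(-wmin / scale)                 # integer zero-point per group
    q = np.clip(np.round(g / scale + zero), 0, 15).astype(np.uint8)
    w_hat = (q.astype(np.float32) - zero) * scale  # dequantized reconstruction
    return q, scale, zero, w_hat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
q, scale, zero, w_hat = quantize_w4g128(w)
print("4-bit codes:", q.max() <= 15,
      "max abs error:", float(np.abs(w - w_hat).max()))
```

Each group stores 128 four-bit codes plus one scale and zero-point, which is where the roughly 2x size reduction over FP16 in the table comes from.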

Performance Metrics Comparison

| Metric              | FP16    | GPTQ 4-bit |
|---------------------|---------|------------|
| p50_total_tps       | 52.813  | 79.552     |
| p90_total_tps       | 120.742 | 119.646    |
| p50_decode_tps      | 22.992  | -31.095    |
| p90_decode_tps      | 33.487  | 2.375      |
| p50_ttft_seconds    | 0.002   | 0.003      |
| p90_ttft_seconds    | 0.003   | 0.011      |
| max_gpu_memory_mb   | 2232.0  | 1258.0     |
| p90_gpu_memory_mb   | 2232.0  | 1258.0     |
| max_gpu_utilization | 51.0    | 49.0       |
| p90_gpu_utilization | 48.0    | 42.7       |
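The headline numbers from the two tables can be combined into relative figures. The snippet below derives them from the values reported above (model size from the benchmark table; throughput and memory from the metrics table) and adds nothing beyond simple ratios:

```python
# Values copied verbatim from the tables above.
fp16 = {"size_gb": 2.13, "p50_total_tps": 52.813, "max_gpu_memory_mb": 2232.0}
gptq = {"size_gb": 1.13, "p50_total_tps": 79.552, "max_gpu_memory_mb": 1258.0}

size_reduction = 1 - gptq["size_gb"] / fp16["size_gb"]
throughput_gain = gptq["p50_total_tps"] / fp16["p50_total_tps"]
mem_reduction = 1 - gptq["max_gpu_memory_mb"] / fp16["max_gpu_memory_mb"]

print(f"{size_reduction:.1%} smaller on disk, "
      f"{throughput_gain:.2f}x p50 total throughput, "
      f"{mem_reduction:.1%} less peak GPU memory")
```

In short: the 4-bit model is about 47% smaller and uses about 44% less peak GPU memory, at roughly 1.5x the median total throughput, with accuracy within noise of FP16 on the three tasks evaluated.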
Model size: 477M params (Safetensors; tensor types I32, BF16, FP16)

Model: itdainb/bloomz-1b1-w4g128-auto-gptq
