Model Performance Comparison
| Model | Task | Metric | ↑ | Value | ± | Stderr | Runtime (m) | Size (GB) |
|---|---|---|---|---|---|---|---|---|
| FP16 | xnli_en | acc | ↑ | 0.4811 | ± | 0.0100 | - | - |
| FP16 | xstorycloze_en | acc | ↑ | 0.6446 | ± | 0.0123 | 13:11 | 2.13 |
| FP16 | xwinograd_en | acc | ↑ | 0.7286 | ± | 0.0092 | - | - |
| GPTQ 4-bit | xnli_en | acc | ↑ | 0.4952 | ± | 0.0100 | - | - |
| GPTQ 4-bit | xstorycloze_en | acc | ↑ | 0.6406 | ± | 0.0123 | 15:02 | 1.13 |
| GPTQ 4-bit | xwinograd_en | acc | ↑ | 0.7256 | ± | 0.0093 | - | - |
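To put the table in perspective, a minimal sketch (using only the numbers above; the values are copied from the accuracy and Size columns, not re-measured) that computes the per-task accuracy change and the checkpoint size reduction from FP16 to GPTQ 4-bit:

```python
# Accuracy values copied from the comparison table above.
fp16 = {"xnli_en": 0.4811, "xstorycloze_en": 0.6446, "xwinograd_en": 0.7286}
gptq = {"xnli_en": 0.4952, "xstorycloze_en": 0.6406, "xwinograd_en": 0.7256}

for task in fp16:
    delta = gptq[task] - fp16[task]
    print(f"{task}: {delta:+.4f}")

# Size column: 2.13 GB (FP16) vs 1.13 GB (GPTQ 4-bit).
size_reduction = 1 - 1.13 / 2.13
print(f"size reduction: {size_reduction:.1%}")
```

The quantized checkpoint is roughly 47% smaller while accuracy stays within about ±0.015 on all three tasks.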
Performance Metrics Comparison
| Metric | FP16 | GPTQ 4-bit |
|---|---|---|
| p50_total_tps | 52.813 | 79.552 |
| p90_total_tps | 120.742 | 119.646 |
| p50_decode_tps | 22.992 | -31.095 |
| p90_decode_tps | 33.487 | 2.375 |
| p50_ttft_seconds | 0.002 | 0.003 |
| p90_ttft_seconds | 0.003 | 0.011 |
| max_gpu_memory_mb | 2232.0 | 1258.0 |
| p90_gpu_memory_mb | 2232.0 | 1258.0 |
| max_gpu_utilization | 51.0 | 49.0 |
| p90_gpu_utilization | 48.0 | 42.7 |
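The p50/p90 rows are percentile summaries over per-request measurements. A minimal sketch of how such percentiles can be computed with the standard library (the sample values are illustrative, not the actual benchmark data):

```python
import statistics

def pct(samples, p):
    """p-th percentile of samples via the exclusive quantile method."""
    cuts = statistics.quantiles(samples, n=100, method="exclusive")
    return cuts[p - 1]

# Illustrative per-request tokens-per-second samples.
total_tps = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
print(pct(total_tps, 50))  # median throughput -> 55.0
print(pct(total_tps, 90))  # tail throughput   -> 99.0
```

Note that different percentile interpolation methods (exclusive vs. inclusive, or NumPy's defaults) can yield slightly different p50/p90 values on small sample sets.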
Model tree for itdainb/bloomz-1b1-w4g128-auto-gptq

Base model: bigscience/bloomz-1b1