mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq

Text Generation


mobicham committed on Aug 12, 2024

Commit aead767 (verified), parent d5d8ca8

add fp16 benchmark

Files changed (1)

README.md +11 -12

README.md CHANGED

@@ -22,20 +22,19 @@ This is an <a href="https://github.com/mobiusml/hqq/">HQQ</a> all 4-bit (group-s
 | Decoding* - short seq (tokens/sec)|   OOM  | 10.7 (tokens/sec) |
 | Decoding* - long  seq (tokens/sec)|   OOM  | 9.7 (tokens/sec)|
-*: A100 80GB
+*: 1xA100 80GB
 ## Performance
-| Models            | HQQ 4-bit/gs-64 (no calib) |
-|:-------------------:|:--------:|
-| ARC (25-shot)      | 70.22 |
-| HellaSwag (10-shot)| 86.39 |
-| MMLU (5-shot)      | 81.04 |
-| TruthfulQA-MC2     | 60.39 |
-| Winogrande (5-shot)| 84.53 |
-| GSM8K (5-shot)     | 89.92 |
-| Average            | 78.75 |
+| Models            | fp16 | HQQ 4-bit/gs-64 (no calib) |
+|:-------------------:|:--------:|:--------:|
+| ARC (25-shot)      | 70.31 | 70.22 |
+| HellaSwag (10-shot)| 86.40 | 86.39 |
+| MMLU (5-shot)      | 81.84 | 81.04 |
+| TruthfulQA-MC2     | 59.83 | 60.39 |
+| Winogrande (5-shot)| 84.85 | 84.53 |
+| GSM8K (5-shot)     | 88.25 | 89.92 |
+| Average            | 78.58 | 78.75 |
 You can reproduce the results above via `pip install lm-eval==0.4.3`
@@ -58,7 +57,7 @@ from hqq.utils.generation_hf import HFGenerator
 #Load the model
 ###################################################
-model_id = 'mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq' #no calib version
+model_id = 'mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq'
 compute_dtype = torch.bfloat16 #bfloat16 for torchao, float16 for bitblas
 cache_dir = '.'
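The Average row in the updated Performance table is the unweighted mean of the six benchmark scores in each column. The short Python sketch below recomputes both averages from the table values and reports the per-benchmark difference between the HQQ 4-bit/gs-64 model and fp16. The `SCORES` dictionary and the helper functions are illustrative names, not part of the repository.

```python
# Scores copied from the updated Performance table: (fp16, HQQ 4-bit/gs-64 no calib).
SCORES = {
    "ARC (25-shot)":       (70.31, 70.22),
    "HellaSwag (10-shot)": (86.40, 86.39),
    "MMLU (5-shot)":       (81.84, 81.04),
    "TruthfulQA-MC2":      (59.83, 60.39),
    "Winogrande (5-shot)": (84.85, 84.53),
    "GSM8K (5-shot)":      (88.25, 89.92),
}


def column_average(index: int) -> float:
    """Unweighted mean of one column (0 = fp16, 1 = HQQ), rounded to 2 decimals."""
    values = [pair[index] for pair in SCORES.values()]
    return round(sum(values) / len(values), 2)


def deltas() -> dict[str, float]:
    """HQQ score minus fp16 score per benchmark (positive means HQQ scored higher)."""
    return {name: round(hqq - fp16, 2) for name, (fp16, hqq) in SCORES.items()}


if __name__ == "__main__":
    print("fp16 average:", column_average(0))
    print("HQQ average: ", column_average(1))
    for name, delta in deltas().items():
        print(f"{name:<20} {delta:+.2f}")
```

Averaged over these six tasks the two columns differ by 0.17 points; the largest per-task gaps are on GSM8K (+1.67 for HQQ) and MMLU (-0.80 for HQQ).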
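The "4-bit/gs-64" label means weights are split into groups of 64 consecutive values, and each group gets its own scale and zero-point that map it onto the 16 integer levels 0..15. The pure-Python sketch below illustrates that layout with plain min-max round-to-nearest affine quantization. HQQ starts from this kind of affine mapping but additionally refines the zero-point with its half-quadratic solver, without calibration data ("no calib"); that optimization step is not reproduced here, and the function names are hypothetical.

```python
import random


def quantize_group(values, nbits=4):
    """Min-max affine quantization of one group to unsigned `nbits` integers.

    Returns (codes, scale, zero) such that value ~= (code - zero) * scale.
    """
    qmax = (1 << nbits) - 1  # 15 levels above 0 for 4-bit
    lo, hi = min(values), max(values)
    # Guard against a constant group, which would otherwise divide by zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = -lo / scale
    codes = [min(qmax, max(0, round(v / scale + zero))) for v in values]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point values."""
    return [(c - zero) * scale for c in codes]


def quantize_rowwise(row, group_size=64, nbits=4):
    """Split a weight row into chunks of `group_size` and quantize each chunk separately."""
    groups = [row[i:i + group_size] for i in range(0, len(row), group_size)]
    return [quantize_group(g, nbits) for g in groups]


if __name__ == "__main__":
    random.seed(0)
    row = [random.uniform(-1.0, 1.0) for _ in range(128)]  # two groups of 64
    packed = quantize_rowwise(row)
    restored = [x for codes, s, z in packed for x in dequantize_group(codes, s, z)]
    print("groups:", len(packed))
    print("max abs error:", max(abs(a - b) for a, b in zip(row, restored)))
```

Each group carries its own scale and zero-point, so smaller groups follow local value ranges more closely at the cost of storing more metadata per weight.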
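The decoding-speed rows report OOM for one configuration on a single A100 80GB, while the 4-bit model decodes at roughly 10 tokens/sec. A back-of-the-envelope weight-memory estimate suggests why. This sketch assumes a nominal 70e9 parameters, that every weight is quantized, and that each group of 64 weights stores one 16-bit scale and one 16-bit zero-point. The real HQQ checkpoint may keep some layers in higher precision or store its metadata differently, so treat the numbers as rough; the function names are hypothetical.

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1e9 bytes) for a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9


def group_quant_bits(nbits: int, group_size: int, meta_bits: int = 16) -> float:
    """Average bits per weight for group-wise quantization.

    Assumes one scale and one zero-point of `meta_bits` bits each per group
    (an illustrative assumption, not HQQ's documented storage format).
    """
    return nbits + 2 * meta_bits / group_size


N_PARAMS = 70e9  # nominal parameter count, rounded for illustration
GPU_GB = 80      # memory of a single A100 80GB

fp16_gb = weight_gigabytes(N_PARAMS, 16)
hqq_gb = weight_gigabytes(N_PARAMS, group_quant_bits(nbits=4, group_size=64))

print(f"fp16 weights:        {fp16_gb:.1f} GB (fits: {fp16_gb < GPU_GB})")
print(f"4-bit/gs-64 weights: {hqq_gb:.1f} GB (fits: {hqq_gb < GPU_GB})")
```

Under these assumptions fp16 weights alone need about 140 GB, while 4-bit/gs-64 averages 4.5 bits per weight and needs about 39 GB. This counts weights only; the KV cache and activations need additional memory on top.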