abhinavnmagic committed
Commit 28fad36
1 Parent(s): 8a9d3f4

Update README.md

Files changed (1)
  1. README.md +21 -4
README.md CHANGED
@@ -4,11 +4,13 @@ tags:
 - vllm
 ---
 
-# Qwen2-72B-Instruct-FP8
+# Qwen2-72B-Instruct-FP8
 
-Ready to use with `vllm>=0.5.0`.
+## Model Overview
+Qwen2-72B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
 
-Quantized with [AutoFP8](https://github.com/neuralmagic/autofp8) using the following script on 8xA100:
+## Usage and Creation
+Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
 
 ```python
 from datasets import load_dataset
@@ -33,4 +35,19 @@ model = AutoFP8ForCausalLM.from_pretrained(
 )
 model.quantize(examples)
 model.save_quantized(quantized_model_dir)
-```
+```
+
+## Evaluation
+
+### Open LLM Leaderboard evaluation scores
+|                         | Qwen2-72B-Instruct | Qwen2-72B-Instruct-FP8<br>(this model) |
+| :---------------------: | :----------------: | :------------------------------------: |
+| arc-c<br>25-shot        | 71.58              | 72.09                                  |
+| hellaswag<br>10-shot    | 86.94              | 86.83                                  |
+| mmlu<br>5-shot          | xx.xx              | 84.06                                  |
+| truthfulqa<br>0-shot    | 66.94              | 66.95                                  |
+| winogrande<br>5-shot    | 82.79              | 83.18                                  |
+| gsm8k<br>5-shot         | xx.xx              | 88.93                                  |
+| **Average<br>Accuracy** | **xx.xx**          | **80.34**                              |
+| **Recovery**            | **100%**           | **xx.xx%**                             |
+
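The diff collapses the middle of the quantization script between the two hunks. For orientation, here is a minimal sketch of the full AutoFP8 calibration-and-quantize flow, patterned on the linked example_dataset.py; the dataset id, sample count, tokenizer settings, and output directory are illustrative assumptions, not values taken from this commit:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "Qwen/Qwen2-72B-Instruct"
quantized_model_dir = "Qwen2-72B-Instruct-FP8"  # output path is an assumption

# Build calibration prompts from ultrachat (dataset id and sample count are assumptions).
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(512))
prompts = [tokenizer.apply_chat_template(row["messages"], tokenize=False) for row in ds]
examples = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Static per-tensor FP8 quantization of weights and activations.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(
    pretrained_model_dir, quantize_config=quantize_config
)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```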
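The model overview states the checkpoint is ready for inference with vLLM >= 0.5.0. A minimal offline-inference sketch, assuming the published repo id is `neuralmagic/Qwen2-72B-Instruct-FP8` and using illustrative sampling settings:

```python
from vllm import LLM, SamplingParams

# vLLM >= 0.5.0 reads the FP8 weight/activation scales stored in the checkpoint.
# The repo id and tensor_parallel_size below are assumptions for illustration.
llm = LLM(model="neuralmagic/Qwen2-72B-Instruct-FP8", tensor_parallel_size=4)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["Summarize FP8 quantization in one paragraph."], sampling)
print(outputs[0].outputs[0].text)
```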
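As a quick sanity check on the evaluation table, the FP8 column's average can be recomputed from the per-task scores. Recovery is assumed here to mean the FP8 average relative to the unquantized baseline's average; it cannot be computed yet because two baseline entries are still placeholders (xx.xx):

```python
# Per-task scores for Qwen2-72B-Instruct-FP8 from the table above.
fp8_scores = [72.09, 86.83, 84.06, 66.95, 83.18, 88.93]
fp8_average = sum(fp8_scores) / len(fp8_scores)
print(round(fp8_average, 2))  # 80.34, matching the Average Accuracy row

# Assumed definition of the Recovery row:
#   recovery_pct = 100 * fp8_average / baseline_average
# (left uncomputed: the baseline mmlu and gsm8k scores are still xx.xx)
```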