Lin-K76 committed
Commit b38cf0b
1 parent: ffbbf10

Create README.md

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ tags:
+ - fp8
+ - vllm
+ ---
+
+ # Qwen2-7B-Instruct-FP8
+
+ ## Model Overview
+ Qwen2-7B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
+
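To make "per-tensor quantization" concrete, here is a minimal pure-Python sketch of the idea: one scale is derived from the largest absolute value in a tensor, and every element is divided by that scale and rounded to the FP8 grid. It assumes the E4M3 format (maximum value 448, 3 mantissa bits); the helper names `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are hypothetical and are not part of AutoFP8 or vLLM.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (assumed target format)


def round_to_e4m3(value: float) -> float:
    """Simulate rounding a float to the nearest FP8 E4M3 value."""
    if value == 0.0:
        return 0.0
    sign = -1.0 if value < 0 else 1.0
    magnitude = min(abs(value), FP8_E4M3_MAX)  # saturate instead of overflowing
    # frexp gives magnitude = m * 2**e with 0.5 <= m < 1, so the binade is e - 1.
    _, e = math.frexp(magnitude)
    exponent = max(e - 1, -6)  # -6 is the smallest normal exponent in E4M3
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits -> 8 steps per binade
    return sign * round(magnitude / step) * step


def quantize_per_tensor(values):
    """Quantize a flat list of floats with a single shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 1.75, -3.5]
quantized, scale = quantize_per_tensor(weights)
print(scale)                         # 0.0078125
print(quantized)                     # [2.5, -64.0, 224.0, -448.0]
print(dequantize(quantized, scale))  # [0.01953125, -0.5, 1.75, -3.5]
```

Because the scale is shared by the whole tensor, the largest element maps exactly to the format maximum, while small elements such as 0.02 absorb most of the rounding error.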
+ ## Usage and Creation
+ Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig
+
+ pretrained_model_dir = "Qwen/Qwen2-7B-Instruct"
+ quantized_model_dir = "Qwen2-7B-Instruct-FP8"
+
+ # Use EOS as the padding token for the padded calibration batch.
+ tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, model_max_length=4096)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Render 512 ultrachat conversations with the chat template as calibration data.
+ ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))
+ examples = [tokenizer.apply_chat_template(sample["messages"], tokenize=False) for sample in ds]
+ examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")
+
+ # Static scheme: activation scales are fixed from the calibration data.
+ quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")
+
+ model = AutoFP8ForCausalLM.from_pretrained(
+     pretrained_model_dir, quantize_config=quantize_config
+ )
+ model.quantize(examples)
+ model.save_quantized(quantized_model_dir)
+ ```
+
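The `activation_scheme="static"` setting means activation scales are fixed ahead of time from calibration data rather than recomputed per input. A rough sketch of that idea follows, assuming a simple running-maximum rule and the E4M3 maximum of 448; whether AutoFP8 applies exactly this rule is an assumption here, and `StaticScaleObserver` is a hypothetical name, not a library class.

```python
FP8_E4M3_MAX = 448.0  # assumed FP8 format maximum (E4M3)


class StaticScaleObserver:
    """Keeps the largest per-tensor scale seen across calibration batches."""

    def __init__(self) -> None:
        self.scale = None  # unset until the first batch is observed

    def observe(self, activations) -> None:
        # Scale that would map this batch's largest magnitude to the FP8 maximum.
        batch_scale = max(abs(a) for a in activations) / FP8_E4M3_MAX
        if self.scale is None or batch_scale > self.scale:
            self.scale = batch_scale


observer = StaticScaleObserver()
for batch in ([0.5, -2.0, 1.0], [7.0, -3.0], [0.1, 0.2]):
    observer.observe(batch)

print(observer.scale)  # 0.015625, i.e. 7.0 / 448.0 from the second batch
```

After calibration the scale is frozen and reused at inference time, so activations larger than anything seen during calibration would saturate. This is why the calibration samples should resemble the expected inputs.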
+ ## Evaluation
+
+ ### Open LLM Leaderboard evaluation scores
+ | | Qwen2-7B-Instruct | Qwen2-7B-Instruct-FP8<br>(this model) |
+ | :------------------: | :----------------------: | :------------------------------------------------: |
+ | arc-c<br>25-shot | 62.37 | 62.03 |
+ | hellaswag<br>10-shot | 81.77 | 81.46 |
+ | mmlu<br>5-shot | 70.82 | 70.27 |
+ | truthfulqa<br>0-shot | 57.36 | 56.34 |
+ | winogrande<br>5-shot | 76.16 | 76.72 |
+ | gsm8k<br>5-shot | 68.84 | 69.83 |
+ | **Average<br>Accuracy** | **69.55** | **69.44** |
+ | **Recovery** | **100%** | **99.84%** |
+
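The Average and Recovery rows can be recomputed from the per-task scores. The sketch below assumes recovery is the ratio of the FP8 average to the baseline average, which reproduces the figures in the table; the variable names are illustrative only.

```python
# Per-task scores copied from the table above.
baseline = {
    "arc-c": 62.37, "hellaswag": 81.77, "mmlu": 70.82,
    "truthfulqa": 57.36, "winogrande": 76.16, "gsm8k": 68.84,
}
fp8 = {
    "arc-c": 62.03, "hellaswag": 81.46, "mmlu": 70.27,
    "truthfulqa": 56.34, "winogrande": 76.72, "gsm8k": 69.83,
}


def average(scores):
    """Unweighted mean over all tasks."""
    return sum(scores.values()) / len(scores)


baseline_avg = average(baseline)
fp8_avg = average(fp8)
recovery = fp8_avg / baseline_avg * 100  # assumed definition of "Recovery"

print(f"{baseline_avg:.2f} {fp8_avg:.2f} {recovery:.2f}%")  # 69.55 69.44 99.84%
```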