	---
	tags:
	- fp8
	- vllm
	---

	# Qwen2-72B-Instruct-FP8

	## Model Overview
	Qwen2-72B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
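To make "per-tensor quantization" concrete, here is a minimal pure-Python sketch of the idea: one scale is shared by every element of a tensor, chosen so the largest magnitude maps to the FP8 E4M3 maximum (448). The helper names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical and the rounding is a simulation; this is not AutoFP8's or vLLM's actual implementation, which may differ in detail.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_to_e4m3(v: float) -> float:
    """Simulate rounding v to the nearest FP8 E4M3 value (illustrative only)."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), FP8_E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(mag)           # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)             # binade exponent, floored at the min normal
    step = 2.0 ** (exp - 3)          # 3 mantissa bits: 8 steps per binade
    q = min(round(mag / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, v)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with a single shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map simulated FP8 values back to the original range."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 2.0]  # toy "tensor", not real model weights
q, scale = quantize_per_tensor(weights)
restored = dequantize(q, scale)
```

Because the scale is shared across the whole tensor, a single outlier stretches the range and leaves less resolution for small values. That is the usual trade-off of per-tensor schemes against finer-grained (for example per-channel) scaling.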

	## Usage and Creation
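	Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py). The script below draws 512 samples from `mgoin/ultrachat_2k` and uses them to calibrate static activation scales.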

	```python
	from datasets import load_dataset
	from transformers import AutoTokenizer

	from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

	pretrained_model_dir = "Qwen/Qwen2-72B-Instruct"
	quantized_model_dir = "Qwen2-72B-Instruct-FP8"

	tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
	tokenizer.pad_token = tokenizer.eos_token

	ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))
	examples = [tokenizer.apply_chat_template(batch["messages"], tokenize=False) for batch in ds]
	examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")

	quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

	model = AutoFP8ForCausalLM.from_pretrained(
	    pretrained_model_dir, quantize_config=quantize_config
	)
	model.quantize(examples)
	model.save_quantized(quantized_model_dir)
	```

	## Evaluation

	### Open LLM Leaderboard evaluation scores
	| Benchmark | Qwen2-72B-Instruct | Qwen2-72B-Instruct-FP8<br>(this model) |
	| :------------------: | :----------------------: | :------------------------------------------------: |
	| arc-c<br>25-shot | 71.58 | 72.09 |
	| hellaswag<br>10-shot | 86.94 | 86.83 |
	| mmlu<br>5-shot | xx.xx | 84.06 |
	| truthfulqa<br>0-shot | 66.94 | 66.95 |
	| winogrande<br>5-shot | 82.79 | 83.18 |
	| gsm8k<br>5-shot | xx.xx | 88.93 |
	| Average<br>Accuracy | xx.xx | 80.34 |
	| Recovery | 100% | xx.xx% |
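The summary rows can be reproduced from the per-task scores. The sketch below assumes that "Average Accuracy" is the unweighted mean of the six tasks and that "Recovery" is the quantized average divided by the baseline average; the card does not define either, so treat both as assumptions. Since the baseline mmlu and gsm8k scores are not reported, the recovery shown covers only the four tasks with both scores and is not the card's overall Recovery figure.

```python
# Scores copied from the table above; None marks values the card leaves as xx.xx.
baseline = {"arc-c": 71.58, "hellaswag": 86.94, "mmlu": None,
            "truthfulqa": 66.94, "winogrande": 82.79, "gsm8k": None}
fp8 = {"arc-c": 72.09, "hellaswag": 86.83, "mmlu": 84.06,
       "truthfulqa": 66.95, "winogrande": 83.18, "gsm8k": 88.93}


def mean(scores):
    """Unweighted mean of a list of scores."""
    return sum(scores) / len(scores)


def recovery(quantized_avg, baseline_avg):
    """Assumed definition: quantized average as a percentage of the baseline."""
    return 100.0 * quantized_avg / baseline_avg


# Average over all six tasks for the FP8 model (matches the table's 80.34).
fp8_average = mean(list(fp8.values()))

# Partial recovery, restricted to tasks where the baseline score is reported.
shared = [task for task, score in baseline.items() if score is not None]
partial_recovery = recovery(mean([fp8[t] for t in shared]),
                            mean([baseline[t] for t in shared]))

print(f"FP8 average accuracy: {fp8_average:.2f}")
print(f"Recovery on {len(shared)} shared tasks: {partial_recovery:.2f}%")
```

Once the missing baseline scores are available, filling them into `baseline` lets the same `recovery` helper compute the full six-task figure.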