Brevitas is an AMD library for neural network quantization. 🤗 Optimum-AMD integrates with Brevitas to make it easier to quantize Transformers models.
This integration also allows models quantized with Brevitas to be exported to ONNX.
For a refresher on quantization, please have a look at this documentation.
Please refer to BrevitasQuantizer and BrevitasQuantizationConfig for all available options.
Supported models
Currently, only the following architectures are tested and supported:
- Llama
- OPT
Dynamic quantization
from optimum.amd import BrevitasQuantizationConfig, BrevitasQuantizer

# Prepare the quantization configuration. With is_static=False,
# no calibration dataset is needed.
qconfig = BrevitasQuantizationConfig(
    is_static=False,
    apply_gptq=False,
    apply_weight_equalization=False,
    activations_equalization=False,
    weights_symmetric=True,
    activations_symmetric=False,
)

# Load the model into the quantizer, then quantize it.
quantizer = BrevitasQuantizer.from_pretrained("facebook/opt-125m")
model = quantizer.quantize(qconfig)
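The configuration above asks for symmetric weights and asymmetric activations. As a rough illustration of what those two schemes mean, here is a minimal pure-Python sketch of generic 8-bit integer quantization. It is not Brevitas code: the function names `quantize_symmetric`, `quantize_asymmetric` and `dequantize` are hypothetical, and Brevitas may choose scales and zero-points differently.

```python
# Generic illustration only: these helpers are hypothetical and are not
# part of Brevitas or Optimum-AMD.

def quantize_symmetric(values, bits=8):
    """Signed quantization with the zero-point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale, 0


def quantize_asymmetric(values, bits=8):
    """Unsigned quantization where a zero-point shifts the covered range."""
    qmax = 2 ** bits - 1  # 255 for 8 bits
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, 0.25, 1.0]
activations = [-1.0, 0.0, 3.0]
print(quantize_symmetric(weights))
print(quantize_asymmetric(activations))
```

A symmetric scheme suits value distributions centred on zero, which is typical of weights, while an asymmetric scheme avoids wasting integer codes when the range is skewed, as is often the case for activations.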
Static quantization
from optimum.amd import BrevitasQuantizationConfig, BrevitasQuantizer
# get_dataset_for_model comes from optimum-amd's Brevitas data utilities.
from optimum.amd.brevitas.data_utils import get_dataset_for_model
from transformers import AutoTokenizer

# Prepare the quantization configuration. With is_static=True,
# a calibration dataset is passed to quantize() below.
qconfig = BrevitasQuantizationConfig(
    is_static=True,
    apply_gptq=False,
    apply_weight_equalization=True,
    activations_equalization=False,
    weights_symmetric=True,
    activations_symmetric=False,
)

# Load the model into the quantizer, and load the matching tokenizer.
quantizer = BrevitasQuantizer.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Load the data for calibration.
calibration_dataset = get_dataset_for_model(
    "facebook/opt-125m",
    qconfig=qconfig,
    dataset_name="wikitext2",
    tokenizer=tokenizer,
    nsamples=128,
    seqlen=512,
    split="train",
)

model = quantizer.quantize(qconfig, calibration_dataset)
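The practical difference between the two modes is when the activation scale is chosen: ahead of time from calibration data (static), or from each input at inference time (dynamic). The sketch below shows that idea in plain Python. It is a generic illustration under simplifying assumptions, not Brevitas's implementation: the classes `StaticQuantizer` and `DynamicQuantizer` are hypothetical, and a symmetric max-abs scale is used for brevity even though the configurations above use asymmetric activations.

```python
# Generic illustration only: these classes are hypothetical and are not
# part of Brevitas or Optimum-AMD.

def scale_from_range(max_abs, bits=8):
    """Scale that maps [-max_abs, max_abs] onto the signed integer range."""
    qmax = 2 ** (bits - 1) - 1
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, bits=8):
    """Round to integer codes and clamp to the representable range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


class StaticQuantizer:
    """The scale is fixed once, from calibration batches."""

    def __init__(self, calibration_batches, bits=8):
        max_abs = max(abs(v) for batch in calibration_batches for v in batch)
        self.scale = scale_from_range(max_abs, bits)
        self.bits = bits

    def __call__(self, values):
        return quantize(values, self.scale, self.bits), self.scale


class DynamicQuantizer:
    """The scale is recomputed from every input."""

    def __init__(self, bits=8):
        self.bits = bits

    def __call__(self, values):
        scale = scale_from_range(max(abs(v) for v in values), self.bits)
        return quantize(values, scale, self.bits), scale


static = StaticQuantizer([[0.5, -2.0], [1.0, 4.0]])
dynamic = DynamicQuantizer()

# An input larger than anything seen during calibration is clamped by the
# static quantizer, while the dynamic one adapts its scale to the input.
print(static([8.0, 1.0]))
print(dynamic([8.0, 1.0]))
```

Static quantization avoids computing ranges at inference time but depends on the calibration data being representative. Dynamic quantization adapts to each input at the cost of extra work per call.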
Export Brevitas models to ONNX
Brevitas models can be exported to ONNX using Optimum:
import torch
from optimum.amd.brevitas.export import onnx_export_from_quantized_model
# Export to ONNX through optimum.exporters.
onnx_export_from_quantized_model(model, "llm_quantized_onnx")
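After the export, it can be useful to check what was written to the output directory. The helper below is a small sketch using only the standard library. The name `find_onnx_files` is hypothetical, and the assumption that the export produces one or more `.onnx` files somewhere under the output directory should be verified against your own run.

```python
from pathlib import Path


def find_onnx_files(output_dir):
    """Return the sorted relative paths of all .onnx files under output_dir.

    Hypothetical helper, not part of Optimum-AMD.
    """
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Export directory not found: {root}")
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix == ".onnx"
    )
```

For example, `find_onnx_files("llm_quantized_onnx")` would list the ONNX files produced by the call above, assuming the export completed and used that directory.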
Complete example
A complete example is available at https://github.com/huggingface/optimum-amd/tree/main/examples/quantization/brevitas.